I have two sets of points in 2D A
and B
and I need to find the minimum distance for each point in A
, to a point in B
. So far I've been using SciPy's cdist with the code below
import numpy as np
from scipy.spatial.distance import cdist
def ABdist(A, B):
# Distance to all points in B, for each point in A.
dist = cdist(A, B, 'euclidean')
# Indexes to minimum distances.
min_dist_idx = np.argmin(dist, axis=1)
# Store only the minimum distances for each point in A, to a point in B.
min_dists = [dist[i][md_idx] for i, md_idx in enumerate(min_dist_idx)]
return min_dist_idx, min_dists
N = 10000
A = np.random.uniform(0., 5000., (N, 2))
B = np.random.uniform(0., 5000., (N, 2))
min_dist_idx, min_dists = ABdist(A, B)
which works just fine for smallish values of N
. But now the lengths of the sets have increased from N=10000
to N=35000
and I'm running into a
dm = np.zeros((mA, mB), dtype=np.double)
MemoryError
I know I can replace cdist
with a for loop that keeps only the minimum distance (and the index) for each point in A
to each point in B
, as that is all I need. I don't need the full AxB
distance matrix. But I've been using cdist
precisely because it is fast.
Is there a way to replace cdist
with an implementation that is (almost?) as fast, but that does not take that much memory?
cdist