My dataset has the format [x-coordinate, y-coordinate, hour], where hour is an integer from 0 to 23.

How can I cluster this data when I need a Euclidean distance metric for the coordinates but a different one for the hours? Under the Euclidean metric d(23, 0) is 23, even though the two values are only one hour apart. Is it possible to cluster data with a different distance metric for each feature in SciPy? How?

Thank you

  • What clustering technique do you want to use? – YXD Sep 11, 2013 at 10:15
  • Currently I'm experimenting with k-means, but any clustering method that gives a good result is fine. Sep 11, 2013 at 10:20
  • Are you confident it would converge? The way I would do it would be to monkey patch the VQ function with my own modifications based on the dictionary for each iteration. I don't think it would be overly difficult to do that. Sep 11, 2013 at 11:24
  • It should converge if the distance for the different metrics is well chosen. Currently I'm trying to rewrite part of the k-means algorithm so it can handle a different distance metric for each feature. Since I'm pretty new to Python, this might take a while, but I have a feeling this is the only solution. Sep 11, 2013 at 11:55
  • I added a reply, then looked up what clustering is and figured out that you don't just want the distance between (x0, y0) and (x1, y1) on one side and the time difference between h0 and h1 on the other, but a single distance over one data structure. If that's what you want to do, I can undelete my reply though. Sep 11, 2013 at 14:21

1 Answer

You'll need to define your own metric that handles "time" appropriately. According to the docs for scipy.spatial.distance.pdist, you can supply your own distance function:

Y = pdist(X, f)

Computes the distance between all pairs of vectors in X using the user supplied 2-arity function f. [...] For example, Euclidean distance between the vectors could be computed as follows:

dm = pdist(X, lambda u, v: np.sqrt(((u-v)**2).sum()))
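For the [x, y, hour] data in the question, a custom metric might look like the sketch below. It is only one possible choice: the function name mixed_metric, the hour_weight parameter, and the decision to combine the two parts as a root of summed squares are assumptions made for illustration, not something prescribed by SciPy.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def mixed_metric(u, v, hour_weight=1.0):
    """Hypothetical metric: Euclidean on (x, y), circular on the hour.

    hour_weight is an assumed knob that sets how much one hour counts
    relative to one unit of spatial distance.
    """
    # Plain Euclidean distance on the two coordinates.
    spatial = np.hypot(u[0] - v[0], u[1] - v[1])
    # Circular distance on a 24-hour clock, so d(23, 0) is 1, not 23.
    diff = abs(u[2] - v[2]) % 24
    circular = min(diff, 24 - diff)
    # Combine both parts into a single distance value.
    return np.sqrt(spatial ** 2 + (hour_weight * circular) ** 2)


X = np.array([
    [0.0, 0.0, 23],
    [0.0, 0.0, 0],
    [3.0, 4.0, 0],
])

# pdist returns a condensed vector; squareform expands it to an n x n matrix.
dm = squareform(pdist(X, mixed_metric))
```

Here the first two points share a location and differ by one hour across midnight, so their distance comes out as 1 rather than 23. How to scale hours against coordinates depends on the data and has to be chosen by hand.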

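The metric can be passed to SciPy's hierarchical clustering functions via the metric keyword, which is used when the input is an array of observations rather than a precomputed distance matrix. (scipy.cluster.vq.kmeans, by contrast, has no metric parameter.) For example, using linkage, where 'euclidean' can be replaced by your own function: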

scipy.cluster.hierarchy.linkage(y, method='single', metric='euclidean')
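As a rough end-to-end sketch, a callable can be handed to linkage and the resulting tree cut into flat clusters with fcluster. The function hour_aware, the sample points, and the choice of two clusters are all made up for illustration; the metric treats one hour as equal to one unit of spatial distance, which is an assumption you would want to tune.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def hour_aware(u, v):
    """Hypothetical metric: Euclidean on (x, y) plus a wrap-around hour term."""
    dh = abs(u[2] - v[2]) % 24
    dh = min(dh, 24 - dh)  # shortest way around the 24-hour clock
    return np.sqrt((u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2 + dh ** 2)


# Made-up observations: three points near the origin around midnight,
# two points far away around noon.
X = np.array([
    [0.0, 0.0, 23],
    [0.1, 0.0, 0],
    [0.0, 0.1, 1],
    [10.0, 10.0, 12],
    [10.1, 10.0, 13],
])

# X is a 2-D observation array, so linkage computes pairwise distances
# with the supplied callable.
Z = linkage(X, method='single', metric=hour_aware)

# Cut the hierarchy into at most two flat clusters.
labels = fcluster(Z, t=2, criterion='maxclust')
```

With a plain Euclidean metric the hours 23 and 0 would be pushed far apart; with the wrap-around term the three midnight points end up in the same cluster.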
  • @user2768102 No problem, and welcome to Stack Overflow! Small tip for better posts: you don't need to say "Thank you/please/Cheers" in the post, as we like to keep the noise down. – Hooked Sep 12, 2013 at 13:47