All Questions

21 votes
7 answers
36k views

Multidimensional Euclidean Distance in Python

I want to calculate the Euclidean distance in multiple dimensions (24 dimensions) between 2 arrays. I'm using numpy-Scipy. Here is my code: import numpy,scipy; A=numpy.array([116.629, 7192.6, 4535....
garak
  • 4,733
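This is a minimal sketch in Python with NumPy. The arrays in the question are truncated, so the vectors below are hypothetical random 24-dimensional data, and the helper name `euclidean` is illustrative. The distance is the norm of the difference vector. `math.dist` is used only as a cross-check. `scipy.spatial.distance.euclidean(A, B)` should return the same value for 1-D inputs.

```python
import math
import numpy as np

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return float(np.linalg.norm(a - b))

# Hypothetical 24-dimensional vectors; the arrays in the question are truncated.
rng = np.random.default_rng(0)
A = rng.random(24)
B = rng.random(24)

d = euclidean(A, B)
# Cross-check against the standard library (Python 3.8+).
assert math.isclose(d, math.dist(A, B))
print(d)
```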
8 votes
1 answer
2k views

Clustering in python(scipy) with space and time variables

The format of my dataset: [x-coordinate, y-coordinate, hour] with hour an integer value from 0 to 23. My question now is how can I cluster this data when I need a Euclidean distance metric for the ...
user2768102
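One common way to make hour-of-day compatible with Euclidean distance is to map it onto a circle with sine and cosine. This approach is not taken from the question. With this encoding, hour 23 and hour 0 end up close together instead of 23 units apart. The sketch below assumes x and y have already been scaled to comparable ranges. `time_weight` is a hypothetical knob for balancing time against space.

```python
import numpy as np

def encode_hour(hour):
    """Map hour-of-day (0-23) onto the unit circle so 23:00 and 00:00 are close."""
    angle = 2 * np.pi * np.atleast_1d(np.asarray(hour, dtype=float)) / 24.0
    return np.column_stack([np.cos(angle), np.sin(angle)])

def make_features(x, y, hour, time_weight=1.0):
    """Stack (already scaled) x/y coordinates with the weighted circular hour encoding.

    time_weight is a hypothetical parameter controlling how much the hour
    contributes to the Euclidean distance relative to the spatial coordinates.
    """
    xy = np.column_stack([x, y]).astype(float)
    return np.hstack([xy, time_weight * encode_hour(hour)])

# Hours 23 and 0 end up close; hours 0 and 12 end up on opposite sides of the circle.
h = encode_hour([23, 0, 12])
print(np.linalg.norm(h[0] - h[1]), np.linalg.norm(h[1] - h[2]))
```

The resulting feature matrix can be passed to any clustering method that uses Euclidean distance.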
6 votes
1 answer
6k views

Weighted Euclidean Distance in R

I'd like to create a distance matrix with weighted Euclidean distances from a data frame. The weights will be defined in a vector. Here's an example: library("cluster") a <- c(1,2,3,4,5) b <- ...
h7681
  • 355
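Here is a sketch of a weighted Euclidean distance matrix, written in Python with NumPy rather than R. It assumes the weights multiply the squared per-column differences, d_w(x, y) = sqrt(sum_k w_k (x_k - y_k)^2). The data and weights are hypothetical.

```python
import numpy as np

def weighted_distance_matrix(X, w):
    """Pairwise weighted Euclidean distances: sqrt(sum_k w_k * (x_ik - x_jk)**2)."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    if X.ndim != 2 or w.shape != (X.shape[1],):
        raise ValueError("X must be (n, d) and w must have length d")
    if np.any(w < 0):
        raise ValueError("weights must be non-negative")
    diff = X[:, None, :] - X[None, :, :]        # shape (n, n, d)
    return np.sqrt((w * diff ** 2).sum(axis=-1))  # shape (n, n)

# Hypothetical data: 3 rows, 2 columns; the second column counts 4x as much.
X = [[0.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
D = weighted_distance_matrix(X, [1.0, 4.0])
print(D)
```

Because sqrt(w_k) can be folded into each column, the same matrix can also be obtained another way. Rescale column k by sqrt(w_k), then compute an ordinary Euclidean distance matrix, for example with R's dist(). The broadcasting version above uses O(n^2 d) memory, so it suits small data frames.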
5 votes
1 answer
2k views

What is the complexity of dist()?

I used the dist function in R and I am wondering about its time complexity. I know that hierarchical clustering has N^2*logN time complexity. And hierarchical clustering is composed of two ...
sclee1
  • 1,217
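Here is a rough operation count. It assumes dist() computes each pairwise distance directly: it returns the n(n-1)/2 entries below the diagonal, and each entry is a sum over the d columns.

```latex
% n rows (observations), d numeric columns
\#\text{pairs} = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad \text{cost per pair} = O(d)
\;\Longrightarrow\; T_{\mathrm{dist}}(n, d) = O\!\left(n^{2} d\right), \qquad
M_{\mathrm{dist}}(n) = \frac{n(n-1)}{2} \text{ stored values} = O\!\left(n^{2}\right)
```

Under that assumption, computing the distances alone is already quadratic in n and linear in d.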
4 votes
2 answers
5k views

Calculating a Voronoi diagram for planes in 3D

Is there code or a library that can calculate a Voronoi diagram for planes (parallelograms) in 3D? I checked Qhull, and it seems it can only work with points; in its examples, Voro++ works with different ...
zamazalotta
4 votes
2 answers
5k views

Can we cluster Multivariate Time Series dataset in Python

I have a dataset with many financial signal values for different stocks at different times. For example: StockName Date Signal1 Signal2 ---------------------------------- Stock1 1/1/20 a ...
Zhang Yongheng
3 votes
4 answers
7k views

How to find the optimal number of clusters with K-Means clustering in Python

I am new to clustering algorithms. I have a movie dataset with more than 200 movies and more than 100 users. All the users rated at least one movie. A value of 1 for good, 0 for bad and blank if the ...
ToBeEXP
  • 61
3 votes
1 answer
359 views

Clustering pictures by time and location

I'm trying to cluster pictures according to the location they were taken and the time they were taken. My clustering algorithm requires me to define a distance function between every two points, (in ...
user2606961
2 votes
1 answer
13k views

Implementing k-means with Euclidean distance vs Manhattan distance?

I am implementing the k-means algorithm from scratch in Python and on Spark. Actually, it is my homework. The problem is to implement k-means with predefined centroids with different initialization methods, ...
mrasoolmirza
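Here is a from-scratch sketch of k-means with predefined initial centroids, in plain NumPy rather than Spark. The metric changes both the assignment step and the update step. With Manhattan distance, the coordinate-wise median minimizes the within-cluster sum of distances. The mean does the same job for squared Euclidean distance. The Manhattan variant here is therefore really k-medians. The function names are hypothetical.

```python
import numpy as np

def assign(X, centroids, metric="euclidean"):
    """Index of the nearest centroid for each row of X."""
    diff = X[:, None, :] - centroids[None, :, :]  # shape (n, k, d)
    if metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=-1))
    elif metric == "manhattan":
        dist = np.abs(diff).sum(axis=-1)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return dist.argmin(axis=1)

def update(X, labels, centroids, metric="euclidean"):
    """Mean of each cluster for Euclidean, coordinate-wise median for Manhattan."""
    new = centroids.copy()
    for j in range(len(centroids)):
        members = X[labels == j]
        if len(members) == 0:
            continue  # keep the old centroid if a cluster becomes empty
        new[j] = members.mean(axis=0) if metric == "euclidean" else np.median(members, axis=0)
    return new

def kmeans(X, init_centroids, metric="euclidean", n_iter=100):
    """Alternate assignment and update until the centroids stop moving."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        labels = assign(X, centroids, metric)
        new = update(X, labels, centroids, metric)
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

On the points (0, 0), (1, 0), (9, 0) with a single initial centroid, the Euclidean version converges to the mean (10/3, 0). The Manhattan version converges to the median (1, 0), which is less affected by the outlying point.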
2 votes
1 answer
963 views

Clustering in Mixed Data Types

Why can't we use Euclidean distance for clustering categorical variables, and why do we use Gower distance for them? I am just looking for a simple logic and ...
Karan sehgal
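Euclidean distance on integer-coded categories invents an order and a spacing. For example, coding red=1, green=2, blue=3 makes red and blue look twice as far apart as red and green. Gower distance instead scores each variable on [0, 1] and averages the scores. Numeric variables use a range-normalized absolute difference, and categorical ones use a 0/1 mismatch. Below is a minimal sketch with equal weights and no missing-value handling. The function name and record layout are hypothetical.

```python
def gower_distance(a, b, kinds, ranges):
    """Gower distance between two records with equal variable weights.

    kinds[k] is "num" or "cat"; ranges[k] is the range (max - min) of numeric
    column k over the whole dataset and is ignored for categorical columns.
    """
    if not (len(a) == len(b) == len(kinds) == len(ranges)):
        raise ValueError("records, kinds and ranges must have the same length")
    total = 0.0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if kind == "num":
            total += abs(x - y) / r if r > 0 else 0.0  # scaled to [0, 1]
        elif kind == "cat":
            total += 0.0 if x == y else 1.0            # simple mismatch
        else:
            raise ValueError(f"unknown variable kind: {kind}")
    return total / len(kinds)

# Hypothetical records: a colour (categorical) and a price whose dataset range is 40.
print(gower_distance(("red", 10.0), ("blue", 30.0), ["cat", "num"], [None, 40.0]))  # 0.75
```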
2 votes
4 answers
9k views

How do I create a similarity matrix in MATLAB?

I am working towards comparing multiple images. I have these image data as column vectors of a matrix called "images." I want to assess the similarity of images by first computing their Euclidean ...
Vivek Subramanian
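Here is a sketch in Python with NumPy. The question uses MATLAB, where pdist and squareform compute the same distance matrix from observations stored as rows. As in the question, images are the columns of `images`. Turning distances into similarities is a modelling choice. The Gaussian kernel and its `sigma` below are assumptions, and 1 / (1 + D) is another common option.

```python
import numpy as np

def distance_matrix_columns(images):
    """Pairwise Euclidean distances between the columns of `images` (shape d x n)."""
    X = np.asarray(images, dtype=float).T           # one row per image, shape (n, d)
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # ||xi||^2 + ||xj||^2 - 2 xi.xj
    return np.sqrt(np.maximum(d2, 0.0))              # clip tiny negatives from rounding

def similarity_from_distance(D, sigma=1.0):
    """Gaussian (RBF) similarity: 1 on the diagonal, decaying with distance."""
    return np.exp(-(D ** 2) / (2.0 * sigma ** 2))

# Hypothetical 2-pixel "images" stored as columns: (0, 0), (3, 4), (0, 0).
images = np.array([[0.0, 3.0, 0.0],
                   [0.0, 4.0, 0.0]])
D = distance_matrix_columns(images)
S = similarity_from_distance(D, sigma=5.0)
print(D)
print(S)
```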
2 votes
1 answer
628 views

Dimensionality reduction for high dimensional sparse data before clustering or spherical k-means?

I am trying to build my first recommender system, where I create a user feature space and then cluster the users into different groups. Then, for the recommendation to work for a particular user, first I ...
rehan ali
2 votes
0 answers
557 views

How can I improve the silhouette score of my k-means clustering

I have a dataset with 18000 lines about some Customers, like this: and I am trying to do some clustering using k-means algorithm. Since I have both categorical and continuous variables I created some ...
Fábio Pires
2 votes
1 answer
4k views

Euclidean Distance or cosine similarity? [closed]

I was reading about similarity measures and suddenly my whole world was falling apart. I have implemented a search engine using a clustering technique. For clustering, I used k-means, which has distance ...
Hooli
  • 721
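Here is a small check of how the two measures relate. After scaling vectors to unit length, squared Euclidean distance equals 2 - 2 * cosine similarity. Nearest neighbours under one measure are therefore nearest neighbours under the other. The vectors below are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def unit(v):
    """Scale a non-zero vector to length 1."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

a, b = [1.0, 2.0, 0.0], [2.0, 1.0, 1.0]
cos = cosine_similarity(a, b)
d = float(np.linalg.norm(unit(a) - unit(b)))
# For unit-length vectors: ||u - v||^2 = 2 - 2 * cos(u, v)
print(d ** 2, 2 - 2 * cos)
```

The k-means centroids (means of unit vectors) are generally not unit length themselves. This is why spherical k-means renormalizes them after each update.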
1 vote
1 answer
226 views

Using norm Function In MATLAB

I have a matrix of data which is the coordinates of some points and coordinates of 5 clusters data = [randi(100,100,1),randi(100,100,1)]; x_Clusters = [20 5 12 88 61]; y_Clusters = [10 50 14 41 10]; ...
MMd.NrC
  • 91
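Here is a Python/NumPy equivalent of the setup in the question. It uses the question's five cluster coordinates and random integer points in 1..100, which play the role of `randi(100, 100, 1)`. It computes the distance from every point to every cluster and the index of the nearest one. `nearest_cluster` is a hypothetical helper name.

```python
import numpy as np

# Cluster coordinates from the question (x_Clusters, y_Clusters), one row per cluster.
centers = np.column_stack([[20, 5, 12, 88, 61], [10, 50, 14, 41, 10]]).astype(float)

def nearest_cluster(points, centers):
    """Distances from every point to every center, and the index of the nearest center."""
    points = np.asarray(points, dtype=float)
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)  # shape (n, k)
    return dist, dist.argmin(axis=1)

rng = np.random.default_rng(0)
data = rng.integers(1, 101, size=(100, 2))  # integers 1..100, like randi(100, 100, 1) per column
dist, labels = nearest_cluster(data, centers)
print(dist.shape, np.bincount(labels, minlength=len(centers)))
```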
1 vote
0 answers
1k views

Visualization on Cluster for Mixed Data

So, I'm working with fuzzy clustering for mixed data. Now I want to visualize the clustering result. Here is my data: > head(x) x1 x2 x3 x4 A C 8.461373 27.62996 B C 10....
Jack shephard
0 votes
2 answers
322 views

Cluster Analysis: Problem finding Euclidean distances of centroids in a dataframe from origin

The 7 columns for each row in df_centroids show the coordinates in a 7-dimensional space. import numpy as np import pandas as pd import scipy df_centroids 0 1 2 ...
forever_learner
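A centroid's distance from the origin is the Euclidean norm of its row, so one vectorized call covers all rows. Below is a sketch with hypothetical 7-dimensional centroids, since the question's values are not shown. Passing a numeric DataFrame such as `df_centroids` should also work, because `np.asarray` converts it to its underlying array.

```python
import numpy as np

def distances_from_origin(centroids):
    """Euclidean norm of each row, i.e. each centroid's distance from the origin."""
    return np.linalg.norm(np.asarray(centroids, dtype=float), axis=1)

# Hypothetical centroids in 7 dimensions.
centroids = np.array([
    [1, 0, 0, 0, 0, 0, 0],
    [3, 4, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0],
], dtype=float)
print(distances_from_origin(centroids))  # [1. 5. 2.]
```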
0 votes
1 answer
39 views

R: average of distance by Id

I have a dataset with two groups of subjects, Group A, Group B like this. Id Group Age 1 A 17 2 A 14 3 A 10 4 A 17 5 A 12 6 A 6 7 A 18 8 A ...
Ahir Bhairav Orai
0 votes
1 answer
392 views

dist function in r (stats) for clustering: Should I put my ID variable in row.names?

I have a data frame with some numeric columns and an ID column which is character. When I pass the whole data frame in the dist function it calculates the distance matrix, but when I remove the ID ...
xhr489
  • 2,089
0 votes
1 answer
2k views

Python - Issue with the dimension of array in cdist function

I am trying to find the right number of clusters for k-means and am using the cdist function for this. I understand that the arguments to cdist should have the same dimension. I tried printing the size of ...
Shivam Agrawal
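`scipy.spatial.distance.cdist` expects two 2-D arrays of shape (n_samples, n_features) with the same number of columns. A 1-D feature vector therefore has to be reshaped into a single column first. Below is a sketch with hypothetical one-feature data and two hypothetical centers. The last lines compute a within-cluster sum of squares of the kind used in the elbow method.

```python
import numpy as np
from scipy.spatial.distance import cdist

X = np.array([1.0, 2.0, 10.0, 11.0])  # 1-D feature vector: cdist needs 2-D input
centers = np.array([1.5, 10.5])

# Reshape each 1-D array into a column: (n_samples, n_features=1).
XA = X.reshape(-1, 1)           # shape (4, 1)
XB = centers.reshape(-1, 1)     # shape (2, 1)
D = cdist(XA, XB, "euclidean")  # shape (4, 2): distance of each sample to each center

# Within-cluster sum of squares: squared distance of each sample to its nearest center.
wcss = float((D.min(axis=1) ** 2).sum())
print(D.shape, wcss)  # (4, 2) 1.0
```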
0 votes
1 answer
608 views

Does h2o.kmeans() make predictions based on euclidean distance?

I created a clustering model using h2o.kmeans(). The modeling dataset was standardized by scale() in R first. The model has five clusters and the coordinates of the centroids are: CENTROID X1 X2 ...
soniCYouth
0 votes
1 answer
200 views

WEKA classes in map and reduce phases of KMeans Clustering on hadoop

I want to use WEKA's classes inside a MapReduce program to perform k-means clustering on Instances. I just want an overview of the map and reduce classes. How can the distance computed by WEKA classes be ...
Navjot Grewal
0 votes
1 answer
582 views

Proper similarity measure for clustering

I have problems in finding a proper similarity measure for clustering. I have around 3000 arrays of sets, where each set contains features of certain domain (e.g., number, color, days, alphabets, etc)....
Maggie
  • 5,963
0 votes
1 answer
151 views

Finding the "tightest" subset in Euclidean space

I am given a set of points x_1, x_2, ..., x_n ∈ R^d. I wish to find a subset of k points such that the sum of the distances between these k points is minimal. Naively this is an O(n choose k) problem, ...
ualex
  • 51
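For reference, the naive O(n choose k) search the question mentions can be written directly with `itertools.combinations`. This is only a baseline for small inputs or for testing a faster method; it does not solve the efficiency problem. The helper names and points are hypothetical.

```python
import math
from itertools import combinations

def pairwise_sum(points):
    """Sum of Euclidean distances over all unordered pairs in `points`."""
    return sum(math.dist(p, q) for p, q in combinations(points, 2))

def tightest_subset(points, k):
    """Exhaustive search over all C(n, k) index subsets; feasible only for small n and k."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    best = min(combinations(range(len(points)), k),
               key=lambda idx: pairwise_sum([points[i] for i in idx]))
    return list(best)

pts = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11)]
print(tightest_subset(pts, 3))  # [0, 2, 3]
```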
0 votes
5 answers
1k views

Partition neighbor points given a euclidean distance range

Given two points P,Q and a delta, I defined the equivalence relation ~=, where P ~= Q if EuclideanDistance(P,Q) <= delta. Now, given a set S of n points, in the example S = (A, B, C, D, E, F) and n ...
ceztko
  • 14.9k
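"Within delta" is reflexive and symmetric but not transitive: A can be near B and B near C while A is far from C. On its own, it is therefore not an equivalence relation. Taking its transitive closure gives a proper partition, namely the connected components of the graph that links points within delta. This is essentially what single-linkage clustering cut at height delta produces. Below is a sketch using union-find with an O(n^2) pair scan. The names and points are hypothetical.

```python
import math

def partition_by_distance(points, delta):
    """Connected components of the graph with an edge between any two points
    at Euclidean distance <= delta, found with union-find."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):  # O(n^2) pair checks
            if math.dist(points[i], points[j]) <= delta:
                parent[find(i)] = find(j)    # merge the two components

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Points 0-1 and 1-2 are within delta but 0-2 are not; all three still share a group.
pts = [(0, 0), (1, 0), (2, 0), (10, 0)]
print(partition_by_distance(pts, 1.0))  # [[0, 1, 2], [3]]
```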
0 votes
0 answers
6 views

What is the standard threshold value that is best for accuracy when employing Euclidean distance as a metric for gauging textual similarity?

I'm using Euclidean distance as a metric to compare two sentences for similarity while clustering them using my custom incremental KMeans algorithm. The current threshold value I'm using is 0.7 which ...
sanjay M
0 votes
0 answers
94 views

Cannot find the Distance Matrix in Hierarchical Clustering

I want to perform Hierarchical Clustering in this dataset (107721 rows and 16 columns). In order to do this I have to calculate the distance matrix. When I use the dist function I get the error: ...
Billy
  • 31
0 votes
1 answer
162 views

Memory Problem: Average-Linkage Clustering

Data with a million rows and 18 columns need to be clustered using average-linkage clustering, which in turn requires calculating the Euclidean distance between rows. While doing so, d <- dist(data),...
Sachin
  • 269
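Here is a back-of-the-envelope estimate. It assumes the distances are stored as the n(n-1)/2 lower-triangle entries in 8-byte doubles, as R's dist() does. It shows why a full distance matrix is out of reach at these sizes. The 107,721 rows quoted in the "Cannot find the Distance Matrix" question need about 46.4 GB, and 1,000,000 rows need about 4,000 GB.

```python
def condensed_distance_bytes(n_rows, bytes_per_value=8):
    """Bytes needed for the n(n-1)/2 pairwise distances stored as 8-byte doubles."""
    return n_rows * (n_rows - 1) // 2 * bytes_per_value

for n in (107_721, 1_000_000):
    gb = condensed_distance_bytes(n) / 1e9
    print(f"{n:>9,} rows -> {gb:,.1f} GB")
```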
0 votes
1 answer
409 views

Single linkage hierarchical clustering - boxplots on height of the branches to detect outliers

Before k-means clustering for consumer segmentation, I want to identify and delete outliers in my sample. I tried hierarchical clustering with the single linkage algorithm. The problem is, I have a sample ...
A.dubia
  • 13
0 votes
0 answers
252 views

How to chunk large dissimilarity / distance matrices in R?

I would like to cluster mixed-type data that contains 50k rows and 10 features/columns. I am using R on my 64-bit PC. When I calculate the dissimilarity / distance matrix with the "daisy" function, I got "Error:...
A. Bek
  • 21
0 votes
1 answer
133 views

Finding Euclidean distance from an m*n dimensional matrix to a point

I am working on a clustering problem. There's a situation where I have 3 cluster centers as below, and I want to calculate the Euclidean distance from these 3 cluster centers to another m*n dimensional ...
Arindam Bose
0 votes
0 answers
166 views

Smart Semantic Category Clustering Using R

Got 2 data frames, did the below: library(tm) v<- Corpus(VectorSource(as.vector(bothsources[,1]))) inspect(head(v,3)) v <- tm_map(v, removeWords, stopwords("english")) v <- tm_map(v, ...
Wiam Nasr
0 votes
1 answer
75 views

Clustering of data

I have a 2-dimensional dataset with several points (say 100), each having x and y coordinate in MATLAB. I need to cluster these points around some predefined points (say 5) according to the nearest ...
Rutuja Kate
0 votes
2 answers
815 views

Error when using cluster package to compute euclidean distances

I have been working on a text mining project. I have performed some LDA topic modelling and now I have my topic probabilities. I would like to use the cluster package so that I can get the euclidean ...
Ricardo
  • 81
0 votes
1 answer
66 views

Assign clusters to locations, based on the euclidean distance

I want to calculate which cluster a point belongs to, based on the Euclidean distance. clusters xcor ycor 1 64.99206 78.48413 2 1102.00000 2466.67500 3 1598....
Jelmer
  • 351
0 votes
1 answer
91 views

Movement of clusters over time

I am trying to do a cluster analysis, based on the transactional data for a financial product, and try and measure their movement over time. I have my static cluster ready (based on the transactions ...
Kashika Saxena
0 votes
1 answer
38 views

Should I choose classification or clustering for my project?

I just detected faces using the Viola-Jones algorithm. I cropped faces from frames (or video) and made them into a training set. In my video there are 5 different faces. I decided to use eigenfaces for face ...
user3456881
-2 votes
1 answer
1k views

Dist and hclust functions outputting unexpected/incorrect outputs [closed]

I have been attempting to use R as an alternative to MVSP for cluster analysis and PCA. However, R is giving drastically different outputs from MVSP using all the functions that I've found, including ...
Mirran
  • 11