All Questions
Tagged with euclidean-distance cluster-analysis
38
questions
21
votes
7
answers
36k
views
Multidimensional Euclidean Distance in Python
I want to calculate the Euclidean distance in multiple dimensions (24 dimensions) between 2 arrays. I'm using numpy-Scipy.
Here is my code:
import numpy,scipy;
A=numpy.array([116.629, 7192.6, 4535....
8
votes
1
answer
2k
views
Clustering in Python (SciPy) with space and time variables
The format of my dataset:
[x-coordinate, y-coordinate, hour] with hour an integer value from 0 to 23.
My question now is how I can cluster this data when I need a Euclidean distance metric for the ...
6
votes
1
answer
6k
views
Weighted Euclidean Distance in R
I'd like to create a distance-matrix with weighted euclidean distances from a data frame. The weights will be defined in a vector. Here's an example:
library("cluster")
a <- c(1,2,3,4,5)
b <- ...
5
votes
1
answer
2k
views
What is the complexity of dist()?
I used the dist function in R and I am wondering about its time complexity.
I know that hierarchical clustering has an N^2*logN time complexity. And hierarchical clustering is composed of two ...
4
votes
2
answers
5k
views
Calculating a Voronoi diagram for planes in 3D
Is there code or a library that can calculate a Voronoi diagram for planes (parallelograms) in 3D? I checked Qhull and it seems it can only work with points; in its examples Voro++ works with different ...
4
votes
2
answers
5k
views
Can we cluster a multivariate time series dataset in Python
I have a dataset with many financial signal values for different stocks at different times. For example
StockName Date Signal1 Signal2
----------------------------------
Stock1 1/1/20 a ...
3
votes
4
answers
7k
views
How to find the optimal number of clusters with K-Means clustering in Python
I am new to clustering algorithms. I have a movie dataset with more than 200 movies and more than 100 users. All the users rated at least one movie. A value of 1 for good, 0 for bad and blank if the ...
3
votes
1
answer
359
views
Clustering pictures by time and location
I'm trying to cluster pictures according to the location they were taken and the time they were taken. My clustering algorithm requires me to define a distance function between every two points, (in ...
2
votes
1
answer
13k
views
Implementing k-means with Euclidean distance vs Manhattan distance?
I am implementing the k-means algorithm from scratch in Python and on Spark. Actually, it is my homework. The problem is to implement k-means with predefined centroids with different initialization methods, ...
2
votes
1
answer
963
views
Clustering in Mixed Data Types
Why can't we use the Euclidean distance for clustering categorical variables, and why do we use the Gower distance for clustering categorical variables? I am just looking for a simple logic and ...
2
votes
4
answers
9k
views
How do I create a similarity matrix in MATLAB?
I am working towards comparing multiple images. I have these image data as column vectors of a matrix called "images." I want to assess the similarity of images by first computing their Euclidean ...
2
votes
1
answer
628
views
Dimensionality reduction for high dimensional sparse data before clustering or spherical k-means?
I am trying to build my first recommender system, where I create a user feature space and then cluster the users into different groups. Then, for the recommendation to work for a particular user, first I ...
2
votes
0
answers
557
views
How can I improve the silhouette score of my k-means clustering
I have a dataset with 18000 lines about some customers, like this:
and I am trying to do some clustering using the k-means algorithm.
Since I have both categorical and continuous variables, I created some ...
2
votes
1
answer
4k
views
Euclidean Distance or cosine similarity? [closed]
I was reading
Similarity Measure
and suddenly my whole world was falling apart. I have implemented a search engine using a clustering technique. For clustering, I used K-Means, which has distance ...
1
vote
1
answer
226
views
Using norm Function In MATLAB
I have a matrix of data which is the coordinates of some points and coordinates of 5 clusters
data = [randi(100,100,1),randi(100,100,1)];
x_Clusters = [20 5 12 88 61];
y_Clusters = [10 50 14 41 10];
...
1
vote
0
answers
1k
views
Visualization on Cluster for Mixed Data
So, I'm working with fuzzy clustering for mixed data. Then I want to visualize the clustering result.
Here is my data
> head(x)
x1 x2 x3 x4
A C 8.461373 27.62996
B C 10....
0
votes
2
answers
322
views
Cluster Analysis: Problem finding Euclidean distances of centroids in a dataframe from origin
The 7 columns for each row in df_centroids show the coordinates in a 7-dimensional space.
import numpy as np
import pandas as pd
import scipy
df_centroids
0 1 2 ...
0
votes
1
answer
39
views
R: average of distance by Id
I have a dataset with two groups of subjects, Group A, Group B like this.
Id Group Age
1 A 17
2 A 14
3 A 10
4 A 17
5 A 12
6 A 6
7 A 18
8 A ...
0
votes
1
answer
392
views
dist function in R (stats) for clustering: Should I put my ID variable in row.names?
I have a data frame with some numeric columns and an ID column which is character. When I pass the whole data frame to the dist function it calculates the distance matrix, but when I remove the ID ...
0
votes
1
answer
2k
views
Python - Issue with the dimension of array in cdist function
I am trying to find the right number of clusters for k-means and am using the cdist function for this.
I understand that the arguments to cdist should have the same dimension. I tried printing the size of ...
0
votes
1
answer
608
views
Does h2o.kmeans() make predictions based on Euclidean distance?
I created a clustering model using h2o.kmeans(). The modeling dataset was standardized by scale() in R first.
The model has five clusters and the coordinates of the centroids are:
CENTROID X1 X2 ...
0
votes
1
answer
200
views
WEKA classes in map and reduce phases of KMeans Clustering on Hadoop
I want to use WEKA's classes inside a MapReduce program for performing KMeans Clustering on Instances. I just want an overview of the map and reduce classes. How can the distance computed by WEKA classes be ...
0
votes
1
answer
582
views
Proper similarity measure for clustering
I am having trouble finding a proper similarity measure for clustering. I have around 3000 arrays of sets, where each set contains features of a certain domain (e.g., number, color, days, alphabets, etc)....
0
votes
1
answer
151
views
Finding the "tightest" subset in Euclidean space
I am given a set of points x_1, x_2, ... x_n \in R^d. I wish to find a subset of k points such that the sum of the distances between these k points is minimal. Naively this is an O(n choose k) problem, ...
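The naive O(n choose k) search mentioned in this excerpt can be sketched as follows. This is only an illustration of the exhaustive baseline, not an answer from the thread: the function name `tightest_subset` and the sample points are hypothetical, and the approach is only practical for small n and k.

```python
import itertools
import math


def tightest_subset(points, k):
    """Try every k-subset and return the one whose pairwise
    Euclidean distances have the smallest sum (brute force)."""

    def total_pairwise(subset):
        # Sum of distances over all unordered pairs in the subset.
        return sum(math.dist(p, q) for p, q in itertools.combinations(subset, 2))

    return min(itertools.combinations(points, k), key=total_pairwise)


# Hypothetical sample data: three points near the origin, two far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
best = tightest_subset(points, 3)
```

The cost grows with the number of k-subsets times the k(k-1)/2 pairs in each, which is why the question looks for something better than this baseline.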
0
votes
5
answers
1k
views
Partition neighbor points given a euclidean distance range
Given two points P,Q and a delta, I defined the equivalence relation ~=, where P ~= Q if EuclideanDistance(P,Q) <= delta. Now, given a set S of n points, in the example S = (A, B, C, D, E, F) and n ...
0
votes
0
answers
6
views
What is the standard threshold value that is best for accuracy when employing Euclidean distance as a metric for gauging textual similarity?
I'm using Euclidean distance as a metric to compare two sentences for similarity while clustering them using my custom incremental KMeans algorithm. The current threshold value I'm using is 0.7 which ...
0
votes
0
answers
94
views
Cannot find the Distance Matrix in Hierarchical Clustering
I want to perform hierarchical clustering on this dataset (107721 rows and 16 columns). In order to do this I have to calculate the distance matrix. When I use the dist function I get the error:
...
0
votes
1
answer
162
views
Memory Problem: Average-Linkage Clustering
Data with a million rows and 18 columns need to be clustered using Average-Linkage Clustering, which in turn requires calculating the Euclidean distance between rows. While doing so, d <-dist(data),...
0
votes
1
answer
409
views
Single linkage hierarchical clustering - boxplots on height of the branches to detect outliers
Before k-means clustering for consumer segmentation, I want to identify and delete outliers from my sample. I tried hierarchical clustering with the single linkage algorithm. The problem is, I have a sample ...
0
votes
0
answers
252
views
How to chunk large dissimilarity / distance matrices in R?
I would like to cluster mixed-type data that contains 50k rows and 10 features/columns. I am using R on my 64-bit PC. When I calculate the dissimilarity / distance matrix with the "daisy" function, I get "Error:...
0
votes
1
answer
133
views
Finding Euclidean distance from a m*n dimensional matrix to a point
I am working on a clustering problem. There's a situation where I have 3 cluster centers as below, and I want to calculate the Euclidean distance from these 3 cluster centers to another m*n dimensional ...
0
votes
0
answers
166
views
Smart Semantic Category Clustering Using R
Got 2 data frames, did the below:
library(tm)
v<- Corpus(VectorSource(as.vector(bothsources[,1])))
inspect(head(v,3))
v <- tm_map(v, removeWords, stopwords("english"))
v <- tm_map(v, ...
0
votes
1
answer
75
views
Clustering of data
I have a 2-dimensional dataset in MATLAB with several points (say 100), each having an x and y coordinate. I need to cluster these points around some predefined points (say 5) according to the nearest ...
0
votes
2
answers
815
views
Error when using cluster package to compute euclidean distances
I have been working on a text mining project. I have performed some LDA topic modelling and now I have my topic probabilities. I would like to use the cluster package so that I can get the Euclidean ...
0
votes
1
answer
66
views
Assign clusters to locations, based on the Euclidean distance
I want to calculate which cluster a point belongs to, based on the Euclidean distance.
clusters xcor ycor
1 64.99206 78.48413
2 1102.00000 2466.67500
3 1598....
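A minimal sketch of nearest-centroid assignment, the operation this excerpt describes. It uses only the two centroid rows that appear in full above; the third row is truncated in the listing and is deliberately left out. The function name `nearest_cluster` and the query locations are hypothetical, chosen for illustration only.

```python
import math

# Centroids copied from the two complete rows of the excerpt.
# The third row is truncated in the source, so it is omitted here.
centroids = {
    1: (64.99206, 78.48413),
    2: (1102.0, 2466.675),
}


def nearest_cluster(point, centroids):
    """Return the id of the centroid closest to `point`
    by Euclidean distance."""
    return min(centroids, key=lambda cid: math.dist(point, centroids[cid]))


# Hypothetical locations, not taken from the question.
locations = [(70.0, 80.0), (1000.0, 2400.0)]
labels = [nearest_cluster(p, centroids) for p in locations]
```

Each location is compared against every centroid, so the cost is proportional to the number of locations times the number of clusters.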
0
votes
1
answer
91
views
Movement of clusters over time
I am trying to do a cluster analysis based on the transactional data for a financial product, and to measure the clusters' movement over time.
I have my static cluster ready (based on the transactions ...
0
votes
1
answer
38
views
Should I choose classification or clustering for my project?
I just detected faces using the Viola-Jones algorithm. I cropped faces from frames (or video) and I made them a training set. In my video there are 5 different faces. I decided to use eigenfaces for face ...
-2
votes
1
answer
1k
views
Dist and hclust functions outputting unexpected/incorrect outputs [closed]
I have been attempting to use R as an alternative to MVSP for cluster analysis and PCA. However, R is giving drastically different outputs from MVSP using all the functions that I've found, including ...