All Questions
Tagged with euclidean-distance cluster-analysis 
            
            38
            questions
        
        
            21
            votes
        
        
            7
            answers
        
        
            36k
            views
        
    Multidimensional Euclidean Distance in Python
                I want to calculate the Euclidean distance in multiple dimensions (24 dimensions) between 2 arrays. I'm using NumPy and SciPy.
Here is my code:
import numpy, scipy
A=numpy.array([116.629, 7192.6, 4535....
            
        
       
    
            8
            votes
        
        
            1
            answer
        
        
            2k
            views
        
    Clustering in Python (SciPy) with space and time variables
                The format of my dataset:
[x-coordinate, y-coordinate, hour] with hour an integer value from 0 to 23.
My question now is how I can cluster this data when I need a Euclidean distance metric for the ...
            
        
       
    
            6
            votes
        
        
            1
            answer
        
        
            6k
            views
        
    Weighted Euclidean Distance in R
                I'd like to create a distance matrix with weighted Euclidean distances from a data frame. The weights will be defined in a vector. Here's an example:
library("cluster")
a <- c(1,2,3,4,5)
b <- ...
            
        
       
    
            5
            votes
        
        
            1
            answer
        
        
            2k
            views
        
    What is the complexity of dist()?
                I used the dist function in R and I am wondering about its time complexity.
I know that hierarchical clustering has an N^2*logN time complexity, and that hierarchical clustering is composed of two ...
            
        
       
    
            4
            votes
        
        
            2
            answers
        
        
            5k
            views
        
    Calculating a Voronoi diagram for planes in 3D
                Is there code or a library that can calculate a Voronoi diagram for planes (parallelograms) in 3D? I checked Qhull and it seems it can only work with points; in its examples, Voro++ works with different ...
            
        
       
    
            4
            votes
        
        
            2
            answers
        
        
            5k
            views
        
    Can we cluster a multivariate time series dataset in Python?
                I have a dataset with many financial signal values for different stocks at different times. For example:
StockName  Date   Signal1  Signal2
----------------------------------
Stock1     1/1/20    a     ...
            
        
       
    
            3
            votes
        
        
            4
            answers
        
        
            7k
            views
        
    How to find the optimal number of clusters with K-Means clustering in Python
                I am new to clustering algorithms. I have a movie dataset with more than 200 movies and more than 100 users. All the users rated at least one movie. A value of 1 for good, 0 for bad and blank if the ...
            
        
       
    
            3
            votes
        
        
            1
            answer
        
        
            359
            views
        
    Clustering pictures by time and location
                I'm trying to cluster pictures according to the location they were taken and the time they were taken. My clustering algorithm requires me to define a distance function between every two points, (in ...
            
        
       
    
            2
            votes
        
        
            1
            answer
        
        
            13k
            views
        
    Implementing k-means with Euclidean distance vs Manhattan distance?
                I am implementing the k-means algorithm from scratch in Python and on Spark. Actually, it is my homework. The problem is to implement k-means with predefined centroids with different initialization methods, ...
            
        
       
    
            2
            votes
        
        
            1
            answer
        
        
            963
            views
        
    Clustering in Mixed Data Types
                Why can't we use Euclidean distance for clustering categorical variables, and why do we use Gower distance for clustering categorical variables? I am just looking for a simple logic and ...
            
        
       
    
            2
            votes
        
        
            4
            answers
        
        
            9k
            views
        
    How do I create a similarity matrix in MATLAB?
                I am working towards comparing multiple images. I have these image data as column vectors of a matrix called "images". I want to assess the similarity of images by first computing their Euclidean ...
            
        
       
    
            2
            votes
        
        
            1
            answer
        
        
            628
            views
        
    Dimensionality reduction for high dimensional sparse data before clustering or spherical k-means?
                I am trying to build my first recommender system, where I create a user feature space and then cluster the users into different groups. Then, for the recommendation to work for a particular user, first I ...
            
        
       
    
            2
            votes
        
        
            0
            answers
        
        
            557
            views
        
    How can I improve the silhouette score of my k-means clustering?
                I have a dataset with 18000 lines about some Customers, like this:
and I am trying to do some clustering using k-means algorithm.
Since I have both categorical and continuous variables I created some ...
            
        
       
    
            2
            votes
        
        
            1
            answer
        
        
            4k
            views
        
    Euclidean Distance or cosine similarity? [closed]
                I was reading "Similarity Measure" and suddenly my whole world was falling apart. I have implemented a search engine using a clustering technique. For clustering, I used K-Means, which has distance ...
            
        
       
    
            1
            vote
        
        
            1
            answer
        
        
            226
            views
        
    Using the norm function in MATLAB
                I have a matrix of data containing the coordinates of some points, and the coordinates of 5 clusters:
data = [randi(100,100,1),randi(100,100,1)];
x_Clusters = [20 5 12 88 61];
y_Clusters = [10 50 14 41 10];
...
            
        
       
    
            1
            vote
        
        
            0
            answers
        
        
            1k
            views
        
    Visualization on Cluster for Mixed Data
                I'm working with fuzzy clustering for mixed data, and I want to visualize the clustering result.
Here is my data:
> head(x)
x1 x2        x3       x4
A  C    8.461373 27.62996
B  C   10....
            
        
       
    
            0
            votes
        
        
            2
            answers
        
        
            322
            views
        
    Cluster Analysis: Problem finding Euclidean distances of centroids in a dataframe from origin
                The 7 columns for each row in df_centroids show the coordinates in a 7 dimensional space.
import numpy as np 
import pandas as pd 
import scipy
df_centroids
        0           1           2        ...
            
        
       
    
            0
            votes
        
        
            1
            answer
        
        
            39
            views
        
    r average of distance by Id
                I have a dataset with two groups of subjects, Group A and Group B, like this:
 Id  Group  Age
 1   A      17
 2   A      14
 3   A      10
 4   A      17
 5   A      12
 6   A      6
 7   A      18
 8   A ...
            
        
       
    
            0
            votes
        
        
            1
            answer
        
        
            392
            views
        
    dist function in r (stats) for clustering: Should I put my ID variable in row.names?
                I have a data frame with some numeric columns and an ID column of type character. When I pass the whole data frame to the dist function it calculates the distance matrix, but when I remove the ID ...
            
        
       
    
            0
            votes
        
        
            1
            answer
        
        
            2k
            views
        
    Python - Issue with the dimension of array in cdist function
                I am trying to find the right number of clusters for k-means and am using the cdist function for this.
I understand that the arguments to cdist should be of the same dimension. I tried printing the size of ...
            
        
       
    
            0
            votes
        
        
            1
            answer
        
        
            608
            views
        
    Does h2o.kmeans() make predictions based on euclidean distance?
                I created a clustering model using h2o.kmeans(). The modeling dataset was standardized by scale() in R first.
The model has five clusters and the coordinates of the centroids are:
CENTROID    X1  X2 ...
            
        
       
    
            0
            votes
        
        
            1
            answer
        
        
            200
            views
        
    WEKA classes in map and reduce phases of KMeans Clustering on Hadoop
                I want to use WEKA's classes inside a MapReduce program for performing KMeans clustering on Instances. I just want an overview of the map and reduce classes. How can the distance computed by WEKA classes be ...
            
        
       
    
            0
            votes
        
        
            1
            answer
        
        
            582
            views
        
    Proper similarity measure for clustering
                I am having trouble finding a proper similarity measure for clustering. I have around 3000 arrays of sets, where each set contains features of a certain domain (e.g., number, color, days, alphabets, etc)....
            
        
       
    
            0
            votes
        
        
            1
            answer
        
        
            151
            views
        
    Finding the "tightest" subset in Euclidean space
                I am given a set of points x_1, x_2, ... x_n \in R^d. I wish to find a subset of k points such that the sum of the distances between these k points is minimal. Naively this is an O(n choose k) problem, ...
            
        
       
    
            0
            votes
        
        
            5
            answers
        
        
            1k
            views
        
    Partition neighbor points given a euclidean distance range
                Given two points P,Q and a delta, I defined the equivalence relation ~=, where P ~= Q if EuclideanDistance(P,Q) <= delta. Now, given a set S of n points, in the example S = (A, B, C, D, E, F) and n ...
            
        
       
    
            0
            votes
        
        
            0
            answers
        
        
            6
            views
        
    What is the standard threshold value that is best for accuracy when employing Euclidean distance as a metric for gauging textual similarity?
                I'm using Euclidean distance as a metric to compare two sentences for similarity while clustering them using my custom incremental KMeans algorithm. The current threshold value I'm using is 0.7 which ...
            
        
       
    
            0
            votes
        
        
            0
            answers
        
        
            94
            views
        
    Cannot find the Distance Matrix in Hierarchical Clustering
                I want to perform hierarchical clustering on this dataset (107721 rows and 16 columns). In order to do this I have to calculate the distance matrix. When I use the dist function I get the error:
...
            
        
       
    
            0
            votes
        
        
            1
            answer
        
        
            162
            views
        
    Memory Problem: Average-Linkage Clustering
                Data with a million rows and 18 columns need to be clustered using Average-Linkage Clustering, which in turn requires calculating the Euclidean distance between rows. While doing so, d <- dist(data),...
            
        
       
    
            0
            votes
        
        
            1
            answer
        
        
            409
            views
        
    Single linkage hierarchical clustering - boxplots on height of the branches to detect outliers
                Before k-means clustering for consumer segmentation, I want to identify and delete outliers in my sample. I tried hierarchical clustering with the single-linkage algorithm. The problem is, I have a sample ...
            
        
       
    
            0
            votes
        
        
            0
            answers
        
        
            252
            views
        
    How to chunk large dissimilarity / distance matrices in R?
                I would like to cluster mixed-type data that contains 50k rows and 10 features/columns. I am using R on my 64-bit PC. When I calculate the dissimilarity / distance matrix with the "daisy" function, I get "Error:...
            
        
       
    
            0
            votes
        
        
            1
            answer
        
        
            133
            views
        
    Finding Euclidean distance from an m*n dimensional matrix to a point
                I am working on a clustering problem. There's a situation where I have 3 cluster centers as below, and I want to calculate the Euclidean distance from these 3 cluster centers to another m*n dimensional ...
            
        
       
    
            0
            votes
        
        
            0
            answers
        
        
            166
            views
        
    Smart Semantic Category Clustering Using R
                I have 2 data frames and did the following:
library(tm)
v<- Corpus(VectorSource(as.vector(bothsources[,1])))
inspect(head(v,3))
v <- tm_map(v, removeWords, stopwords("english"))
v <- tm_map(v, ...
            
        
       
    
            0
            votes
        
        
            1
            answer
        
        
            75
            views
        
    Clustering of data
                I have a 2-dimensional dataset in MATLAB with several points (say 100), each having an x and a y coordinate. I need to cluster these points around some predefined points (say 5) according to the nearest ...
            
        
       
    
            0
            votes
        
        
            2
            answers
        
        
            815
            views
        
    Error when using cluster package to compute euclidean distances
                I have been working on a text mining project. I have performed some LDA topic modelling and now I have my topic probabilities. I would like to use the cluster package so that I can get the euclidean ...
            
        
       
    
            0
            votes
        
        
            1
            answer
        
        
            66
            views
        
    Assign clusters to locations, based on the euclidean distance
                I want to calculate which cluster a point belongs to, based on the Euclidean distance.
clusters    xcor       ycor
1           64.99206   78.48413
2           1102.00000 2466.67500
3           1598....
            
        
       
    
            0
            votes
        
        
            1
            answer
        
        
            91
            views
        
    Movement of clusters over time
                I am trying to do a cluster analysis based on the transactional data for a financial product, and to measure the clusters' movement over time.
I have my static cluster ready (based on the transactions ...
            
        
       
    
            0
            votes
        
        
            1
            answer
        
        
            38
            views
        
    Should I choose classification or clustering for my project?
                I just detected faces using the Viola-Jones algorithm. I cropped faces from frames (or video) and made them the training set. In my video there are 5 different faces. I decided to use eigenfaces for face ...
            
        
       
    
            -2
            votes
        
        
            1
            answer
        
        
            1k
            views
        
    Dist and hclust functions outputting unexpected/incorrect outputs [closed]
                I have been attempting to use R as an alternative to MVSP for cluster analysis and PCA. However, R is giving drastically different outputs from MVSP using all the functions that I've found, including ...