Highest scored 'k-means' questions

463 votes

8 answers

284k views

Cluster analysis in R: determine the optimal number of clusters

How can I choose the best number of clusters for a k-means analysis? After plotting a subset of the data below, how many clusters would be appropriate? How can I perform a cluster dendrogram analysis? n = 1000 ...

user2153893

4,667

asked Mar 13, 2013 at 2:39

234 votes

11 answers

129k views

Is it possible to specify your own distance function using scikit-learn K-Means Clustering?

bmasc

2,470

asked Apr 3, 2011 at 12:39

154 votes

20 answers

126k views

How do I determine k when using k-means clustering?

I've been studying k-means clustering, and one thing that's not clear is how you choose the value of k. Is it just a matter of trial and error, or is there more to it?

Jason Baker

195k

asked Nov 24, 2009 at 22:58
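
One common heuristic for choosing k is the elbow method. You run k-means for a range of k values and record the within-cluster sum of squares (WCSS). Then you look for the k where the curve stops dropping steeply. Below is a minimal pure-Python sketch. The helper names (`lloyd_kmeans`, `elbow_curve`, and so on) are hypothetical. It assumes plain Lloyd iterations with initial centroids sampled from the data, so different seeds can land in different local optima.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point (ties go to the lower index)."""
    return [min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points]

def lloyd_kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd iterations; initial centroids are k data points sampled without replacement."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Move the centroid to the mean of its members; keep it if the cluster is empty.
            new_centroids.append([sum(c) / len(members) for c in zip(*members)]
                                 if members else centroids[j])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, assign(points, centroids)

def wcss(points, centroids, labels):
    """Within-cluster sum of squared distances to the assigned centroid."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))

def elbow_curve(points, max_k, seed=0):
    """WCSS for k = 1..max_k; plot it against k and look for the bend."""
    return {k: wcss(points, *lloyd_kmeans(points, k, seed=seed)) for k in range(1, max_k + 1)}

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for k, value in elbow_curve(points, 4).items():
    print(k, round(value, 3))
```

On these two well-separated groups, WCSS falls steeply from k = 1 to k = 2 and only a little after that. In practice the bend is not always this clear, so it is usually cross-checked with other criteria.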

121 votes

3 answers

186k views

Will scikit-learn utilize GPU?

Reading implementation of scikit-learn in TensorFlow: http://learningtensorflow.com/lesson6/ and scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html I'm ...

blue-sky

52.9k

asked Jan 10, 2017 at 11:37

60 votes

6 answers

3k views

Branchless K-means (or other optimizations)

Note: I'd appreciate more of a guide to how to approach and come up with these kinds of solutions rather than the solution itself. I have a very performance-critical function in my system showing up ...

user4842163

asked May 4, 2015 at 5:39

60 votes

18 answers

58k views

K-means algorithm variation with equal cluster size

I'm looking for the fastest algorithm for grouping points on a map into equally sized groups, by distance. The k-means clustering algorithm looks straightforward and promising, but does not produce ...

pixelistik

7,740

asked Mar 27, 2011 at 21:27

55 votes

3 answers

75k views

Scikit Learn - K-Means - Elbow - criterion

Today I'm trying to learn something about K-means. I have understood the algorithm and I know how it works. Now I'm looking for the right k... I found the elbow criterion as a method to detect the ...

Linda

2,395

asked Oct 5, 2013 at 12:19

50 votes

7 answers

76k views

How to get the samples in each cluster?

I am using the sklearn.cluster KMeans package. Once I finish the clustering, if I need to know which values were grouped together, how can I do it? Say I had 100 data points and KMeans gave me 5 clusters....

user77005

1,819

asked Mar 24, 2016 at 7:56
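
Once you have a fitted model's per-sample labels, grouping the samples is just a matter of bucketing their indices by label. In scikit-learn's KMeans the labels are the `labels_` attribute. Here is a small sketch; the function name and the data are illustrative.

```python
from collections import defaultdict

def indices_by_cluster(labels):
    """Map each cluster label to the positions of the samples assigned to it."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)
    return dict(groups)

# Hypothetical data: labels as a fitted model would report them.
samples = ["a", "b", "c", "d", "e"]
labels = [1, 0, 1, 2, 0]
groups = indices_by_cluster(labels)
members = {label: [samples[i] for i in idx] for label, idx in groups.items()}
print(members)  # {1: ['a', 'c'], 0: ['b', 'e'], 2: ['d']}
```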

49 votes

8 answers

91k views

Python k-means algorithm

I am looking for Python implementation of k-means algorithm with examples to cluster and cache my database of coordinates.

Eeyore

2,126

asked Oct 9, 2009 at 19:16

48 votes

3 answers

47k views

Simple approach to assigning clusters for new data after k-means clustering

I'm running k-means clustering on a data frame df1, and I'm looking for a simple approach to computing the closest cluster center for each observation in a new data frame df2 (with the same variable ...

josliber

44.1k

asked Dec 16, 2013 at 21:27
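
Assigning new observations after a fit amounts to finding, for each row, the center with the smallest squared Euclidean distance. That is the same rule k-means uses in its assignment step. The question is about R, where the centers are the rows of the fitted object's `centers` matrix. The sketch below is in Python and uses hypothetical names. It assumes the new data has been scaled exactly like the data used for the fit.

```python
def squared_distance(a, b):
    """Squared Euclidean distance, the quantity k-means minimizes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_to_centers(observations, centers):
    """Index of the closest center for each new observation (ties go to the lower index)."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(obs, centers[j]))
        for obs in observations
    ]

centers = [(0.0, 0.0), (10.0, 10.0)]  # e.g. the centers from a previous fit
new_points = [(1.0, 1.0), (9.0, 8.0), (-2.0, 0.0)]
print(assign_to_centers(new_points, centers))  # [0, 1, 0]
```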

46 votes

4 answers

38k views

kmeans: Quick-TRANSfer stage steps exceeded maximum

I am running k-means clustering in R on a dataset with 636,688 rows and 7 columns using the standard stats package: kmeans(dataset, centers = 100, nstart = 25, iter.max = 20). I get the following ...

Anna Dunietz

845

asked Jan 27, 2014 at 13:55

42 votes

7 answers

32k views

Kmeans without knowing the number of clusters? [duplicate]

I am attempting to apply k-means on a set of high-dimensional data points (about 50 dimensions) and was wondering if there are any implementations that find the optimal number of clusters. I ...

Legend

115k

asked Jul 7, 2011 at 18:58

41 votes

3 answers

34k views

How Could One Implement the K-Means++ Algorithm?

I am having trouble fully understanding the K-Means++ algorithm. I am interested in exactly how the first k centroids are picked, namely the initialization, as the rest is like in the original K-Means ...

Anton Andreev

2,092

asked Mar 28, 2011 at 23:45
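
The k-means++ seeding step works as follows. The first center is a data point chosen uniformly at random. Each further center is a data point chosen with probability proportional to D(x)², the squared distance from x to its nearest already-chosen center. The ordinary Lloyd iterations then start from these centers. Here is a pure-Python sketch with a hypothetical function name. It stops early if every point already coincides with a chosen center.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_plus_plus(points, k, seed=None):
    """Pick up to k initial centers with k-means++ (D^2) seeding."""
    rng = random.Random(seed)
    # Step 1: the first center is a data point chosen uniformly at random.
    centers = [rng.choice(points)]
    # D(x)^2 for every point: squared distance to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            break  # every point already coincides with a center
        # Step 2: sample the next center with probability proportional to D(x)^2.
        new_center = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new_center)
        # Step 3: refresh D(x)^2 now that there is one more center.
        d2 = [min(old, squared_distance(p, new_center)) for old, p in zip(d2, points)]
    return centers

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans_plus_plus(points, 2, seed=3))
```

A point that is already a center has weight 0, so it cannot be drawn again. Far-away points are favored, which is what spreads the initial centers out.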

40 votes

2 answers

52k views

Calculating the percentage of variance measure for k-means?

On the Wikipedia page, an elbow method is described for determining the number of clusters in k-means. The built-in method of scipy provides an implementation but I am not sure I understand how the ...

Legend

115k

asked Jul 11, 2011 at 4:55
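
The percentage of variance explained is the between-cluster sum of squares divided by the total sum of squares. When each centroid is its cluster's mean, the total sum of squares equals within plus between. So the ratio can also be computed as 1 − WCSS / total. The elbow method plots this ratio against k. Here is a sketch that works from a labelling; the function names are hypothetical, and the data must not consist of a single repeated point (total SS would be 0).

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def variance_explained(points, labels):
    """Between-cluster SS / total SS, with each centroid taken as its cluster mean."""
    grand_mean = mean_point(points)
    total_ss = sum(squared_distance(p, grand_mean) for p in points)
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    within_ss = 0.0
    for members in clusters.values():
        centroid = mean_point(members)
        within_ss += sum(squared_distance(p, centroid) for p in members)
    # With centroids at the cluster means, total SS = within SS + between SS.
    return (total_ss - within_ss) / total_ss

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(variance_explained(points, [0, 0, 0, 1, 1, 1]))
```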

37 votes

2 answers

66k views

Will pandas dataframe object work with sklearn kmeans clustering?

dataset is a pandas DataFrame. This is sklearn.cluster.KMeans: km = KMeans(n_clusters = n_Clusters); km.fit(dataset); prediction = km.predict(dataset). This is how I decide which entity belongs to ...

Dark Knight

869

asked Jan 19, 2015 at 2:17

36 votes

3 answers

36k views

What makes the distance measure in k-medoid "better" than k-means?

I am reading about the difference between k-means clustering and k-medoid clustering. Supposedly there is an advantage to using the pairwise distance measure in the k-medoid algorithm, instead of the ...

tumultous_rooster

12.3k

asked Feb 7, 2014 at 5:08

34 votes

1 answer

37k views

Cluster one-dimensional data optimally? [closed]

Does anyone have a paper that explains how the Ckmeans.1d.dp algorithm works? Or: what is the optimal way to do k-means clustering in one dimension?

Laciel

367

asked Oct 23, 2011 at 22:12
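
In one dimension, an optimal k-means partition groups values that are contiguous in sorted order. That makes a dynamic program possible: best[m][j] is the minimal sum of squared errors (SSE) for splitting the first j sorted values into m clusters. Prefix sums make each segment cost O(1). This is a plain O(k·n²) sketch of that kind of recurrence, not the Ckmeans.1d.dp package's code. The function name is hypothetical.

```python
def optimal_kmeans_1d(values, k):
    """Exact 1-D k-means by dynamic programming; returns (SSE, clusters)."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    # Prefix sums of x and x^2 give each segment's SSE in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):
        """Sum of squared deviations of xs[i:j] from its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]  # best[m][j]: min SSE of xs[:j] in m clusters
    split = [[0] * (n + 1) for _ in range(k + 1)]   # start index of the last cluster
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                candidate = best[m - 1][i] + cost(i, j)
                if candidate < best[m][j]:
                    best[m][j] = candidate
                    split[m][j] = i
    # Walk the split table backwards to recover the clusters.
    clusters, j = [], n
    for m in range(k, 0, -1):
        i = split[m][j]
        clusters.append(xs[i:j])
        j = i
    clusters.reverse()
    return best[k][n], clusters

print(optimal_kmeans_1d([20, 1, 11, 2, 10], 3))  # (1.0, [[1, 2], [10, 11], [20]])
```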

32 votes

2 answers

42k views

Scikit-learn: How to run KMeans on a one-dimensional array?

I have an array of 13,876 values between 0 and 1. I would like to apply sklearn.cluster.KMeans to only this vector to find the different clusters in which the values are grouped. However, it ...

Irene

589

asked Feb 9, 2015 at 18:08

31 votes

5 answers

43k views

What is the difference between the "k-means" and "fuzzy c-means" objective functions?

I am trying to see whether the performance of the two can be compared based on the objective functions they optimize.

n0ob

1,275

asked Feb 27, 2010 at 1:37
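
The two objectives, in their standard forms, differ only in how each point's squared distance to a center is weighted. k-means uses a hard 0/1 assignment r_ij. Fuzzy c-means uses a soft membership u_ij raised to a fuzzifier m > 1:

```latex
J_{\text{KM}} = \sum_{i=1}^{n} \sum_{j=1}^{k} r_{ij}\, \lVert x_i - c_j \rVert^2,
\qquad r_{ij} \in \{0, 1\}, \quad \sum_{j=1}^{k} r_{ij} = 1

J_{\text{FCM}} = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{\,m}\, \lVert x_i - c_j \rVert^2,
\qquad u_{ij} \in [0, 1], \quad \sum_{j=1}^{k} u_{ij} = 1, \quad m > 1

% For fixed memberships, both are minimized by weighted means:
c_j^{\text{KM}} = \frac{\sum_i r_{ij}\, x_i}{\sum_i r_{ij}},
\qquad
c_j^{\text{FCM}} = \frac{\sum_i u_{ij}^{\,m}\, x_i}{\sum_i u_{ij}^{\,m}}
```

If every u_ij is 0 or 1, then u_ij^m = u_ij and J_FCM reduces to J_KM. For soft memberships the weights differ, so raw objective values of the two methods (or of FCM at different m) are not directly comparable. A fairer comparison is to evaluate both clusterings on one shared criterion.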

31 votes

3 answers

44k views

Understanding "score" returned by scikit-learn KMeans

I applied clustering on a set of text documents (about 100). I converted them to Tf-idf vectors using TfidfVectorizer and supplied the vectors as input to sklearn.cluster.KMeans(n_clusters=2, init='...

Prateek Dewan

1,601

asked Sep 3, 2015 at 8:23

30 votes

1 answer

20k views

Online k-means clustering

Is there an online version of the k-Means clustering algorithm? By online I mean that every data point is processed serially, one at a time as it enters the system, hence saving computing time when ...

Theodor

5,606

asked Sep 13, 2010 at 7:33
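
One standard answer is MacQueen-style sequential k-means. Each arriving point moves only its nearest center, by a step of 1/n_j, where n_j counts the points that center has absorbed. This keeps every center equal to the running mean of its points. The sketch below assumes the k initial centers are the first k points of the stream, each starting with a count of 1; the class name is hypothetical. Results depend on arrival order. scikit-learn's `MiniBatchKMeans` also offers a `partial_fit` method for incremental updates.

```python
class OnlineKMeans:
    """Sequential (MacQueen-style) k-means: each arriving point updates one center."""

    def __init__(self, initial_centers):
        # Assumption: the initial centers are the first k points of the stream,
        # so each already "contains" one point.
        self.centers = [list(map(float, c)) for c in initial_centers]
        self.counts = [1] * len(self.centers)

    def update(self, point):
        """Assign `point` to its nearest center, move that center, return its index."""
        j = min(range(len(self.centers)),
                key=lambda i: sum((x - c) ** 2 for x, c in zip(point, self.centers[i])))
        self.counts[j] += 1
        step = 1.0 / self.counts[j]  # keeps the center equal to the running mean of its points
        self.centers[j] = [c + step * (x - c) for c, x in zip(self.centers[j], point)]
        return j

model = OnlineKMeans([(0,), (10,)])
print([model.update(p) for p in [(2,), (12,), (1,)]])  # [0, 1, 0]
print(model.centers)  # [[1.0], [11.0]]
```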

28 votes

5 answers

95k views

Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)

I have a data table ("norm") containing numeric (at least as far as I can see) normalized values of the following form: When I execute k <- kmeans(norm,center=3) I receive the following ...

Jonathan Rhein

1,685

asked Apr 7, 2016 at 7:40

27 votes

2 answers

54k views

What is the time complexity of k-means?

I was going through the k-means Wikipedia page. Based on the algorithm, I think the complexity is O(n*k*i) (n = total number of elements, k = number of clusters, i = number of iterations). So can someone explain this to me ...

parallel

313

asked Sep 5, 2013 at 10:41
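
For Lloyd's algorithm the per-iteration count is straightforward once the dimension d is included. The assignment step computes all n·k point-to-centroid distances, and the update step averages each point once:

```latex
% One Lloyd iteration: n points, d dimensions, k centroids
\underbrace{n \cdot k \cdot O(d)}_{\text{assignment}} + \underbrace{O(n d)}_{\text{update}} = O(n k d)

% Over i iterations
T(n, k, d, i) = O(n k d \, i)
```

The O(n·k·i) in the question is this bound with d treated as a constant.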

27 votes

2 answers

22k views

Group n points in k clusters of equal size [duplicate]

Possible Duplicate: K-means algorithm variation with equal cluster size EDIT: as casperOne pointed out to me, this question is a duplicate. Anyway, here is a more generalized question that ...

Pierre-David Belanger

1,024

asked Jan 9, 2012 at 23:30

26 votes

1 answer

42k views

Clustering text documents using scikit-learn kmeans in Python

I need to implement scikit-learn's kMeans for clustering text documents. The example code works fine as it is but takes some 20newsgroups data as input. I want to use the same code for clustering a ...

Nabila Shahid

419

asked Jan 11, 2015 at 17:20

26 votes

6 answers

18k views

Fast (< n^2) clustering algorithm

I have 1 million 5-dimensional points that I need to group into k clusters with k << 1 million. In each cluster, no two points should be too far apart (e.g. they could be bounding spheres with a ...

John Hawksley

261

asked Dec 9, 2010 at 23:11

26 votes

3 answers

36k views

Using K-means with cosine similarity - Python

I am trying to implement the k-means algorithm in Python using cosine distance instead of Euclidean distance as the distance metric. I understand that using a different distance function can be fatal ...

ise372

261

asked Sep 25, 2017 at 16:22
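
A common workaround rests on one identity. For unit-length vectors, squared Euclidean distance is a decreasing function of cosine similarity:

```latex
\lVert a - b \rVert^2 = \lVert a \rVert^2 + \lVert b \rVert^2 - 2\, a \cdot b
= 2 - 2 \cos\theta_{ab}
\qquad \text{when } \lVert a \rVert = \lVert b \rVert = 1
```

So after each vector is normalized to unit length, the nearest centroid by Euclidean distance is the one with the highest cosine similarity, as long as the centroids are unit length too. Spherical k-means handles this by re-normalizing each centroid after the mean update. Plain k-means on normalized data only approximates that.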

26 votes

2 answers

4k views

Estimation of number of Clusters via gap statistics and prediction strength

I am trying to translate the R implementations of gap statistics and prediction strength http://edchedch.wordpress.com/2011/03/19/counting-clusters/ into python scripts for the estimation of number of ...

Riyaz

1,450

asked Jan 8, 2014 at 17:39

25 votes

3 answers

72k views

kmeans scatter plot: plot different colors per cluster

I am trying to do a scatter plot of a kmeans output which clusters sentences of the same topic together. The problem I am facing is plotting the points that belong to each cluster in a certain color. ...

jxn

7,865

asked Jan 30, 2015 at 0:36

25 votes

2 answers

25k views

K-Means: Lloyd, Forgy, MacQueen, Hartigan-Wong

I'm working with the K-Means algorithm in R and I want to figure out the differences between the 4 algorithms Lloyd, Forgy, MacQueen and Hartigan-Wong which are available for the function "kmeans" in the ...

user2974776

301

asked Dec 7, 2013 at 20:11

24 votes

11 answers

124k views

setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

import os import numpy as np from scipy.signal import * import csv import matplotlib.pyplot as plt from scipy import signal from brainflow.board_shim import BoardShim, BrainFlowInputParams, LogLevels,...

ILovePhysics

381

asked Apr 20, 2021 at 17:17

24 votes

5 answers

33k views

Changes of clustering results after each time run in Python scikit-learn

I have a bunch of sentences and I want to cluster them using scikit-learn spectral clustering. I've run the code and get the results with no problem. But, every time I run it I get different results. ...

user3430235

419

asked Sep 18, 2014 at 20:28

23 votes

2 answers

16k views

How does pytorch backprop through argmax?

I'm building Kmeans in pytorch using gradient descent on centroid locations, instead of expectation-maximisation. Loss is the sum of square distances of each point to its nearest centroid. To ...

jammygrams

378

asked Mar 3, 2019 at 14:06

22 votes

7 answers

30k views

Can k-means clustering do classification?

I want to know whether the k-means clustering algorithm can do classification. Say I have done a simple k-means clustering: assume I have a lot of data, I use k-means clustering, and then get 2 clusters A,...

Sirius Wang

339

asked Mar 10, 2014 at 13:00

22 votes

6 answers

25k views

scikit-learn: Finding the features that contribute to each KMeans cluster

Say you have 10 features you are using to create 3 clusters. Is there a way to see the level of contribution each of the features have for each of the clusters? What I want to be able to say is that ...

cmgerber

2,229

asked Dec 15, 2014 at 19:01

21 votes

4 answers

38k views

ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (inclusive) when using silhouette_score

I am trying to calculate silhouette score as I find the optimal number of clusters to create, but get an error that says: ValueError: Number of labels is 1. Valid values are 2 to n_samples - 1 (...

Suhail Gupta

22.8k

asked Jul 17, 2018 at 13:10
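
The silhouette coefficient is only defined when the number of distinct labels is between 2 and n − 1. A search over k that includes k = 1 (one label) is a likely way to hit this error. For each point, a is its mean distance to the rest of its own cluster and b is its mean distance to the nearest other cluster. The point's score is (b − a) / max(a, b), and singleton clusters score 0. Here is a pure-Python sketch with a hypothetical function name that raises the same kind of error.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    n = len(points)
    n_labels = len(set(labels))
    # The coefficient is only defined for 2 <= number of clusters <= n - 1.
    if not 2 <= n_labels <= n - 1:
        raise ValueError(f"Number of labels is {n_labels}. "
                         "Valid values are 2 to n_samples - 1 (inclusive)")
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Sum and count of distances from point i to each cluster, excluding i itself.
        sums, counts = {}, {}
        for j, (q, label) in enumerate(zip(points, labels)):
            if i != j:
                sums[label] = sums.get(label, 0.0) + math.dist(p, q)
                counts[label] = counts.get(label, 0) + 1
        if own not in counts:
            scores.append(0.0)  # singleton cluster: silhouette is taken as 0
            continue
        a = sums[own] / counts[own]                                # own-cluster mean distance
        b = min(sums[c] / counts[c] for c in counts if c != own)   # nearest other cluster
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / n

print(mean_silhouette([(0,), (1,), (10,), (11,)], [0, 0, 1, 1]))
```

When choosing k with this score, start the loop at k = 2.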

21 votes

3 answers

19k views

How would I implement k-means with TensorFlow?

The intro tutorial, which uses the built-in gradient descent optimizer, makes a lot of sense. However, k-means isn't just something I can plug into gradient descent. It seems like I'd have to write my ...

Raphie Palefsky-Smith

515

asked Nov 10, 2015 at 2:03

21 votes

5 answers

38k views

How can I perform K-means clustering on time series data?

How can I do K-means clustering of time series data? I understand how this works when the input data is a set of points, but I don't know how to cluster a time series with 1XM, where M is the data ...

Jaz

591

asked Aug 17, 2010 at 14:44

20 votes

1 answer

29k views

How to add k-means predicted clusters in a column to a dataframe in Python

I have a question about kmeans clustering in python. So I did the analysis that way: from sklearn.cluster import KMeans km = KMeans(n_clusters=12, random_state=1) new = data._get_numeric_data()....

Keithx

3,066

asked Jul 14, 2016 at 10:48

19 votes

3 answers

33k views

plot a document tfidf 2D graph

I would like to plot a 2D graph with the x-axis as term and the y-axis as TFIDF score (or document id) for my list of sentences. I used scikit-learn's fit_transform() to get the scipy matrix but I do not ...

jxn

7,865

asked Jan 26, 2015 at 23:00

19 votes

2 answers

38k views

Clustering geo location coordinates (lat,long pairs) using KMeans algorithm with Python

Using the following code to cluster geolocation coordinates results in 3 clusters: import numpy as np import matplotlib.pyplot as plt from scipy.cluster.vq import kmeans2, whiten ...

rokpoto.com

9,959

asked Jul 15, 2014 at 15:38

19 votes

5 answers

25k views

How to calculate BIC for k-means clustering in R

I've been using k-means to cluster my data in R but I'd like to be able to assess the fit vs. model complexity of my clustering using Bayesian Information Criterion (BIC) and AIC. Currently the code I'...

UnivStudent

402

asked Apr 5, 2013 at 17:19

18 votes

3 answers

36k views

OpenCV using k-means to posterize an image

I want to posterize an image with k-means and OpenCV in the C++ interface (cv namespace) and I get weird results. I need it to reduce some noise. This is my code: #include "cv.h" #include "...

nkint

11.7k

asked Mar 5, 2012 at 23:22

17 votes

2 answers

38k views

KMeans clustering in PySpark

I have a spark dataframe 'mydataframe' with many columns. I am trying to run kmeans on only two columns, lat and long (latitude & longitude), using them as simple values. I want to extract 7 ...

user3245256

1,918

asked Dec 1, 2017 at 2:22

17 votes

4 answers

23k views

Can I use K-means algorithm on a string?

I am working on a python project where I study RNA structure evolution (represented as a string, for example "(((...)))", where the parentheses represent base pairs). The point is that I have an ...

Doni

173

asked Jun 9, 2011 at 13:36

17 votes

2 answers

21k views

How to set k-Means clustering labels from highest to lowest with Python?

I have a dataset of 38 apartments and their electricity consumption in the morning, afternoon and evening. I am trying to cluster this dataset using the k-Means implementation from scikit-learn, ...

Sergio

377

asked Jul 3, 2017 at 14:41
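
k-means label numbers are arbitrary, so a stable "highest to lowest" ordering has to be imposed afterwards. One way is to sort clusters by a score of their centroid and remap the labels. With scikit-learn the inputs would be `labels_` and `cluster_centers_`. The sketch assumes "highest" means the largest sum of the centroid's coordinates (for example, total consumption), and the function name is hypothetical.

```python
def relabel_by_centroid(labels, centroids, score=sum):
    """Renumber clusters so label 0 has the highest score(centroid), label 1 the next, etc."""
    # Cluster indices sorted by their centroid's score, highest first.
    order = sorted(range(len(centroids)), key=lambda j: score(centroids[j]), reverse=True)
    old_to_new = {old: new for new, old in enumerate(order)}
    new_labels = [old_to_new[label] for label in labels]
    new_centroids = [centroids[old] for old in order]
    return new_labels, new_centroids

centroids = [[1, 1, 1], [5, 5, 5], [3, 3, 3]]  # e.g. morning/afternoon/evening consumption
labels = [0, 1, 2, 1]
print(relabel_by_centroid(labels, centroids))
# ([2, 0, 1, 0], [[5, 5, 5], [3, 3, 3], [1, 1, 1]])
```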

17 votes

2 answers

60k views

How to identify Cluster labels in kmeans scikit learn

I am learning Python scikit-learn. The example given here displays the top occurring words in each cluster and not the cluster name. http://scikit-learn.org/stable/auto_examples/document_clustering.html I ...

vij555

349

asked Feb 5, 2015 at 13:00

16 votes

1 answer

67k views

How to use silhouette score in k-means clustering from sklearn library?

I'd like to use silhouette score in my script, to automatically compute number of clusters in k-means clustering from sklearn. import numpy as np import pandas as pd import csv from sklearn.cluster ...

Jessica Martini

253

asked Jul 2, 2018 at 14:40

16 votes

1 answer

36k views

initial centroids for scikit-learn kmeans clustering

If I already have a numpy array that can serve as the initial centroids, how can I properly initialize the kmeans algorithm? I am using the scikit-learn KMeans class. This post (k-means with selected ...

webmaker

476

asked Jul 13, 2016 at 14:54

16 votes

2 answers

3k views

How to detect multiple objects with OpenCV in C++?

I got inspiration from this answer, which is a Python implementation, but I need C++. That answer works very well, and the idea is: use detectAndCompute to get keypoints, use kmeans to ...

Suge

2,857

asked Sep 20, 2018 at 12:37
