All Questions

44 votes
2 answers
21k views

Compare similarity algorithms

I want to use string similarity functions to find corrupted data in my database. I came upon several of them: Jaro, Jaro-Winkler, Levenshtein, Euclidean and Q-gram. I wanted to know what is ...
Ali
  • 818
5 votes
1 answer
3k views

Best way to identify dissimilarity: Euclidean Distance, Cosine Distance, or Simple Subtraction?

I'm new to data science and am currently learning different techniques that I can do with Python. Currently, I'm trying it out with Spotify's API for my own playlists. The goal is to find the most ...
Mustafa
  • 337
5 votes
2 answers
3k views

r distance between rows

I apologize; this is my attempt at redeeming myself after a disastrous earlier attempt. Now I have a bit more clarity, so here I go again. My goal is to find rows that are similar. So first I am ...
Emily Fassbender
4 votes
1 answer
2k views

Calculating similarity based on attributes

My objective is to calculate the degree of similarity between two users based on their attributes. For instance let's consider a player and consider age, salary, and points as attributes. Also I ...
user1010101
  • 2,088
3 votes
3 answers
8k views

measuring similarity between two rgb images in python

I have two RGB images of the same size, and I would like to compute a similarity metric. I thought of starting out with Euclidean distance: import scipy.spatial.distance as dist import cv2 im1 = cv2....
HappyPy
  • 10.3k
2 votes
4 answers
9k views

How do I create a similarity matrix in MATLAB?

I am working towards comparing multiple images. I have the image data as column vectors of a matrix called "images." I want to assess the similarity of images by first computing their Euclidean ...
Vivek Subramanian
1 vote
3 answers
919 views

Find Euclidean distance between two arrays of different lengths

I want to find the Euclidean distance to check the similarity of strings. As the image above shows, the painting object field holds many image types in the database. Images are displayed using this paining_object ...
Komal Goyani
1 vote
1 answer
1k views

Extract distances after running scipy.spatial.distance.pdist

I have a Pandas data frame (see small example below). I want to calculate Euclidean distances between observations (rows) based on their values in 3 columns (features). I am using scipy.spatial....
user3245256
  • 1,918
1 vote
1 answer
926 views

I just started using the Eigen matrix algebra library and aim to create a similarity matrix of a dataset; suggestions?

I am trying to create a similarity matrix for a dataset with the Eigen library. I have read the CSV file into an Eigen matrix, but now, as a MATLAB user, I am looking for something like bsxfun or something to ...
erogol
  • 13.4k
1 vote
1 answer
625 views

Javascript Clusterfck Metric

So I am converting an old data visualization to a new platform and I am a little bit stuck on their community sorting feature. In the original code, it looks like the author uses agglomerative ...
1080p
  • 255
1 vote
1 answer
2k views

Pearson vs Euclidean vs Manhattan Results

Using Python 3.6. I am not getting logical results when using Manhattan distance for similarity measurement. Even comparing to the results from Pearson and Euclidean correlation, the units for ...
user1940212
1 vote
1 answer
1k views

Finding most similar items by euclidean and cosine

How do I go about finding similarities in R? In particular, the similarity metrics I care most about are cosine and a KNN-# value. I guess the key aspect of this is so that the data comes out in a ...
runningbirds
  • 6,425
1 vote
2 answers
1k views

Correctly interpreting Cosine Angular Distance Similarity & Euclidean Distance Similarity

As an example, let's say I have a very simple data set. I am given a csv with three columns, user_id, book_id, rating. The rating can be any number 0-5, where 0 means the user has NOT rated the book. ...
Wendell Blatt
1 vote
2 answers
362 views

Euclidean distance between posts based on tags

I am playing with the Euclidean distance example from the Programming Collective Intelligence book: # Returns a distance-based similarity score for person1 and person2 def sim_distance(prefs,person1,...
Hamza Yerlikaya
1 vote
1 answer
183 views

How to convert TS-SS result to similarity measure between 0 - 1?

I'm currently developing a question plugin for an LMS that auto-grades an answer based on its cosine similarity to the answer key. But lately, I found that there is a ...
newtocoding
1 vote
0 answers
36 views

Why does the result of ItemSimilarityJob lack some similarities of itemId-pair?

Given that I have the following ratings.csv userId,itemId,rating 1,1,1 1,2,2 1,3,3 2,2,4 2,3,2 2,5,4 2,6,5 3,1,5 3,3,1 3,6,2 4,4,4 Using org.apache.mahout.cf.taste.hadoop.item.RecommenderJob, we have ...
Lavender Lee
1 vote
0 answers
139 views

How to calc the similarity of two images

I'm trying to examine two images for similarity using SIFT. The result should be a percentage. I have understood how to extract the features and descriptors from the images using OpenCV ...
kopsman
  • 67
1 vote
1 answer
2k views

euclidean distance and similarity

My teacher has given me this set of questions as homework and I don't know if I'm understanding it right. The following customers have rated a number of DVDs as shown in the table. Calculate the ...
Ovaflow
  • 111
1 vote
0 answers
63 views

How to find similar wiki pages with n-gram?

Let's suppose there's a wiki, and for every wiki page I'd like to show a widget with a list of similar pages. It could be done in two steps: Step 1 - convert each page into a feature vector with ...
Alexey Petrushin
1 vote
1 answer
782 views

Using relative frequency for euclidean distance

How do I calculate the Euclidean distance (similarity) between two documents, e.g. D1 and D2, using relative frequency? Below is an example of both cosine and Euclidean distance between two documents ...
1 vote
2 answers
225 views

Similarity of documents function

I am trying to create matrices for the cosine and Euclidean distances of a document. I'm not too sure how I would approach this question. Any advice would be appreciated. Thanks. The function takes the ...
nickp
  • 43
0 votes
1 answer
160 views

How to go from a vector to a similarity matrix?

I would like to reconstruct a similarity matrix between two vectors from a vector containing the similarity between each pair of elements in the two vectors. Does anyone know how I could do it? To ...
ben
  • 277
0 votes
1 answer
2k views

How to change the code to find the euclidean distance (not cosine) between words in a word2vec implementation?

The following code, when run, gives the cosine distance between two words. model.wv.distance('word1','word2') How do I find the Euclidean distance between two words? I am using gensim for word2vec ...
0 votes
1 answer
567 views

Measuring the distance between two relative frequency vectors

I am having a problem choosing an adequate distance function to measure the similarity (dissimilarity) between two relative frequency vectors. More specifically, I am using shape feature vectors ...
peterS
  • 71
0 votes
1 answer
582 views

Proper similarity measure for clustering

I am having trouble finding a proper similarity measure for clustering. I have around 3000 arrays of sets, where each set contains features of a certain domain (e.g., number, color, days, alphabets, etc)....
Maggie
  • 5,963
0 votes
1 answer
2k views

Find the most similar row to user input from pandas dataframe

I want to find the most similar row to user input from my dataset. My dataset looks like this: And this is the user input: I used scipy and sklearn with a lot of distance metrics (euclidean, ...
Dhayf OTHMEN
0 votes
0 answers
149 views

Coefficient of Euclidean Distance

I have been trying to calculate correlation coefficient (say r) and euclidean distance (say d) between two random variables X and Y. It is known that -1 <= r <= 1, whereas d >= 0. To compare ...
Alemu
  • 9
0 votes
0 answers
34 views

How can I look for similarities across an entire python dataframe?

Suppose I have the following dataframe: FG% FT% 3P% Player A .56 .80 .45 Player B .22 .60 .20 Player C .48 .71 .39 etc... I'd like to iterate over each row (player) to find out ...
AdamA
  • 343
0 votes
0 answers
816 views

Is there any package in R to use jaccard or cosine distance for k-medoid clustering?

I am using function pam in package cluster for partitioning around medoids. pam(x, k, diss = inherits(x, "dist"), metric = "euclidean", medoids = NULL, stand = FALSE, cluster.only = FALSE, ...
Hadij
  • 4,082
0 votes
0 answers
43 views

nested transformations apache spark

I need some help with my code, which doesn't respond. I need to compute pairwise similarities between items based on their ratings; these similarities will be used to construct the similarity matrix....
Ferdaous Hd
0 votes
1 answer
3k views

Detecting a black/blank frame in video using OpenCV

I'm using OpenCV 2.4.2 VideoCapture class to grab frames from multiple videos and my aim is to compare the frames between videos to retrieve similar videos (visually similar). I'm facing two issues. ...
Uni
  • 45
-1 votes
1 answer
463 views

Item Based Similarity Metric

I am using Apache Mahout to write an item-based recommender (based on similar item ratings by users) and I was wondering which of the following two similarity metrics would be the best to use: ...
tlauer
  • 568