Highest scored 'sentence-similarity' questions

28 votes

3 answers

6k views

How to build semantic search for a given domain

There is a problem we are trying to solve where we want to do a semantic search on our set of data, i.e., we have domain-specific data (example: sentences talking about automobiles). Our data is just ...

Jickson

5,193

asked Feb 12, 2020 at 11:06

26 votes

2 answers

38k views

Is there a way to check similarity between two full sentences in Python?

I am making a project like this one here: https://www.youtube.com/watch?v=dovB8uSUUXE&feature=youtu.be but I am having trouble because I need to check the similarity between the sentences for ...

Bemwa Malak

1,297

asked Dec 8, 2020 at 12:33

12 votes

2 answers

7k views

Sentence similarity using keras

I'm trying to implement a sentence similarity architecture based on this work using the STS dataset. Labels are normalized similarity scores from 0 to 1, so it is assumed to be a regression model. My ...

lila

121

asked Sep 2, 2016 at 9:31

10 votes

1 answer

7k views

word2vec, sum or average word embeddings?

I'm using word2vec to represent a small phrase (3 to 4 words) as a unique vector, either by adding each individual word embedding or by calculating the average of word embeddings. From the experiments ...

David Batista

3,104

asked May 9, 2015 at 16:23

9 votes

2 answers

4k views

Siamese Network with LSTM for sentence similarity in Keras gives periodically the same result

I'm a newbie in Keras and I'm trying to solve the task of sentence similarity using an NN in Keras. I use word2vec as the word embedding, and then a Siamese Network to predict how similar two sentences ...

MiVe93

93

asked Sep 28, 2017 at 9:46

8 votes

4 answers

2k views

Sentence similarity models not capturing opposite sentences

I have tried different approaches to sentence similarity, namely: spaCy models: en_core_web_md and en_core_web_lg. Transformers: using the packages sentence-similarity and sentence-transformers, I'...

Diego Miguel

578

asked Sep 29, 2021 at 10:03

7 votes

5 answers

4k views

What is the best way to get accurate text similarity in python for comparing single words or bigrams?

I've got similar product data in both the products_a array and products_b array: products_a = [{color: "White", size: "2' 3\""}, {color: "Blue", size: "5' 8\...

rom

596

asked Oct 3, 2021 at 4:39

7 votes

2 answers

4k views

How to determine if two sentences talk about similar topics?

I would like to ask you a question. Is there any algorithm/tool which can allow me to do some association between words? For example: I have the following group of sentences: (1) "My phone is ...

user12907213

asked Jul 29, 2020 at 16:11

4 votes

3 answers

12k views

Finding most similar sentences among all in python

Suggestions, reference links, and code are appreciated. I have data with more than 1500 rows. Each row has a sentence. I am trying to find out the best method to find the most similar sentences ...

vivek

61

asked Sep 3, 2020 at 7:10

4 votes

1 answer

3k views

sentence transformer how to predict new example

I am exploring sentence transformers and came across this page. It shows how to train on our custom data. But I am not sure how to predict. If there are two new sentences such as 1) this is the third ...

user2543622

6,228

asked Jan 4, 2022 at 18:08

4 votes

1 answer

1k views

Use spaCy to find most similar sentences in doc

I'm looking for a solution to use something like most_similar() from Gensim but using spaCy. I want to find the most similar sentence in a list of sentences using NLP. I tried to use similarity() ...

Heraknos

373

asked May 15, 2019 at 13:33

4 votes

0 answers

326 views

Siamese BiLSTM neural network with Manhattan distance gives very different similarity scores each time for the same test data

I'm applying Siamese Bidirectional LSTM (BiLSTM) using character-level sequences and embeddings for long texts. The embeddings model is Word2vec, the sequence length is None to handle variable ...

MManahi

41

asked Jun 8, 2020 at 5:28

3 votes

4 answers

3k views

How to save a SetFit trainer locally after training

I am working on an HPC with no internet access on worker nodes, and the only option to save a SetFit trainer after training is to push it to the Hugging Face Hub. How do I go about saving it locally to ...

Tanish Bafna

33

asked Oct 12, 2022 at 18:23

3 votes

3 answers

1k views

String comparison with BERT seems to ignore "not" in sentence

I implemented a string comparison method using SentenceTransformers and BERT like the following: from sentence_transformers import SentenceTransformer from sklearn.metrics.pairwise import cosine_similarity ...

Tiago Bachiega de Almeida

121

asked Sep 7, 2021 at 16:18

3 votes

1 answer

7k views

fasttext pre trained sentences similarity

I want to use fastText pre-trained models to compute the similarity between a sentence and a set of sentences. Can anyone help me? What is the best approach? I computed the similarity between sentences by ...

mili lali

33

asked Dec 4, 2019 at 19:46

3 votes

1 answer

2k views

Does Euclidean Distance measure the semantic similarity?

I want to measure the similarity between sentences. Can I use sklearn and Euclidean distance to measure the semantic similarity between sentences? I read about cosine similarity also. Can someone ...

jenyK

71

asked Nov 11, 2018 at 8:57

3 votes

3 answers

5k views

Calculating words similarity score in python

I'm trying to calculate book similarity by comparing the topic lists. I need to get a similarity score between 0 and 1 from the two lists. Example: book1_topics = ["god", "bible", "...

Sapir

31

asked Apr 2, 2021 at 12:33

3 votes

2 answers

5k views

BERT fine-tuned for semantic similarity

I would like to fine-tune BERT to calculate semantic similarity between sentences. I searched a lot of websites, but I found almost nothing about this downstream task. I just found the STS benchmark. I wonder ...

Chad

41

asked Dec 4, 2019 at 9:18

3 votes

1 answer

2k views

Gensim Doc2Vec most_similar() method not working as expected

I am struggling with Doc2Vec and I cannot see what I am doing wrong. I have a text file with sentences. I want to know, for a given sentence, what is the closest sentence we can find in that file. ...

Yann Droy

177

asked Apr 3, 2018 at 13:47

3 votes

1 answer

1k views

How to download and use the universal sentence encoder instead of loading it from url

I am using the Universal Sentence Encoder to find sentence similarity. Below is the code that I use to load the model: import tensorflow_hub as hub model = hub.load("https://tfhub.dev/google/...

Jithin P James

762

asked Jul 20, 2022 at 3:58

3 votes

1 answer

2k views

Using a Universal Sentence Encoder Embedding Layer in Keras

I am trying to load USE as an embedding layer in my model using Keras. I used two approaches. The first one is adapted from the code here as follows: import tensorflow as tf tf.config....

Omnia

857

asked Dec 1, 2020 at 11:29

3 votes

2 answers

2k views

How can I use NLP to group multiple sentences by semantic similarity

I'm trying to increase the efficiency of a non-conformity management program. Basically, I have a database containing a few hundred rows; each row describes a non-conformity using a text field. ...

Michael Longo

41

asked Jun 6, 2020 at 7:02

3 votes

2 answers

2k views

converting a sentence to an embedding representation

If I have a sentence, e.g. “get out of here”, and I want to use word2vec embeddings to represent it, I found three different ways to do that: 1- for each word, we compute the AVG of its embedding vector, ...

Minions

5,318

asked Apr 4, 2018 at 17:04

3 votes

1 answer

164 views

Batched BM25 search in PySpark

I have a large dataset of documents (average length of 35 words). I want to find the top k nearest neighbors of all these documents by using BM25. Every document needs to be compared with every other ...

theodre7

108

asked Jan 18 at 4:40

3 votes

1 answer

1k views

How to use Sentence-BERT with transformers and torch

I would like to use sentence_transformers, but due to policy restrictions I cannot install the package sentence-transformers. I have the transformers and torch packages, though. I went to this page and tried ...

user2543622

6,228

asked Oct 21, 2021 at 19:24

3 votes

0 answers

2k views

Text similarity as probability (between 0 and 1)

I have been trying to compute text similarity such that it would be between 0 and 1, seen as a probability. The two texts are encoded as two vectors of numbers in [-1, 1]. So as two ...

inverted_index

2,742

asked Nov 16, 2020 at 2:15

3 votes

1 answer

1k views

How to perform efficient queries with Gensim doc2vec?

I’m working on a sentence similarity algorithm with the following use case: given a new sentence, I want to retrieve its n most similar sentences from a given set. I am using Gensim v.3.7.1, and I ...

María Benavente

33

asked May 14, 2019 at 12:06

3 votes

2 answers

2k views

Finding most similar sentence match

I have a large dataset containing a mix of words and short phrases, such as: dataset = [ "car", "red-car", "lorry", "broken lorry", "truck owner", "train", ... ] I am ...

user9966656

asked Jun 20, 2018 at 15:28

3 votes

0 answers

1k views

spark similarities between text sentences

I'm trying to find similarity between text messages (about 1 million text messages). In my implementation, each line represents an entry. In order to calculate similarity between those texts we adopt ...

jamil

51

asked Nov 13, 2017 at 11:19

3 votes

2 answers

628 views

Extrapolate Sentence Similarity Given Word Similarities

Assuming that I have a word similarity score for each pair of words in two sentences, what is a decent approach to determining the overall sentence similarity from those scores? The word scores are ...

Scott Klarenbach

37.9k

asked Jan 27, 2015 at 4:31

2 votes

2 answers

2k views

Efficient way of computing the similarity of multiple documents using spaCy

I have around 10k docs (mostly 1-2 sentences) and want to find, for each of these docs, the ten most similar docs in a collection of 60k docs. Therefore, I want to use the spaCy library. Due to the large ...

LaLeLo

137

asked Mar 23, 2022 at 11:51

2 votes

2 answers

1k views

How to access document details from Doc2Vec similarity scores in gensim model?

I have been given a doc2vec model using gensim which was trained on 20 million documents. The 20 million documents it was trained on are also given to me, but I have no idea how or which order the ...

User54211

121

asked Nov 20, 2017 at 6:28

2 votes

1 answer

307 views

Is this already a string similarity algorithm?

I'm unfamiliar with string similarity algorithms except for Levenshtein Distance because that's what I'm using and it has turned out to be less than ideal. So I've kind of got an idea of a recursive ...

MetaStack

3,434

asked May 23, 2016 at 23:09

2 votes

1 answer

69 views

What robust algorithm implementation can I use to perform phrase similarity with two inputs?

This is the problem: I have two columns in my metadata database, "field name" and "field description". I need to check if the "field description" is actually a description ...

emichester

189

asked Nov 8, 2022 at 14:30

2 votes

1 answer

2k views

Is it possible to retrain Google's Universal Sentence Encoder such that it takes keywords into account when encoding sentences?

I am a bit confused on what it means to set trainable = True when loading the Universal Sentence Encoder 3. I have a small corpus (3000 different sentences), given a sentence I want to find the 10 ...

kspr

1,020

asked Oct 21, 2019 at 11:43

2 votes

1 answer

741 views

How to combine vectors generated by PV-DM and PV-DBOW methods of doc2vec?

I have around 20k documents with 60 - 150 words. Out of these 20k documents, there are 400 documents for which the similar documents are known. These 400 documents serve as my test data. I am trying ...

Vikrant

139

asked Aug 6, 2019 at 10:05

2 votes

4 answers

1k views

How to find similar text in a large string?

I have a large string str and a needle ndl. Now, I need to find text similar to ndl in the string str. For example, SOURCE: "This is a demo text and I love you about this". NEEDLE: "I you ...

user373100

31

asked Oct 27, 2018 at 17:21

2 votes

1 answer

1k views

How to use my own sentence embeddings in Keras?

I am new to Keras and I created my own tf_idf sentence embeddings with shape (no_sentences, embedding_dim). I am trying to add this matrix as input to an LSTM layer. My network looks something like ...

andra

23

asked Oct 8, 2018 at 14:57

2 votes

1 answer

967 views

How can I add new words to the WordNet dictionary?

I am trying to match two sentences and find similarities. It seems that some of the words (nouns) from my sentences are not present in the WordNet dictionary. How can I add them to WordNet?

Binoy Gupta

21

asked Dec 27, 2017 at 6:27

2 votes

1 answer

3k views

Keras throws `'Tensor' object has no attribute '_keras_shape'` when splitting a layer output

I have sentence embedding output X of a sentence pair of dimension 2*1*300. I want to split this output into two vectors of shape 1*300 to calculate its absolute difference and product. x = ...

Aarthi

23

asked Dec 3, 2017 at 8:21

2 votes

1 answer

604 views

Elasticsearch ngram and PostgreSQL trigram search results do not match

I've created an index on Elasticsearch as shown below: "settings" : { "number_of_shards": 1, "number_of_replicas": 0, "analysis": { "filter": { "...

Ahmet Erkan ÇELİK

2,382

asked Jul 17, 2017 at 11:53

2 votes

1 answer

88 views

Combine XML files based on entry similarity

I need to combine differently structured XML files using PHP. What I am doing is: read the first XML file using simplexml_load_file(), reformat the elements into a new structure using SimpleXMLElement() ...

Turab

182

asked Oct 25, 2016 at 15:25

2 votes

1 answer

109 views

String similarity for all possible combinations in an optimised fashion

I am facing a problem while finding string similarity. Scenario: the string consists of the following fields: first_name, middle_name and last_name. What I have to do is find string similarity ...

Akhilesh mahajan

116

asked Jul 26, 2023 at 10:49

2 votes

1 answer

720 views

How to map word level timestamps to text of a given transcript?

I am currently developing a tool to visualize song lyrics. The tool computes the similarity in the phonetics of syllables and assigns a rhyme group to each syllable. Syllables belonging to the same ...

paulpelikan

21

asked Jun 28, 2023 at 15:06

2 votes

0 answers

751 views

Transform TF universal-sentence-encoder to torch

Is there a way I can convert and use Google's universal-sentence-encoder (available through TF hub) in pytorch?

Maiia Bocharova

189

asked Dec 21, 2021 at 12:23

2 votes

2 answers

422 views

semantic similarity for mix of languages

I have a database of several thousands of utterances. Each record (utterance) is a text representing a problem description, which a user has submitted to a service desk. Sometimes also the service ...

Data Man

51

asked Dec 3, 2021 at 16:59

2 votes

1 answer

1k views

How to extract sentences which have a similar meaning/intent compared against an example list of sentences

I have chat interactions [Utterances] between Customer and Advisor and want to know if the advisor interactions contain particular sentences or similar sentences in the below list: Example ...

baskarmac

35

asked Apr 26, 2020 at 22:25

2 votes

1 answer

710 views

Cosine similarity is slow

I have a set of sentences, which are encoded into vectors using a sentence encoder, and I want to find the most similar sentence to an incoming query. The search function looks as follows: def ...

Jamik

75

asked Sep 29, 2019 at 7:51

1 vote

3 answers

6k views

Doc2Vec find the similar sentence

I am trying to find similar sentences using doc2vec. What I am not able to find is the actual sentence that matches from the trained sentences. Below is the code from this article: from gensim.models....

Lolly

35.3k

asked Oct 2, 2019 at 17:39

1 vote

2 answers

2k views

How to speed up computing sentence similarity using spaCy in Python?

I have the following code which takes in 2 sentences and returns the similarity: nlp = spacy.load("en_core_web_md/en_core_web_md-3.2.0") def get_categories_nlp_sim(cat_1, cat_2): if (...

Tom

275

asked Apr 30, 2022 at 9:35
