Questions tagged [sentence-similarity]
Sentence similarity is a Natural Language Processing topic concerned with quantifying the semantic or syntactic similarity between two or more sentences.
230 questions
28 votes, 3 answers, 6k views
How to build semantic search for a given domain
There is a problem we are trying to solve: we want to do a semantic search on our data set, i.e., we have domain-specific data (for example, sentences about automobiles).
Our data is just ...
26 votes, 2 answers, 38k views
Is there a way to check the similarity between two full sentences in Python?
I am making a project like this one here:
https://www.youtube.com/watch?v=dovB8uSUUXE&feature=youtu.be
But I am facing trouble because I need to check the similarity between the sentences for ...
12 votes, 2 answers, 7k views
Sentence similarity using keras
I'm trying to implement a sentence similarity architecture based on this work, using the STS dataset. Labels are normalized similarity scores from 0 to 1, so I am treating it as a regression problem.
My ...
10 votes, 1 answer, 7k views
word2vec, sum or average word embeddings?
I'm using word2vec to represent a small phrase (3 to 4 words) as a unique vector, either by adding each individual word embedding or by calculating the average of word embeddings.
From the experiments ...
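If phrases are compared with cosine similarity, summing versus averaging makes no difference: the sum of n word vectors is n times their average, and cosine similarity ignores positive scaling. The choice does matter for magnitude-sensitive measures such as Euclidean distance, especially when phrases have different lengths. A minimal sketch, assuming made-up 3-dimensional vectors in a hypothetical `toy_embeddings` table stand in for real word2vec embeddings:

```python
import math

# Hypothetical 3-d "embeddings"; real word2vec vectors usually have 100-300 dims.
toy_embeddings = {
    "red":   [0.9, 0.1, 0.0],
    "car":   [0.2, 0.8, 0.1],
    "fast":  [0.1, 0.3, 0.9],
    "truck": [0.3, 0.7, 0.2],
}


def phrase_vector(words, mode="average"):
    """Combine word vectors by element-wise sum or element-wise average."""
    vectors = [toy_embeddings[w] for w in words]
    summed = [sum(dims) for dims in zip(*vectors)]
    if mode == "sum":
        return summed
    return [x / len(vectors) for x in summed]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


p1, p2 = ["red", "car"], ["fast", "red", "truck"]  # different lengths on purpose
for mode in ("sum", "average"):
    v1, v2 = phrase_vector(p1, mode), phrase_vector(p2, mode)
    # Cosine is identical for both modes; Euclidean distance is not.
    print(mode, round(cosine(v1, v2), 4), round(euclidean(v1, v2), 4))
```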
9 votes, 2 answers, 4k views
Siamese Network with LSTM for sentence similarity in Keras periodically gives the same result
I'm a newbie in Keras and I'm trying to solve the task of sentence similarity using neural networks in Keras.
I use word2vec as word embedding, and then a Siamese Network to predict how similar two sentences ...
8 votes, 4 answers, 2k views
Sentence similarity models not capturing opposite sentences
I have tried different approaches to sentence similarity, namely:
spaCy models: en_core_web_md and en_core_web_lg.
Transformers: using the packages sentence-similarity and sentence-transformers, I'...
7 votes, 5 answers, 4k views
What is the best way to get accurate text similarity in python for comparing single words or bigrams?
I've got similar product data in both the products_a array and products_b array:
products_a = [{color: "White", size: "2' 3\""}, {color: "Blue", size: "5' 8\&...
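For single words and short attribute values, character n-gram overlap is often more forgiving than whole-word matching: it tolerates case, small typos, and punctuation differences. A minimal sketch of the Sørensen-Dice coefficient over character bigrams; the padding and lowercasing are choices made here, not requirements. It measures spelling overlap only, so synonyms such as "Navy" and "Blue" still score 0:

```python
from collections import Counter


def char_bigrams(text):
    """Multiset of overlapping character pairs, lowercased and padded with spaces."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def dice_similarity(a, b):
    """Sørensen-Dice coefficient on character bigrams, in [0, 1]."""
    bigrams_a, bigrams_b = char_bigrams(a), char_bigrams(b)
    overlap = sum((bigrams_a & bigrams_b).values())  # shared bigrams, counted with multiplicity
    total = sum(bigrams_a.values()) + sum(bigrams_b.values())
    return 2 * overlap / total  # padding guarantees total >= 2


print(dice_similarity("White", "white"))      # case is ignored
print(dice_similarity("Off White", "White"))  # partial overlap
print(dice_similarity("Blue", "Red"))         # no shared bigrams
```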
7 votes, 2 answers, 4k views
How to determine if two sentences talk about similar topics?
I would like to ask you a question. Is there any algorithm/tool which can allow me to do some association between words?
For example: I have the following group of sentences:
(1)
"My phone is ...
4 votes, 3 answers, 12k views
Finding most similar sentences among all in python
Suggestions, reference links, or code are appreciated.
I have data with more than 1,500 rows. Each row has a sentence. I am trying to find out the best method to find the most similar sentences ...
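For about 1,500 sentences, comparing every pair is feasible (roughly 1.1 million pairs). A minimal stdlib-only sketch using TF-IDF weighting with a naive whitespace tokenizer and cosine similarity; the sample rows are made up, and embedding vectors from a sentence encoder could replace TF-IDF without changing the pairwise loop:

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(sentences):
    """Sparse TF-IDF vectors (term -> weight dicts) using lowercase whitespace tokens."""
    docs = [Counter(s.lower().split()) for s in sentences]
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in doc)  # sentences containing each term
    # Smoothed IDF, so a term present in every sentence still gets a small positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in df}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar_pairs(sentences, top_n=3):
    """Score all sentence pairs and return the top_n as (score, i, j) tuples."""
    vectors = tfidf_vectors(sentences)
    scored = [
        (cosine(vectors[i], vectors[j]), i, j)
        for i, j in combinations(range(len(sentences)), 2)
    ]
    return sorted(scored, reverse=True)[:top_n]


rows = [
    "the engine makes a loud noise",
    "my phone battery drains quickly",
    "the engine is making a loud noise",
    "battery life of the phone is poor",
]
for score, i, j in most_similar_pairs(rows, top_n=2):
    print(f"{score:.3f}  {rows[i]!r} <-> {rows[j]!r}")
```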
4 votes, 1 answer, 3k views
Sentence Transformers: how to predict on a new example
I am exploring sentence transformers and came across this page.
It shows how to train on our custom data. But I am not sure how to predict. If there are two new sentences such as 1) this is the third ...
4 votes, 1 answer, 1k views
Use Spacy to find most similar sentences in doc
I'm looking for a solution to use something like most_similar() from Gensim but using Spacy.
I want to find the most similar sentence in a list of sentences using NLP.
I tried to use similarity() ...
4 votes, 0 answers, 326 views
Siamese BiLSTM neural network with Manhattan distance gives very different similarity scores each time for the same test data
I'm applying Siamese Bidirectional LSTM (BiLSTM) using character-level sequences and embeddings for long texts. The embeddings model is Word2vec, the sequence length is None to handle variable ...
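In Siamese "Manhattan LSTM" setups, the two encoder outputs are commonly compared with exp(-L1 distance), which yields a score in (0, 1]. The function below shows only that similarity step on plain lists with made-up vectors. It is deterministic for fixed inputs, so any run-to-run variation has to come from how the encodings themselves are produced:

```python
import math


def manhattan_similarity(h_left, h_right):
    """Map the L1 (Manhattan) distance between two encodings to (0, 1].

    exp(-||h_left - h_right||_1): identical vectors give 1.0, and the score
    decays toward 0 as the distance grows.
    """
    l1 = sum(abs(a - b) for a, b in zip(h_left, h_right))
    return math.exp(-l1)


print(manhattan_similarity([0.2, 0.5, 0.1], [0.2, 0.5, 0.1]))  # identical -> 1.0
print(manhattan_similarity([0.2, 0.5, 0.1], [0.9, -0.3, 0.4]))  # L1 = 1.8 -> exp(-1.8)
```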
3 votes, 4 answers, 3k views
How to save a SetFit trainer locally after training
I am working on an HPC with no internet access on worker nodes, and the only option to save a SetFit trainer after training is to push it to the Hugging Face Hub. How do I go about saving it locally to ...
3 votes, 3 answers, 1k views
String comparison with BERT seems to ignore "not" in sentence
I implemented a string comparison method using SentenceTransformers and BERT as follows:
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
...
3 votes, 1 answer, 7k views
fastText pre-trained sentence similarity
I want to use fastText pre-trained models to compute the similarity between a sentence and a set of sentences.
Can anyone help me? What is the best approach?
I computed the similarity between sentences by ...
3 votes, 1 answer, 2k views
Does Euclidean Distance measure the semantic similarity?
I want to measure the similarity between sentences. Can I use sklearn and Euclidean distance to measure the semantic similarity between sentences? I have also read about cosine similarity. Can someone ...
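Euclidean distance is sensitive to vector length, while cosine similarity looks only at direction. On L2-normalized vectors the two carry the same information, since ||a - b||^2 = 2 - 2 * cos(a, b). A minimal sketch with made-up vectors standing in for sentence embeddings; for real data, `sklearn.metrics.pairwise` provides `cosine_similarity` and `euclidean_distances`:

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Hypothetical embeddings: b points the same way as a but is twice as long.
a = [1.0, 2.0, 0.5]
b = [2.0, 4.0, 1.0]
c = [0.5, -1.0, 2.0]

print(cosine_similarity(a, b))   # same direction, so cosine is (numerically) 1
print(euclidean_distance(a, b))  # still > 0 because the lengths differ

# After normalization, squared Euclidean distance equals 2 - 2 * cosine similarity.
na, nc = l2_normalize(a), l2_normalize(c)
print(euclidean_distance(na, nc) ** 2, 2 - 2 * cosine_similarity(a, c))
```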
3 votes, 3 answers, 5k views
Calculating word similarity score in Python
I'm trying to calculate book similarity by comparing their topic lists.
I need a similarity score between 0 and 1 from the two lists.
Example:
book1_topics = ["god", "bible", "...
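One simple score in [0, 1] for two topic lists is the Jaccard similarity: the number of shared distinct topics divided by the number of distinct topics overall. A minimal sketch; the topic lists are hypothetical because the example in the question is truncated:

```python
def jaccard_similarity(topics_a, topics_b):
    """Share of distinct topics the two books have in common, in [0, 1]."""
    set_a = {t.lower() for t in topics_a}
    set_b = {t.lower() for t in topics_b}
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty lists count as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical topic lists.
book1_topics = ["god", "bible", "faith", "prayer"]
book2_topics = ["bible", "history", "god"]
print(jaccard_similarity(book1_topics, book2_topics))  # 2 shared / 5 distinct = 0.4
```

Jaccard only rewards exact matches, so "god" and "religion" contribute nothing; capturing related-but-different topics would need word embeddings or a similar semantic measure.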
3 votes, 2 answers, 5k views
BERT fine-tuned for semantic similarity
I would like to fine-tune BERT to calculate semantic similarity between sentences.
I have searched many websites but found almost nothing about this downstream task.
I only found the STS benchmark.
I wonder ...
3 votes, 1 answer, 2k views
Gensim Doc2Vec most_similar() method not working as expected
I am struggling with Doc2Vec and I cannot see what I am doing wrong.
I have a text file with sentences. I want to know, for a given sentence, what is the closest sentence we can find in that file.
...
3 votes, 1 answer, 1k views
How to download and use the universal sentence encoder instead of loading it from url
I am using the Universal Sentence Encoder to find sentence similarity. Below is the code I use to load the model:
import tensorflow_hub as hub
model = hub.load("https://tfhub.dev/google/...
3 votes, 1 answer, 2k views
Using a Universal Sentence Encoder Embedding Layer in Keras
I am trying to load USE as an embedding layer in my model using Keras. I used two approaches. The first one is adapted from the code here, as follows:
import tensorflow as tf
tf.config....
3 votes, 2 answers, 2k views
How can I use NLP to group multiple sentences by semantic similarity
I'm trying to increase the efficiency of a non-conformity management program. Basically, I have a database containing a few hundred rows; each row describes a non-conformity using a text field.
...
3 votes, 2 answers, 2k views
converting a sentence to an embedding representation
If I have a sentence, ex: “get out of here”
And I want to use word2vec embeddings to represent it. I found three different ways to do that:
1- for each word, we compute the AVG of its embedding vector, ...
3 votes, 1 answer, 164 views
Batched BM25 search in PySpark
I have a large dataset of documents (average length of 35 words). I want to find the top k nearest neighbors of all these documents by using BM25. Every document needs to be compared with every other ...
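BM25 itself is a per-document scoring formula; batching it in PySpark is a question of how to distribute that work. A minimal single-machine sketch of Okapi BM25 in plain Python that computes document frequencies and the average length once, then ranks each document's top-k neighbours. The defaults k1=1.5 and b=0.75 are common choices, not values from the question, and the corpus is made up. It still compares every document with every other one (O(N^2)), which is exactly the cost a distributed or inverted-index approach would need to reduce:

```python
import heapq
import math
from collections import Counter


def build_index(documents):
    """Tokenize once and precompute document frequencies and average length."""
    docs = [doc.lower().split() for doc in documents]
    df = Counter(term for d in docs for term in set(d))
    avgdl = sum(len(d) for d in docs) / len(docs)
    return docs, df, avgdl


def bm25_scores(query_tokens, index, k1=1.5, b=0.75):
    """Okapi BM25 score of every indexed document for the given query tokens."""
    docs, df, avgdl = index
    n_docs = len(docs)
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Non-negative IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / length_norm
        scores.append(score)
    return scores


def top_k_neighbors(documents, k=2):
    """For each document, indices of the k other documents with the highest BM25 score."""
    index = build_index(documents)
    result = []
    for i, tokens in enumerate(index[0]):
        scores = bm25_scores(tokens, index)
        others = [j for j in range(len(documents)) if j != i]
        result.append(heapq.nlargest(k, others, key=lambda j: scores[j]))
    return result


corpus = [
    "red car for sale",
    "cheap red car",
    "blue truck for hire",
    "train tickets to london",
]
print(bm25_scores("red car".split(), build_index(corpus)))
print(top_k_neighbors(corpus, k=1))
```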
3 votes, 1 answer, 1k views
How to use Sentence-BERT with transformers and torch
I would like to use sentence_transformers, but due to policy restrictions I cannot install the sentence-transformers package.
I do have the transformers and torch packages, though.
I went to this page and tried ...
3 votes, 0 answers, 2k views
Text similarity as probability (between 0 and 1)
I have been trying to compute text similarity such that it is between 0 and 1 and can be seen as a probability. The two texts are encoded as two vectors whose values are numbers in [-1, 1]. So as two ...
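Two common ways to move cosine similarity from [-1, 1] into [0, 1] are a linear shift, (cos + 1) / 2, and angular similarity, 1 - arccos(cos) / pi. Both are monotone rescalings, not calibrated probabilities; turning a score into a true probability would require fitting a calibration model on labeled pairs. A minimal sketch with made-up vectors:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def linear_rescale(cos_sim):
    """Shift cosine similarity from [-1, 1] to [0, 1]."""
    return (cos_sim + 1) / 2


def angular_similarity(cos_sim):
    """1 - angle / pi, also in [0, 1]."""
    cos_sim = max(-1.0, min(1.0, cos_sim))  # guard against float drift outside [-1, 1]
    return 1 - math.acos(cos_sim) / math.pi


u = [0.3, -0.8, 0.5]  # hypothetical encodings with components in [-1, 1]
v = [0.1, -0.6, 0.9]
c = cosine(u, v)
print(c, linear_rescale(c), angular_similarity(c))
```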
3 votes, 1 answer, 1k views
How to perform efficient queries with Gensim doc2vec?
I’m working on a sentence similarity algorithm with the following use case: given a new sentence, I want to retrieve its n most similar sentences from a given set. I am using Gensim v.3.7.1, and I ...
3 votes, 2 answers, 2k views
Finding most similar sentence match
I have a large dataset containing a mix of words and short phrases, such as:
dataset = [
"car",
"red-car",
"lorry",
"broken lorry",
"truck owner",
"train",
...
]
I am ...
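For short strings like these, character-level matching from Python's standard library may be enough. A minimal sketch using `difflib.get_close_matches`, which ranks candidates by `SequenceMatcher.ratio()`; the `cutoff=0.5` threshold is an arbitrary value to tune, and only the items shown in the question are used. This is surface (spelling) similarity only: it will not relate "lorry" to "truck", which would need word embeddings:

```python
import difflib

# Items listed in the question (the original list continues beyond these).
dataset = ["car", "red-car", "lorry", "broken lorry", "truck owner", "train"]


def best_matches(query, candidates, n=3, cutoff=0.5):
    """Return up to n candidates, best first, whose similarity ratio is >= cutoff."""
    return difflib.get_close_matches(query.lower(), candidates, n=n, cutoff=cutoff)


print(best_matches("red car", dataset))
```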
3 votes, 0 answers, 1k views
spark similarities between text sentences
I'm trying to find the similarity between text messages (about 1 million text messages); in my implementation, each line represents an entry.
In order to calculate similarity between those texts we adopt ...
3 votes, 2 answers, 628 views
Extrapolate Sentence Similarity Given Word Similarities
Assuming that I have a word similarity score for each pair of words in two sentences, what is a decent approach to determining the overall sentence similarity from those scores?
The word scores are ...
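A common heuristic is best-match alignment: for each word in one sentence, take its highest similarity to any word in the other sentence, average those maxima, and then average both directions so the score is symmetric. Weighting each word by IDF is a frequent refinement. A minimal sketch; the `toy_word_sim` function and its scores are hypothetical stand-ins for your real word similarity scores:

```python
def sentence_similarity(words_a, words_b, word_sim):
    """Symmetric best-match average of word-level similarity scores."""
    def directed(src, dst):
        if not src or not dst:
            return 0.0
        # Best counterpart in dst for every word in src, averaged over src.
        return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)

    return (directed(words_a, words_b) + directed(words_b, words_a)) / 2


# Hypothetical pairwise scores in [0, 1]; unknown pairs score 0, identical words 1.
PAIR_SCORES = {
    frozenset(("car", "truck")): 0.7,
    frozenset(("red", "blue")): 0.4,
    frozenset(("fast", "quick")): 0.9,
}


def toy_word_sim(w1, w2):
    if w1 == w2:
        return 1.0
    return PAIR_SCORES.get(frozenset((w1, w2)), 0.0)


print(sentence_similarity(["red", "fast", "car"], ["blue", "quick", "truck"], toy_word_sim))
```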
2 votes, 2 answers, 2k views
Efficient way for Computing the Similarity of Multiple Documents using Spacy
I have around 10k docs (mostly 1-2 sentences) and want to find, for each of these docs, the ten most similar docs in a collection of 60k docs. Therefore, I want to use the spaCy library. Due to the large ...
2 votes, 2 answers, 1k views
How to access document details from Doc2Vec similarity scores in gensim model?
I have been given a doc2vec model using gensim which was trained on 20 million documents. The 20 million documents it was trained on are also given to me, but I have no idea how or which order the ...
2 votes, 1 answer, 307 views
Is this already a string similarity algorithm?
I'm unfamiliar with string similarity algorithms except for Levenshtein Distance because that's what I'm using and it has turned out to be less than ideal.
So I've kind of got an idea of a recursive ...
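As a baseline to compare a new idea against, here is the standard dynamic-programming Levenshtein distance, plus one common (but not the only) way to turn it into a 0-1 similarity: divide by the length of the longer string. A minimal sketch:

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a, b):
    """1 - distance / longer length: 1.0 for equal strings, 0.0 for completely different ones."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


print(levenshtein("kitten", "sitting"))            # 3
print(normalized_similarity("kitten", "sitting"))  # 1 - 3/7
```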
2 votes, 1 answer, 69 views
What robust algorithm implementation can I use to perform phrase similarity with two inputs?
This is the problem:
I have two columns in my metadata database, "field name" and "field description".
I need to check if the "field description" is actually a description ...
2 votes, 1 answer, 2k views
Is it possible to retrain Google's Universal Sentence Encoder such that it takes keywords into account when encoding sentences?
I am a bit confused on what it means to set trainable = True when loading the Universal Sentence Encoder 3. I have a small corpus (3000 different sentences), given a sentence I want to find the 10 ...
2 votes, 1 answer, 741 views
How to combine vectors generated by PV-DM and PV-DBOW methods of doc2vec?
I have around 20k documents with 60 to 150 words each. Out of these 20k documents, there are 400 documents for which the similar documents are known. These 400 documents serve as my test data.
I am trying ...
2 votes, 4 answers, 1k views
How to find similar text in a large string?
I have a large string str and a needle ndl. Now, I need to find text in str that is similar to ndl. For example,
SOURCE: "This is a demo text and I love you about this".
NEEDLE: "I you ...
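One straightforward approach is to slide a window with as many words as the needle across the text and keep the window that scores highest. A minimal sketch using `difflib.SequenceMatcher` from the standard library; the needle below is hypothetical because the one in the question is truncated, and the fixed window size is a simplifying assumption (matches longer or shorter than the needle would need a range of window sizes):

```python
from difflib import SequenceMatcher


def most_similar_window(text, needle):
    """Return (score, window) for the needle-sized word window most similar to needle.

    Comparison is case-insensitive and uses SequenceMatcher.ratio().
    """
    words = text.split()
    size = len(needle.split())
    best = (0.0, "")
    for start in range(len(words) - size + 1):
        window = " ".join(words[start:start + size])
        score = SequenceMatcher(None, needle.lower(), window.lower()).ratio()
        if score > best[0]:
            best = (score, window)
    return best


source = "This is a demo text and I love you about this"
# Hypothetical needle for illustration.
print(most_similar_window(source, "I loved you"))
```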
2 votes, 1 answer, 1k views
How to use my own sentence embeddings in Keras?
I am new to Keras and I created my own tf_idf sentence embeddings with shape (no_sentences, embedding_dim). I am trying to add this matrix as input to an LSTM layer. My network looks something like ...
2 votes, 1 answer, 967 views
How can I add new words in wordnet dictionary?
I am trying to match two sentences and find similarities.
It seems that some of the words (nouns) in my sentences are not present in the WordNet dictionary. How can I add them to WordNet?
2 votes, 1 answer, 3k views
Keras throws `'Tensor' object has no attribute '_keras_shape'` when splitting a layer output
I have sentence embedding output X of a sentence pair of dimension 2*1*300. I want to split this output into two vectors of shape 1*300 to calculate its absolute difference and product.
x = ...
2 votes, 1 answer, 604 views
elasticsearch ngram and postgresql trigram search results do not match
I've created an index on Elasticsearch as below:
"settings" : {
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis": {
"filter": {
"...
2 votes, 1 answer, 88 views
Combine XML files based on entry similarity
I need to combine differently structured XML files using PHP. What I am doing is:
Read first XML file using simplexml_load_file()
Reformat the elements using a new structure using SimpleXMLElement() ...
2 votes, 1 answer, 109 views
String Similarity for all possible combination in Optimised fashion
I am facing a problem while finding string similarity.
Scenario: the string consists of the following fields:
first_name, middle_name and last_name
What I have to do is find string similarity ...
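If "all possible combinations" means the name parts may appear in any order, one option is to compare one name against every ordering of the other name's parts and keep the best score. With three fields there are only 3! = 6 orderings, so this stays cheap per pair. A minimal sketch using `difflib` and `itertools.permutations`; representing a name as a (first, middle, last) tuple is an assumption made here:

```python
from difflib import SequenceMatcher
from itertools import permutations


def name_similarity(name_a, name_b):
    """Best SequenceMatcher ratio of name_a against every ordering of name_b's parts.

    Names are (first_name, middle_name, last_name) tuples; empty parts are
    dropped so a missing middle name does not leave a stray space.
    """
    joined_a = " ".join(p for p in name_a if p).lower()
    parts_b = [p.lower() for p in name_b if p]
    return max(
        SequenceMatcher(None, joined_a, " ".join(order)).ratio()
        for order in permutations(parts_b)
    )


print(name_similarity(("John", "Paul", "Smith"), ("Smith", "John", "Paul")))  # reordered -> 1.0
print(name_similarity(("John", "", "Smith"), ("Smith", "", "Jon")))          # typo, reordered
```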
2 votes, 1 answer, 720 views
How to map word level timestamps to text of a given transcript?
I am currently developing a tool to visualize song lyrics. The tool computes the similarity in the phonetics of syllables and assigns a rhyme group to each syllable. Syllables belonging to the same ...
2 votes, 0 answers, 751 views
Transform TF universal-sentence-encoder to torch
Is there a way I can convert and use Google's universal-sentence-encoder (available through TF hub) in pytorch?
2 votes, 2 answers, 422 views
semantic similarity for mix of languages
I have a database of several thousands of utterances. Each record (utterance) is a text representing a problem description, which a user has submitted to a service desk. Sometimes also the service ...
2 votes, 1 answer, 1k views
How to extract sentences that have a similar meaning/intent compared against an example list of sentences
I have chat interactions [Utterances] between a Customer and an Advisor and would like to know whether the advisor's interactions contain particular sentences, or sentences similar to those in the list below:
Example ...
2 votes, 1 answer, 710 views
Cosine similarity is slow
I have a set of sentences, which is encoded using sentence encoder into vectors and I want to find out the most similar sentence to an incoming query.
The search function looks as follows:
def ...
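The search function in the question is truncated, so its actual bottleneck is unknown. Two things that often make this kind of search slow are recomputing the norm of every stored vector on every query and sorting all scores just to get the top result. A minimal pure-Python sketch of avoiding both: normalize the stored embeddings once, reduce each query to dot products, and use `heapq.nlargest` for the top k. With NumPy, the same idea becomes a single matrix-vector product over a pre-normalized embedding matrix. The `CosineIndex` name and the toy vectors are illustrative only:

```python
import heapq
import math


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class CosineIndex:
    """Normalize corpus embeddings once so each query needs only dot products."""

    def __init__(self, embeddings):
        self.unit_vectors = [normalize(v) for v in embeddings]

    def search(self, query, k=1):
        """Return the k best (cosine_score, corpus_index) pairs, best first."""
        q = normalize(query)
        scores = (
            (sum(a * b for a, b in zip(q, v)), i)
            for i, v in enumerate(self.unit_vectors)
        )
        # nlargest keeps only k candidates instead of sorting every score.
        return heapq.nlargest(k, scores)


# Hypothetical sentence embeddings.
index = CosineIndex([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]])
print(index.search([0.9, 0.1, 0.0], k=2))
```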
1 vote, 3 answers, 6k views
Doc2Vec find the similar sentence
I am trying to find similar sentences using doc2vec. What I am not able to find is the actual sentence that matches from the trained sentences.
Below is the code from this article:
from gensim.models....
1 vote, 2 answers, 2k views
How to speed up computing sentence similarity using spacy in Python?
I have the following code, which takes in 2 sentences and returns the similarity:
nlp = spacy.load("en_core_web_md/en_core_web_md-3.2.0")
def get_categories_nlp_sim(cat_1, cat_2):
if (...