Huffon / Sentence Similarity
This repository contains various ways to calculate sentence vector similarity using NLP models
Stars: ✭ 182
Programming Languages
python
Projects that are alternatives of or similar to Sentence Similarity
Transformers.jl
Julia Implementation of Transformer models
Stars: ✭ 173 (-4.95%)
Mutual labels: natural-language-processing
Cookiecutter Spacy Fastapi
Cookiecutter API for creating Custom Skills for Azure Search using Python and Docker
Stars: ✭ 179 (-1.65%)
Mutual labels: natural-language-processing
Syfertext
A privacy preserving NLP framework
Stars: ✭ 170 (-6.59%)
Mutual labels: natural-language-processing
Multimodal Sentiment Analysis
Attention-based multimodal fusion for sentiment analysis
Stars: ✭ 172 (-5.49%)
Mutual labels: natural-language-processing
Cleannlp
R package providing annotators and a normalized data model for natural language processing
Stars: ✭ 174 (-4.4%)
Mutual labels: natural-language-processing
Data Science Toolkit
Collection of stats, modeling, and data science tools in Python and R.
Stars: ✭ 169 (-7.14%)
Mutual labels: natural-language-processing
Deeptoxic
Top 1% solution to the Toxic Comment Classification Challenge on Kaggle.
Stars: ✭ 180 (-1.1%)
Mutual labels: natural-language-processing
Deep Math Machine Learning.ai
A blog about machine learning and deep learning algorithms and the math behind them, with machine learning algorithms written from scratch.
Stars: ✭ 173 (-4.95%)
Mutual labels: natural-language-processing
Cs224n 2019
My completed implementation solutions for CS224N 2019
Stars: ✭ 178 (-2.2%)
Mutual labels: natural-language-processing
Dive Into Dl Pytorch
This project converts the MXNet implementation in the original book Dive into Deep Learning into a PyTorch implementation.
Stars: ✭ 14,234 (+7720.88%)
Mutual labels: natural-language-processing
Knockknock
🚪✊Knock Knock: Get notified when your training ends with only two additional lines of code
Stars: ✭ 2,304 (+1165.93%)
Mutual labels: natural-language-processing
Fastnlp
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Stars: ✭ 2,441 (+1241.21%)
Mutual labels: natural-language-processing
Efaqa Corpus Zh
❤️Emotional First Aid Dataset, a psychological counseling Q&A and chatbot corpus
Stars: ✭ 170 (-6.59%)
Mutual labels: natural-language-processing
Stopwords
Default English stopword lists from many different sources
Stars: ✭ 179 (-1.65%)
Mutual labels: natural-language-processing
Open Sesame
A frame-semantic parsing system based on a softmax-margin SegRNN.
Stars: ✭ 170 (-6.59%)
Mutual labels: natural-language-processing
Web Database Analytics
Web scraping and related analytics using Python tools
Stars: ✭ 175 (-3.85%)
Mutual labels: natural-language-processing
Kb Infobot
A dialogue bot for information access
Stars: ✭ 181 (-0.55%)
Mutual labels: natural-language-processing
Nlp profiler
A simple NLP library that allows profiling datasets with one or more text columns. Given a dataset and the name of a column containing text data, NLP Profiler returns either high-level insights or low-level/granular statistical information about the text in that column.
Stars: ✭ 181 (-0.55%)
Mutual labels: natural-language-processing
Sentence Similarity Calculator
This repo contains various ways to calculate the similarity between source and target sentences. You can choose which pre-trained model to use: ELMo, BERT, or the Universal Sentence Encoder (USE).
You can also choose the method used to compute the similarity:
1. Cosine similarity
2. Manhattan distance
3. Euclidean distance
4. Angular distance
5. Inner product
6. TS-SS score
7. Pairwise-cosine similarity
8. Pairwise-cosine similarity + IDF
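Methods 1 to 5 are standard vector operations applied to the two sentence embeddings. Below is a minimal pure-Python sketch of them, assuming each sentence has already been encoded into a non-zero list of floats by one of the models above. The function names are hypothetical and are not the repository's API, and angular distance is taken here as arccos(cosine) / π, which is one common convention.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot(u, u))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (assumes non-zero vectors)."""
    return dot(u, v) / (norm(u) * norm(v))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean_distance(u, v):
    """Straight-line (L2) distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def angular_distance(u, v):
    """Angle between u and v, scaled to [0, 1] by dividing by pi."""
    # Clamp to [-1, 1] so floating-point rounding cannot break acos.
    cos = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.acos(cos) / math.pi


def inner_product(u, v):
    """Raw inner product; unlike cosine, it also grows with vector length."""
    return dot(u, v)
```

Note the direction of each score: cosine similarity and inner product grow as sentences get more similar, while the Manhattan, Euclidean and angular distances shrink.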
You can experiment with (number of models) × (number of methods) combinations, i.e. 3 × 8 = 24!
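Methods 7 and 8 compare token embeddings rather than a single sentence vector. The sketch below is modeled on the greedy matching used by BERTScore (listed under References): each token is matched to its most similar token in the other sentence, recall and precision are averaged over those best matches, and the two are combined into an F1 score. With IDF, each token's match is weighted by idf(w) = log(M / df(w)) over a corpus of M sentences. The repository's exact formula may differ, and all function names here are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def greedy_match(from_vecs, to_vecs, weights=None):
    """Weighted mean, over from_vecs, of each token's best cosine match in to_vecs."""
    best = [max(cosine(f, t) for t in to_vecs) for f in from_vecs]
    if weights is None or sum(weights) == 0:
        # Plain mean; also the fallback when every IDF weight is zero.
        return sum(best) / len(best)
    return sum(w * b for w, b in zip(weights, best)) / sum(weights)


def pairwise_cosine_f1(src_vecs, tgt_vecs, src_weights=None, tgt_weights=None):
    """Harmonic mean of greedy recall (src -> tgt) and precision (tgt -> src)."""
    recall = greedy_match(src_vecs, tgt_vecs, src_weights)
    precision = greedy_match(tgt_vecs, src_vecs, tgt_weights)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def idf_weights(tokenized_corpus):
    """idf(w) = log(M / df(w)), where df(w) counts sentences containing w."""
    m = len(tokenized_corpus)
    df = Counter(tok for sent in tokenized_corpus for tok in set(sent))
    return {tok: math.log(m / n) for tok, n in df.items()}
```

For the IDF variant, build the weight lists from the sentences' tokens, e.g. `[idf.get(tok, 0.0) for tok in tokens]`, and pass them as `src_weights` and `tgt_weights`. A token that appears in every sentence gets weight 0, so common words like "I" stop dominating the score.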
Installation
- This project is developed in a conda environment
- After cloning this repository, you can install all the dependencies listed in requirements.txt with bash install.sh:
conda create -n sensim python=3.7
conda activate sensim
git clone https://github.com/Huffon/sentence-similarity.git
cd sentence-similarity
bash install.sh
Usage
- To test your own sentences, fill out corpus.txt with one sentence per line, as below:
I ate an apple.
I went to the Apple.
I ate an orange.
...
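A minimal sketch of loading corpus.txt in the format above, assuming one sentence per line with blank lines ignored. How sensim.py pairs source and target sentences is not described here, so comparing every sentence with every other sentence is a hypothetical scheme, as are the function names.

```python
from itertools import combinations


def parse_corpus(lines):
    """Return one sentence per non-empty line, with surrounding whitespace stripped."""
    return [line.strip() for line in lines if line.strip()]


def read_corpus(path="corpus.txt"):
    """Read the corpus file; iterating a text file yields its lines."""
    with open(path, encoding="utf-8") as f:
        return parse_corpus(f)


def sentence_pairs(sentences):
    """Every unordered pair of distinct sentences (hypothetical pairing scheme)."""
    return list(combinations(sentences, 2))
```

With the three example sentences, `sentence_pairs` yields 3 pairs, each of which can then be scored with any of the methods listed earlier.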
- Then, choose the model and the method used to calculate the similarity between source and target sentences:
python sensim.py
    --model MODEL_NAME [use, bert, elmo]
    --method METHOD_NAME [cosine, manhattan, euclidean, inner,
                          ts-ss, angular, pairwise, pairwise-idf]
    --verbose LOG_OPTION (bool)
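The options above map naturally onto Python's `argparse` with `choices`. This is a hypothetical reconstruction for illustration, not the actual sensim.py source: making `--model` and `--method` required and modeling `--verbose` as an on/off flag are assumptions.

```python
import argparse
from itertools import product

MODELS = ["use", "bert", "elmo"]
METHODS = ["cosine", "manhattan", "euclidean", "inner",
           "ts-ss", "angular", "pairwise", "pairwise-idf"]


def build_parser():
    parser = argparse.ArgumentParser(description="Sentence similarity (sketch)")
    # argparse rejects any value outside the listed choices.
    parser.add_argument("--model", choices=MODELS, required=True)
    parser.add_argument("--method", choices=METHODS, required=True)
    # Assumption: --verbose is an on/off switch rather than a value.
    parser.add_argument("--verbose", action="store_true")
    return parser


# Every model/method combination the flags allow (3 x 8 = 24).
COMBINATIONS = list(product(MODELS, METHODS))
```

For example, `build_parser().parse_args(["--model", "bert", "--method", "ts-ss"])` returns a namespace with `model="bert"`, `method="ts-ss"` and `verbose=False`.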
Examples
- This section shows example results of sentence-similarity
- There is no silver bullet that calculates perfect similarity between sentences
- You should run various experiments on your own dataset
- Caution: the TS-SS score might not fit the sentence similarity task, since this method was originally devised to calculate the similarity between long documents
- Caution:
- Result:
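The caution about TS-SS is easier to weigh with the formula in hand. As we read the paper "A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering" (listed under References), TS-SS multiplies a Triangle's area Similarity (TS) by a Sector's area Similarity (SS), with θ' = arccos(cosine) + 10°:

TS = |A| · |B| · sin(θ') / 2,  SS = π · (ED + MD)² · θ' / 360,

where ED is the Euclidean distance and MD = ||A| − |B|| is the magnitude difference. The sketch below follows that reading; it is not the implementation used by this repository or the Vector Similarity library.

```python
import math


def ts_ss(u, v):
    """TS-SS dissimilarity: 0 for identical vectors, larger as they diverge."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    # Clamp the cosine so rounding cannot push it outside acos's domain.
    cos = max(-1.0, min(1.0, dot / (nu * nv)))
    # theta' in degrees: the angle between the vectors plus 10 degrees.
    theta = math.degrees(math.acos(cos)) + 10.0
    # Triangle area spanned by the two vectors at angle theta'.
    ts = nu * nv * math.sin(math.radians(theta)) / 2
    ed = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))  # Euclidean distance
    md = abs(nu - nv)  # magnitude difference
    # Sector area with radius ED + MD and angle theta'.
    ss = math.pi * (ed + md) ** 2 * theta / 360
    return ts * ss
```

Unlike cosine similarity, TS-SS is a dissimilarity score and depends on vector magnitudes as well as the angle between the vectors, so its values are not bounded to a fixed range.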
References
Papers
- Universal Sentence Encoder
- Deep contextualized word representations
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
- BERTScore: Evaluating Text Generation with BERT
- A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering
Libraries
- TF-Hub's Universal Sentence Encoder
- AllenNLP's ELMo
- Sentence Transformers
- BERTScore
- Vector Similarity
Articles