MartinoMensio / spacy-sentence-bert

Licence: MIT license
Sentence transformers models for spaCy

Sentence-BERT for spaCy

This package wraps sentence-transformers (also known as sentence-BERT) directly in spaCy. You can substitute the vectors provided in any spaCy model with vectors that have been tuned specifically for semantic similarity.

The models below are suggested for analysing sentence similarity, based on their STS benchmark scores. Keep in mind that sentence-transformers are configured with a maximum sequence length of 128. For longer texts, other models (e.g. Universal Sentence Encoder) may therefore be more suitable.
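
As a rough illustration of the length limit, the sketch below flags texts that are certain to exceed it. It assumes that every whitespace-separated word becomes at least one model token, so the word count is only a lower bound on the real token count. The name exceeds_max_length and its default of 128 are illustrative and not part of this package.

```python
def exceeds_max_length(text: str, max_seq_length: int = 128) -> bool:
    """Return True when the text surely has more tokens than the limit.

    Assumption: each whitespace-separated word maps to at least one model
    token, so the word count is a lower bound on the token count.
    A False result therefore does not guarantee that the text fits.
    """
    return len(text.split()) > max_seq_length


short_text = "Hi there, how are you?"
long_text = " ".join(["word"] * 200)

print(exceeds_max_length(short_text))  # False
print(exceeds_max_length(long_text))   # True
```

A True result means the input would be cut off by a 128-token limit; a False result only means the check could not tell.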

Install

Compatibility:

  • python 3.7/3.8/3.9/3.10
  • spaCy>=3.0.0,<4.0.0, last tested on version 3.5
  • sentence-transformers: tested on version 2.2.2

To install this package, you can run one of the following:

  • pip install spacy-sentence-bert
  • pip install git+https://github.com/MartinoMensio/spacy-sentence-bert.git

You can install standalone spaCy packages from GitHub with pip. With a standalone package installed, you can load a language model directly through the spacy.load API, without adding a pipeline stage. The table below takes the models listed in the Sentence Transformers documentation and shows some statistics along with the command to install each standalone model. If you don't want to install the standalone models, you can still use them by adding a pipeline stage (see below).

sentence-BERT name | spaCy model name | dimensions | language | STS benchmark | standalone install
paraphrase-distilroberta-base-v1 | en_paraphrase_distilroberta_base_v1 | 768 | en | 81.81 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_paraphrase_distilroberta_base_v1-0.1.2.tar.gz#en_paraphrase_distilroberta_base_v1-0.1.2
paraphrase-xlm-r-multilingual-v1 | xx_paraphrase_xlm_r_multilingual_v1 | 768 | 50+ | 83.50 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/xx_paraphrase_xlm_r_multilingual_v1-0.1.2.tar.gz#xx_paraphrase_xlm_r_multilingual_v1-0.1.2
stsb-roberta-large | en_stsb_roberta_large | 1024 | en | 86.39 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_stsb_roberta_large-0.1.2.tar.gz#en_stsb_roberta_large-0.1.2
stsb-roberta-base | en_stsb_roberta_base | 768 | en | 85.44 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_stsb_roberta_base-0.1.2.tar.gz#en_stsb_roberta_base-0.1.2
stsb-bert-large | en_stsb_bert_large | 1024 | en | 85.29 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_stsb_bert_large-0.1.2.tar.gz#en_stsb_bert_large-0.1.2
stsb-distilbert-base | en_stsb_distilbert_base | 768 | en | 85.16 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_stsb_distilbert_base-0.1.2.tar.gz#en_stsb_distilbert_base-0.1.2
stsb-bert-base | en_stsb_bert_base | 768 | en | 85.14 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_stsb_bert_base-0.1.2.tar.gz#en_stsb_bert_base-0.1.2
nli-bert-large | en_nli_bert_large | 1024 | en | 79.19 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_bert_large-0.1.2.tar.gz#en_nli_bert_large-0.1.2
nli-distilbert-base | en_nli_distilbert_base | 768 | en | 78.69 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_distilbert_base-0.1.2.tar.gz#en_nli_distilbert_base-0.1.2
nli-roberta-large | en_nli_roberta_large | 1024 | en | 78.69 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_roberta_large-0.1.2.tar.gz#en_nli_roberta_large-0.1.2
nli-bert-large-max-pooling | en_nli_bert_large_max_pooling | 1024 | en | 78.41 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_bert_large_max_pooling-0.1.2.tar.gz#en_nli_bert_large_max_pooling-0.1.2
nli-bert-large-cls-pooling | en_nli_bert_large_cls_pooling | 1024 | en | 78.29 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_bert_large_cls_pooling-0.1.2.tar.gz#en_nli_bert_large_cls_pooling-0.1.2
nli-distilbert-base-max-pooling | en_nli_distilbert_base_max_pooling | 768 | en | 77.61 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_distilbert_base_max_pooling-0.1.2.tar.gz#en_nli_distilbert_base_max_pooling-0.1.2
nli-roberta-base | en_nli_roberta_base | 768 | en | 77.49 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_roberta_base-0.1.2.tar.gz#en_nli_roberta_base-0.1.2
nli-bert-base-max-pooling | en_nli_bert_base_max_pooling | 768 | en | 77.21 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_bert_base_max_pooling-0.1.2.tar.gz#en_nli_bert_base_max_pooling-0.1.2
nli-bert-base | en_nli_bert_base | 768 | en | 77.12 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_bert_base-0.1.2.tar.gz#en_nli_bert_base-0.1.2
nli-bert-base-cls-pooling | en_nli_bert_base_cls_pooling | 768 | en | 76.30 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nli_bert_base_cls_pooling-0.1.2.tar.gz#en_nli_bert_base_cls_pooling-0.1.2
average_word_embeddings_glove.6B.300d | en_average_word_embeddings_glove.6B.300d | 768 | en | 61.77 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_average_word_embeddings_glove.6B.300d-0.1.2.tar.gz#en_average_word_embeddings_glove.6B.300d-0.1.2
average_word_embeddings_komninos | en_average_word_embeddings_komninos | 768 | en | 61.56 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_average_word_embeddings_komninos-0.1.2.tar.gz#en_average_word_embeddings_komninos-0.1.2
average_word_embeddings_levy_dependency | en_average_word_embeddings_levy_dependency | 768 | en | 59.22 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_average_word_embeddings_levy_dependency-0.1.2.tar.gz#en_average_word_embeddings_levy_dependency-0.1.2
average_word_embeddings_glove.840B.300d | en_average_word_embeddings_glove.840B.300d | 768 | en | 52.54 | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_average_word_embeddings_glove.840B.300d-0.1.2.tar.gz#en_average_word_embeddings_glove.840B.300d-0.1.2
quora-distilbert-base | en_quora_distilbert_base | 768 | en | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_quora_distilbert_base-0.1.2.tar.gz#en_quora_distilbert_base-0.1.2
quora-distilbert-multilingual | xx_quora_distilbert_multilingual | 768 | 50+ | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/xx_quora_distilbert_multilingual-0.1.2.tar.gz#xx_quora_distilbert_multilingual-0.1.2
msmarco-distilroberta-base-v2 | en_msmarco_distilroberta_base_v2 | 768 | en | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_msmarco_distilroberta_base_v2-0.1.2.tar.gz#en_msmarco_distilroberta_base_v2-0.1.2
msmarco-roberta-base-v2 | en_msmarco_roberta_base_v2 | 768 | en | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_msmarco_roberta_base_v2-0.1.2.tar.gz#en_msmarco_roberta_base_v2-0.1.2
msmarco-distilbert-base-v2 | en_msmarco_distilbert_base_v2 | 768 | en | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_msmarco_distilbert_base_v2-0.1.2.tar.gz#en_msmarco_distilbert_base_v2-0.1.2
nq-distilbert-base-v1 | en_nq_distilbert_base_v1 | 768 | en | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_nq_distilbert_base_v1-0.1.2.tar.gz#en_nq_distilbert_base_v1-0.1.2
distiluse-base-multilingual-cased-v2 | xx_distiluse_base_multilingual_cased_v2 | 512 | 50+ | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/xx_distiluse_base_multilingual_cased_v2-0.1.2.tar.gz#xx_distiluse_base_multilingual_cased_v2-0.1.2
stsb-xlm-r-multilingual | xx_stsb_xlm_r_multilingual | 768 | 50+ | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/xx_stsb_xlm_r_multilingual-0.1.2.tar.gz#xx_stsb_xlm_r_multilingual-0.1.2
T-Systems-onsite/cross-en-de-roberta-sentence-transformer | xx_cross_en_de_roberta_sentence_transformer | 768 | en,de | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/xx_cross_en_de_roberta_sentence_transformer-0.1.2.tar.gz#xx_cross_en_de_roberta_sentence_transformer-0.1.2
LaBSE | xx_LaBSE | 768 | 109 | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/xx_LaBSE-0.1.2.tar.gz#xx_LaBSE-0.1.2
allenai-specter | en_allenai_specter | 768 | en | N/A | pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_allenai_specter-0.1.2.tar.gz#en_allenai_specter-0.1.2
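
The standalone install commands in the table all follow one URL pattern. The sketch below rebuilds the command from a spaCy model name, assuming the pattern holds for the v0.1.2 release shown in the table. The helper standalone_install_command is hypothetical and not part of this package.

```python
BASE_URL = "https://github.com/MartinoMensio/spacy-sentence-bert/releases/download"


def standalone_install_command(model_name: str, version: str = "0.1.2") -> str:
    """Build the pip command for a standalone model archive.

    Assumption: archives are named <model_name>-<version>.tar.gz and live
    under the release tag v<version>, as in the table rows.
    """
    archive = f"{model_name}-{version}"
    return f"pip install {BASE_URL}/v{version}/{archive}.tar.gz#{archive}"


print(standalone_install_command("en_stsb_roberta_large"))
```

For en_stsb_roberta_large this prints the same command as the corresponding table row.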

If your model is not in this list (e.g., xlm-r-base-en-ko-nli-ststb), you can still use it with this library, but not as a standalone language. You will need to add a properly configured pipeline stage (see the nlp.add_pipe API below).

Usage

There are three ways to load sentence-BERT models:

  • spacy.load API: you need to have installed one of the models from the table above
  • spacy_sentence_bert.load_model: you can load one of the models from the table above without having installed the standalone packages
  • nlp.add_pipe API: you can load any of the sentence-bert models on top of your nlp object

spacy.load API

Once you have installed a standalone model from GitHub (e.g., from the table above, pip install https://github.com/MartinoMensio/spacy-sentence-bert/releases/download/v0.1.2/en_stsb_roberta_large-0.1.2.tar.gz#en_stsb_roberta_large-0.1.2), you can load it directly with the spaCy API:

import spacy
nlp = spacy.load('en_stsb_roberta_large')

spacy_sentence_bert.load_model API

You can obtain the same result without having to install the standalone model, by using this method:

import spacy_sentence_bert
nlp = spacy_sentence_bert.load_model('en_stsb_roberta_large')
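
The two loading routes can also be combined. The sketch below tries spacy.load first and falls back to spacy_sentence_bert.load_model when spaCy cannot find an installed package with that name (spaCy raises OSError in that case). The helper name load_sentence_bert is hypothetical.

```python
def load_sentence_bert(model_name: str):
    """Load a model by its spaCy name, preferring an installed standalone package."""
    # Imports are local so that defining the helper does not require the packages.
    import spacy

    try:
        # Works only if the standalone package has been installed with pip.
        return spacy.load(model_name)
    except OSError:
        # spaCy could not find the package: build the pipeline on the fly instead.
        import spacy_sentence_bert

        return spacy_sentence_bert.load_model(model_name)
```

Calling load_sentence_bert('en_stsb_roberta_large') should then return a usable nlp object whether or not the standalone package is present.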

nlp.add_pipe API

If you want to use one of the sentence embeddings over an existing Language object, you can use the nlp.add_pipe method. This also works if you want to use a language model that is not listed in the table above. Just make sure that sentence-transformers supports it.

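import spacy
# start from a blank English pipeline
nlp = spacy.blank('en')
# add the sentence-BERT stage, selecting the model by its sentence-transformers name
nlp.add_pipe('sentence_bert', config={'model_name': 'allenai-specter'})
# the new stage now appears among the pipeline components
print(nlp.pipe_names)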

When first used, the models download the sentence-BERT weights to the folder set by the TORCH_HOME environment variable (default ~/.cache/torch).
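
To store the downloads elsewhere, TORCH_HOME can be set before any model is loaded. The sketch below mirrors only the lookup rule as described here (the libraries perform the real lookup themselves). The helper resolve_torch_home and the example path are hypothetical.

```python
import os
from pathlib import Path


def resolve_torch_home() -> Path:
    """Return TORCH_HOME if it is set, else the default ~/.cache/torch."""
    return Path(os.environ.get("TORCH_HOME", "~/.cache/torch")).expanduser()


# Point the cache to a custom folder before loading any model.
os.environ["TORCH_HOME"] = "/tmp/sentence_bert_cache"  # hypothetical path
print(resolve_torch_home())
```

Setting the variable in the shell before starting Python has the same effect as assigning it through os.environ.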

Once you have loaded the model, use it through the vector property and the similarity method of spaCy:

# get two documents
doc_1 = nlp('Hi there, how are you?')
doc_2 = nlp('Hello there, how are you doing today?')
# get the vector of the Doc, Span or Token
print(doc_1.vector.shape)
print(doc_1[3].vector.shape)
print(doc_1[2:4].vector.shape)
# or use the similarity method that is based on the vectors, on Doc, Span or Token
print(doc_1.similarity(doc_2[0:7]))
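
By default, spaCy's similarity method returns the cosine similarity of the two vectors. Below is a minimal pure-Python sketch of that computation, with toy vectors standing in for doc_1.vector and doc_2.vector. The helper cosine_similarity is illustrative and not a spaCy function, and returning 0.0 for a zero vector is a choice made for this sketch.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Undefined for zero vectors; this sketch returns 0.0 instead.
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Vectors pointing in the same direction score 1.0 and orthogonal vectors score 0.0, which is why semantically close sentences are expected to score near the top of the range.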

Utils

To build and upload:

VERSION=0.1.2
# build the standalone models (17)
./build_models.sh
# build the archive at dist/spacy_sentence_bert-${VERSION}.tar.gz
python setup.py sdist
# upload to pypi
twine upload dist/spacy_sentence_bert-${VERSION}.tar.gz