md-experiments / elastic_transformers

License: Apache-2.0

Making BERT stretchy. Semantic Elasticsearch with Sentence Transformers

Programming Languages

Jupyter Notebook

python

Projects that are alternatives of or similar to elastic transformers

Haystack

🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.

Stars: ✭ 3,409 (+2128.1%)

Mutual labels: transformers, semantic-search

policy-data-analyzer

Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.

Stars: ✭ 22 (-85.62%)

Mutual labels: transformers, sentence-transformers

Introduction-to-Deep-Learning-and-Neural-Networks-Course

Code snippets and solutions for the Introduction to Deep Learning and Neural Networks Course hosted in educative.io

Stars: ✭ 33 (-78.43%)

Mutual labels: transformers

mlconjug3

A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning techniques.

Stars: ✭ 47 (-69.28%)

Mutual labels: nlp-machine-learning

code-transformer

Implementation of the paper "Language-agnostic representation learning of source code from structure and context".

Stars: ✭ 130 (-15.03%)

Mutual labels: transformers

ShortText-Fasttext

ShortText classification

Stars: ✭ 12 (-92.16%)

Mutual labels: nlp-machine-learning

Chinese-Minority-PLM

CINO: Pre-trained Language Models for Chinese Minority (少数民族语言预训练模型)

Stars: ✭ 133 (-13.07%)

Mutual labels: transformers

Naive-Bayes-Evening-Workshop

Companion code for Introduction to Python for Data Science: Coding the Naive Bayes Algorithm evening workshop

Stars: ✭ 23 (-84.97%)

Mutual labels: nlp-machine-learning

transformer generalization

The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We significantly improve the systematic generalization of transformer models on a variety of datasets using simple tricks and careful considerations.

Stars: ✭ 58 (-62.09%)

Mutual labels: transformers

lidtk

Language Identification Toolkit

Stars: ✭ 17 (-88.89%)

Mutual labels: nlp-machine-learning

backprop

Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.

Stars: ✭ 229 (+49.67%)

Mutual labels: transformers

awesome-huggingface

🤗 A list of wonderful open-source projects & applications integrated with Hugging Face libraries.

Stars: ✭ 436 (+184.97%)

Mutual labels: transformers

Ask2Transformers

A Framework for Textual Entailment based Zero Shot text classification

Stars: ✭ 102 (-33.33%)

Mutual labels: transformers

Conditional-SeqGAN-Tensorflow

Conditional Sequence Generative Adversarial Network trained with policy gradient, Implementation in Tensorflow

Stars: ✭ 47 (-69.28%)

Mutual labels: nlp-machine-learning

kex

Kex is a python library for unsupervised keyword extraction from a document, providing an easy interface and benchmarks on 15 public datasets.

Stars: ✭ 46 (-69.93%)

Mutual labels: nlp-machine-learning

text-classification-transformers

Easy text classification for everyone : Bert based models via Huggingface transformers (KR / EN)

Stars: ✭ 32 (-79.08%)

Mutual labels: transformers

ginza-transformers

Use custom tokenizers in spacy-transformers

Stars: ✭ 15 (-90.2%)

Mutual labels: transformers

transformers-lightning

A collection of Models, Datasets, DataModules, Callbacks, Metrics, Losses and Loggers to better integrate pytorch-lightning with transformers.

Stars: ✭ 45 (-70.59%)

Mutual labels: transformers

BottleneckTransformers

Bottleneck Transformers for Visual Recognition

Stars: ✭ 231 (+50.98%)

Mutual labels: transformers

pysentimiento

A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks

Stars: ✭ 274 (+79.08%)

Mutual labels: transformers

ElasticTransformers

Semantic Elasticsearch with Sentence Transformers. We will use the power of Elastic and the magic of BERT to index a million articles and perform lexical and semantic search on them.

The purpose is to provide an easy way to set up your own Elasticsearch with near state-of-the-art contextual embeddings and semantic search using NLP transformers.

Overview

The setup works as follows:

Set up an Elasticsearch server with Docker
Collect the dataset
Use sentence-transformers to embed the dataset and index it into Elastic (takes about 3 hrs on 4 CPU cores)
Look at some examples comparing lexical and semantic search

Setup

Set up your environment

I use conda, with an environment named et. From inside the project directory, run:

conda create --name et python=3.7  
conda install -n et nb_conda_kernels
conda activate et
pip install -r requirements.txt

Get the data

For this tutorial I use the A Million News Headlines dataset by Rohk. Download it and place it in the data folder inside the project directory:

    elastic_transformers/
    ├── data/

The steps are otherwise fairly generic, so you can also follow them with a dataset of your choice.

Elasticsearch with Docker

Follow the instructions for setting up Elastic with Docker on Elastic's page. For this tutorial, you only need to run two of the steps described there.

Features

The repo introduces the ElasticTransformers class, a set of utilities that help create, index and query Elasticsearch indices that include embeddings.

Initialize the class with the connection URL and, optionally, the name of the index to work with:

# 9200 is Elasticsearch's default HTTP port (9300 is the node-to-node transport port)
et = ElasticTransformers(url='http://localhost:9200', index_name='et-tiny')

create_index_spec - defines the mapping for the index. Lists of relevant fields can be provided for keyword search or semantic (dense vector) search. It also takes a parameter for the size of the dense vectors, since that can vary.

create_index - uses the spec created earlier to create an index ready for search.

et.create_index_spec(
    text_fields=['publish_date','headline_text'],
    dense_fields=['headline_text_embedding'],
    dense_fields_dim=768
)
et.create_index()
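
For orientation, a spec like the one above might translate into an Elasticsearch mapping along the following lines. This is a minimal sketch and not the class's actual implementation: the helper name build_index_spec is hypothetical, and it assumes text fields map to the text type and dense fields to Elasticsearch's dense_vector type with a fixed dims.

```python
def build_index_spec(text_fields, dense_fields, dense_fields_dim):
    """Hypothetical helper: build an Elasticsearch mapping body as a plain dict."""
    properties = {}
    for name in text_fields:
        # Analyzed text, searchable with a lexical `match` query.
        properties[name] = {"type": "text"}
    for name in dense_fields:
        # Fixed-size float vector, used for semantic (similarity) search.
        properties[name] = {"type": "dense_vector", "dims": dense_fields_dim}
    return {"mappings": {"properties": properties}}


spec = build_index_spec(
    text_fields=["publish_date", "headline_text"],
    dense_fields=["headline_text_embedding"],
    dense_fields_dim=768,
)
```

The dims value has to match the output size of the embedding model, which is why the spec exposes it as a parameter.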

write_large_csv - breaks a large CSV file into chunks, uses a predefined embedding utility to create the embeddings for each chunk, and feeds the results to the index.

et.write_large_csv('data/tiny_sample.csv',
                  chunksize=1000,
                  embedder=embed_wrapper,
                  field_to_embed='headline_text')
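
The chunk-and-embed idea can be sketched with the standard library alone. The code below is an illustration under assumptions, not the repo's implementation: iter_chunks, embed_chunks and toy_embedder are hypothetical names, the sample rows are made up, and the toy embedder stands in for a real sentence-embedding model. The indexing step itself is left out; the generator only yields the documents that would be sent to the index.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunksize):
    """Yield lists of at most `chunksize` items from any iterable."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunksize))
        if not chunk:
            return
        yield chunk


def embed_chunks(csv_file, chunksize, embedder, field_to_embed):
    """Yield one dict per CSV row with an added `<field>_embedding` entry."""
    reader = csv.DictReader(csv_file)
    for chunk in iter_chunks(reader, chunksize):
        # Embed the whole chunk in one call so a real model can batch the work.
        vectors = embedder([row[field_to_embed] for row in chunk])
        for row, vector in zip(chunk, vectors):
            doc = dict(row)
            doc[field_to_embed + "_embedding"] = list(vector)
            yield doc


def toy_embedder(texts):
    """Stand-in embedder (assumption): text length and number of spaces."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]


# Made-up sample data in the same two-column layout as the headlines file.
sample = io.StringIO(
    "publish_date,headline_text\n"
    "20200101,first made up headline\n"
    "20200102,second one\n"
    "20200103,third\n"
)
docs = list(embed_chunks(sample, chunksize=2, embedder=toy_embedder,
                         field_to_embed="headline_text"))
```

With sentence-transformers installed, the embedder could instead be a thin wrapper around a model's encode method, which accepts a list of sentences and returns one vector per sentence.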

search - lets you select either keyword (‘match’ in Elastic) or semantic (dense in Elastic) search. Notably, it requires the same embedding function that was used in write_large_csv.

et.search(query='search these terms',
          field='headline_text',
          type='match',
          embedder=embed_wrapper,
          size=1000)
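
To make the difference between the two search types concrete, here is a sketch of the request bodies they might produce. It is an assumption about what happens under the hood, not the class's actual code: build_search_body and toy_embedder are hypothetical, the parameter is called search_type to avoid shadowing Python's built-in type, and the dense branch uses a script_score query with cosineSimilarity in the syntax accepted by recent Elasticsearch 7.x releases.

```python
def toy_embedder(texts):
    """Stand-in embedder (assumption): one 2-number vector per input text."""
    return [[float(len(t)), 1.0] for t in texts]


def build_search_body(query, field, search_type="match", embedder=None, size=10):
    """Hypothetical helper: return an Elasticsearch search body as a dict."""
    if search_type == "match":
        # Lexical search: analyzed keyword matching on a text field.
        es_query = {"match": {field: query}}
    elif search_type == "dense":
        if embedder is None:
            raise ValueError("dense search needs the embedder used at index time")
        query_vector = [float(x) for x in embedder([query])[0]]
        es_query = {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # +1.0 keeps scores non-negative, as Elasticsearch requires.
                    "source": "cosineSimilarity(params.query_vector, '"
                              + field + "') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        }
    else:
        raise ValueError("search_type must be 'match' or 'dense'")
    return {"size": size, "query": es_query}


lexical = build_search_body("search these terms", "headline_text", size=5)
semantic = build_search_body("search these terms", "headline_text_embedding",
                             search_type="dense", embedder=toy_embedder, size=5)
```

The query has to be embedded with the same function as the indexed documents; otherwise the query vector and the stored vectors live in different spaces and the similarity scores are meaningless.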

Usage

After successful setup, use the following notebooks to make this all work

References

This repo combines the following amazing works by brilliant people. Please check out their work if you haven't done so yet.

The ML part

The engineering part
