raphaelsty / cherche

Licence: MIT License
📑 Neural Search

Cherche

Neural search



Cherche lets you build a neural search pipeline that combines retrievers with pre-trained language models used as rankers. Cherche is designed for small to medium-sized corpora. Its main strength is its ability to build diverse, end-to-end pipelines.


Installation 🤖

pip install cherche --upgrade 

To install the development version:

pip install git+https://github.com/raphaelsty/cherche

Documentation 📜

Documentation is available here. It provides details about retrievers, rankers, pipelines, question answering, summarization, and examples.

QuickStart 💨

Documents 📑

Cherche finds the right document within a list of objects. Each document is a Python dictionary with an identifier and one or more text fields. Here is an example of a corpus.

from cherche import data

documents = data.load_towns()

documents[:3]
[{'id': 0,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'Paris is the capital and most populous city of France.'},
 {'id': 1,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': "Since the 17th century, Paris has been one of Europe's major centres of science, and arts."},
 {'id': 2,
  'title': 'Paris',
  'url': 'https://en.wikipedia.org/wiki/Paris',
  'article': 'The City of Paris is the centre and seat of government of the region and province of Île-de-France.'
  }]

Retriever ranker 🔍

Here is an example of a neural search pipeline composed of a TF-IDF retriever that quickly retrieves candidate documents, followed by a ranking model. The ranking model sorts the documents returned by the retriever according to the semantic similarity between the query and each document.

from cherche import data, retrieve, rank
from sentence_transformers import SentenceTransformer

# List of dicts
documents = data.load_towns()

# Retrieve on fields title and article
retriever = retrieve.TfIdf(key="id", on=["title", "article"], documents=documents, k=30)

# Rank on fields title and article
ranker = rank.Encoder(
    key = "id",
    on = ["title", "article"],
    encoder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode,
    k = 3,
    path = "encoder.pkl"
)

# Pipeline creation
search = retriever + ranker

search.add(documents=documents)

search("Bordeaux")
[{'id': 57, 'similarity': 0.69513476},
 {'id': 63, 'similarity': 0.6214991},
 {'id': 65, 'similarity': 0.61809057}]

Map the index to the documents to access their contents.

search += documents
search("Bordeaux")
[{'id': 57,
  'title': 'Bordeaux',
  'url': 'https://en.wikipedia.org/wiki/Bordeaux',
  'article': 'Bordeaux ( bor-DOH, French: [bɔʁdo] (listen); Gascon Occitan: Bordèu [buɾˈðɛw]) is a port city on the river Garonne in the Gironde department, Southwestern France.',
  'similarity': 0.69513476},
 {'id': 63,
  'title': 'Bordeaux',
  'url': 'https://en.wikipedia.org/wiki/Bordeaux',
  'article': 'The term "Bordelais" may also refer to the city and its surrounding region.',
  'similarity': 0.6214991},
 {'id': 65,
  'title': 'Bordeaux',
  'url': 'https://en.wikipedia.org/wiki/Bordeaux',
  'article': "Bordeaux is a world capital of wine, with its castles and vineyards of the Bordeaux region that stand on the hillsides of the Gironde and is home to the world's main wine fair, Vinexpo.",
  'similarity': 0.61809057}]
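
Conceptually, mapping the index to the documents amounts to joining each search result with its stored document on the `key` field. The sketch below illustrates the idea in plain Python. The helper `map_documents` and the two-document corpus are hypothetical and are not part of Cherche's API; the `search += documents` step above is how Cherche itself does it.

```python
def map_documents(results, documents, key="id"):
    """Attach the stored fields of each document to the matching search result.

    Hypothetical helper for illustration only, not Cherche's implementation.
    """
    # Build a lookup table from document identifier to document.
    index = {document[key]: document for document in documents}
    # Merge document fields with result fields (such as the similarity score).
    # Results whose identifier is unknown are dropped.
    return [{**index[result[key]], **result} for result in results if result[key] in index]


documents = [
    {"id": 0, "title": "Paris", "article": "Paris is the capital and most populous city of France."},
    {"id": 1, "title": "Bordeaux", "article": "Bordeaux is a port city on the river Garonne."},
]

results = [{"id": 1, "similarity": 0.69}]
mapped = map_documents(results, documents)
print(mapped)
```

Each mapped result keeps its similarity score and gains the `title` and `article` fields of the document it refers to.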

Retrieve 👻

Cherche provides different retrievers that filter input documents based on a query.

  • retrieve.Elastic
  • retrieve.TfIdf
  • retrieve.Lunr
  • retrieve.BM25Okapi
  • retrieve.BM25L
  • retrieve.Flash
  • retrieve.Encoder
  • retrieve.DPR
  • retrieve.Fuzz
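
To give an intuition of what a lexical retriever such as `retrieve.TfIdf` does, here is a minimal, dependency-free sketch of TF-IDF retrieval. It is an illustration under simplifying assumptions (word-level tokenization, smoothed IDF, cosine similarity) and not Cherche's implementation, which wraps scikit-learn's TfidfVectorizer. The function `tfidf_retrieve` and the small corpus are hypothetical; the `key`, `on`, and `k` parameters only mirror the names used in the quick start.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and split it into word tokens.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tfidf_retrieve(query, documents, on, key="id", k=2):
    """Return the identifiers of the k documents most similar to the query."""
    # Concatenate the searched fields of each document and tokenize them.
    tokens = {doc[key]: tokenize(" ".join(doc[field] for field in on)) for doc in documents}
    n = len(documents)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    # Smoothed inverse document frequency.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

    def vectorize(toks):
        # Terms unseen in the corpus are ignored.
        return {term: count * idf[term] for term, count in Counter(toks).items() if term in idf}

    query_vector = vectorize(tokenize(query))
    scored = [
        {key: doc_id, "similarity": cosine(query_vector, vectorize(toks))}
        for doc_id, toks in tokens.items()
    ]
    # Filter out documents that share no term with the query, then keep the top k.
    scored = [item for item in scored if item["similarity"] > 0]
    return sorted(scored, key=lambda item: item["similarity"], reverse=True)[:k]


documents = [
    {"id": 0, "title": "Paris", "article": "Paris is the capital of France."},
    {"id": 1, "title": "Bordeaux", "article": "Bordeaux is a port city on the river Garonne."},
    {"id": 2, "title": "Lyon", "article": "Lyon is a city in France."},
]

candidates = tfidf_retrieve("Bordeaux", documents, on=["title", "article"])
print(candidates)
```

Only the document that shares a term with the query survives the filter, which is the role a retriever plays before a more expensive ranker is applied.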

Rank 🤗

Cherche rankers are compatible with SentenceTransformers models, Hugging Face sentence similarity models, Hugging Face zero-shot classification models, and your own models.
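
In the quick start, `rank.Encoder` receives an `encoder` callable (there, `SentenceTransformer(...).encode`) that turns texts into vectors. The sketch below shows the general idea of ranking with a custom encoder: embed the query and the candidate documents, then sort by cosine similarity. It is a simplified illustration, not Cherche's implementation. `toy_encoder` and `rank_documents` are hypothetical names, and the bag-of-words encoder merely stands in for a real embedding model.

```python
import math
import re
from collections import Counter


def toy_encoder(texts):
    """Stand-in for an embedding model: one bag-of-words vector per text."""
    return [Counter(re.findall(r"\w+", text.lower())) for text in texts]


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dict-like counters.
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_documents(query, candidates, on, encoder, key="id", k=3):
    """Sort candidate documents by similarity to the query and keep the top k."""
    # Concatenate the fields to rank on, then embed the query and the documents.
    texts = [" ".join(doc[field] for field in on) for doc in candidates]
    query_embedding = encoder([query])[0]
    embeddings = encoder(texts)
    scored = [
        {key: doc[key], "similarity": cosine(query_embedding, embedding)}
        for doc, embedding in zip(candidates, embeddings)
    ]
    return sorted(scored, key=lambda item: item["similarity"], reverse=True)[:k]


candidates = [
    {"id": 57, "title": "Bordeaux", "article": "Bordeaux is a port city on the river Garonne."},
    {"id": 65, "title": "Bordeaux", "article": "Bordeaux is a world capital of wine."},
]

ranked = rank_documents("capital of wine", candidates, on=["title", "article"], encoder=toy_encoder)
print(ranked)
```

Any callable that maps a list of texts to a list of vectors could replace `toy_encoder` in this sketch, provided `cosine` is adapted to the vector type it returns.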

Summarization and question answering

Cherche provides modules dedicated to summarization and question answering. These modules are compatible with Hugging Face's pre-trained models and fully integrated into neural search pipelines.

Translation

Hugging Face's translation models can be fully integrated into the neural search pipeline to translate queries, documents, or answers.

Deploy

We provide a minimalist API to deploy a neural search pipeline with FastAPI and Docker. Details are available in the documentation.

Hugging Face Space

A running demo is available on Hugging Face.

Contributors 🤝

Cherche was created for/by Renault and is now available to all. We welcome all contributions.

Acknowledgements 👏

The BM25 models available in Cherche are wrappers around rank_bm25. The Elastic retriever is a wrapper around the Python Elasticsearch Client. The TfIdf retriever is a wrapper around scikit-learn's TfidfVectorizer. The Lunr retriever is a wrapper around Lunr.py. The Flash retriever is a wrapper around FlashText. The DPR and Encoder rankers are wrappers dedicated to using the pre-trained models of SentenceTransformers in a neural search pipeline. The ZeroShot ranker is a wrapper dedicated to using the zero-shot sequence classifiers of Hugging Face in a neural search pipeline.

See also 👀

Cherche is a minimalist solution that meets a need for modularity. It is a good fit when you start from a list of JSON documents with multiple fields to search on and want to compose pipelines from them. Cherche is also well suited to medium-sized corpora.

Do not hesitate to look at Jina, Haystack, or TxtAi, which offer advanced neural search solutions.

Dev Team 💾

The Cherche dev team is made up of Raphaël Sourty, François-Paul Servant, Nicolas Bizzozzero, Jose G Moreno. 🥳
