Awesome Semantic-Search
Logo made by @createdbytango.
This repository aims to serve as a meta-repository for tasks related to Semantic Search and Semantic Similarity.
Semantic Search isn't limited to text! It can be done with images, speech, etc. There are numerous use cases and applications of semantic search.
Feel free to raise a PR on this repo!
Contents
Papers
2010
2014
2015
2016
- Bag of Tricks for Efficient Text Classification
- Enriching Word Vectors with Subword Information
- Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
- On Approximately Searching for Similar Word Embeddings
- Learning Distributed Representations of Sentences from Unlabelled Data
- Approximate Nearest Neighbor Search on High Dimensional Data --- Experiments, Analyses, and Improvement
2017
2018
- Universal Sentence Encoder
- Learning Semantic Textual Similarity from Conversations
- Google AI Blog: Advances in Semantic Textual Similarity
- Optimization of Indexing Based on k-Nearest Neighbor Graph for Proximity Search in High-dimensional Data
- Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph
- The Case for Learned Index Structures
2019
- LASER: Language Agnostic Sentence Representations
- Document Expansion by Query Prediction
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
- Multi-Stage Document Ranking with BERT
- Latent Retrieval for Weakly Supervised Open Domain Question Answering
- End-to-End Open-Domain Question Answering with BERTserini
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining
- Analyzing and Improving Representations with the Soft Nearest Neighbor Loss
- DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
2020
- Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned
- Passage Re-ranking with BERT
- CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization
- LaBSE: Language-agnostic BERT Sentence Embedding
- Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset
- DeText: A deep NLP framework for intelligent text understanding
- Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
- Pretrained Transformers for Text Ranking: BERT and Beyond
- REALM: Retrieval-Augmented Language Model Pre-Training
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
- Improving Deep Learning For Airbnb Search
- Managing Diversity in Airbnb Search
- Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
- Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks
- DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
2021
- Hybrid approach for semantic similarity calculation between Tamil words
- Augmented SBERT
- BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
- Compatibility-aware Heterogeneous Visual Search
- Learning Personal Style from Few Examples
- TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
- A Survey of Transformers
- High Quality Related Search Query Suggestions using Deep Reinforcement Learning
- Embedding-based Product Retrieval in Taobao Search
- TPRM: A Topic-based Personalized Ranking Model for Web Search
- mMARCO: A Multilingual Version of MS MARCO Passage Ranking Dataset
- Database Reasoning Over Text
- How Does Adversarial Fine-Tuning Benefit BERT?
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
- Primer: Searching for Efficient Transformers for Language Modeling
- How Familiar Does That Sound? Cross-Lingual Representational Similarity Analysis of Acoustic Word Embeddings
- SimCSE: Simple Contrastive Learning of Sentence Embeddings
- Compositional Attention: Disentangling Search and Retrieval
- SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search
2022
- Text and Code Embeddings by Contrastive Pre-Training
- RELiC: Retrieving Evidence for Literary Claims
- Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations
Articles
- Tackling Semantic Search
- Semantic search in Azure Cognitive Search
- How we used semantic search to make our search 10x smarter
- Building a semantic search engine with dual space word embeddings
- Billion-scale semantic similarity search with FAISS+SBERT
- Some observations about similarity search thresholds
- Near Duplicate Image Search using Locality Sensitive Hashing
- Free Course on Vector Similarity Search and Faiss
- Comprehensive Guide To Approximate Nearest Neighbors Algorithms
Libraries and Tools
- fastText
- Universal Sentence Encoder
- SBERT
- ELECTRA
- LaBSE
- LASER
- Relevance AI - Vector Platform From Experimentation To Deployment
- Haystack
- Jina.AI
- pinecone
- SentEval Toolkit
- BEIR: Benchmarking IR
- RELiC: Retrieving Evidence for Literary Claims Dataset
- matchzoo-py
- deep_text_matching
- Which Frame?
- emoji semantic search
- PySerini
- BERTSerini
- BERTSimilarity
- milvus
- NeuroNLP++
- weaviate
- semantic-search-through-wikipedia-with-weaviate
- natural-language-youtube-search
- same.energy
- ann benchmarks
- scaNN
- REALM
- annoy
- pynndescent
- nsg
- FALCONN
- redis HNSW
- autofaiss
- DPR
- rank_BM25
- nearPy
- vearch
- pgANN
- Tensorflow Similarity
- opensemanticsearch.org
- GPT3 Semantic Search
- searchy
- txtai
- HyperTag
- vectorai
- embeddinghub
- AquilaDb
Datasets
- Semantic Text Similarity Dataset Hub
- Facebook AI Image Similarity Challenge
- WIT: Wikipedia-based Image Text Dataset
Milestones
Have a look at the project board for the task list if you would like to contribute to any of the open issues.