Top 148 information-retrieval open source projects

Conceptualsearch
Train a Word2Vec or LSA model and implement conceptual/semantic search in Solr/Lucene. By Simon Hughes (Dice.com, Dice Tech Jobs)
Aquiladb
Drop-in solution for decentralized neural information retrieval. Index latent vectors along with JSON metadata and do efficient k-NN search.
Ranknet
My (slightly modified) Keras implementation of RankNet and PyTorch implementation of LambdaRank.
Pwnback
Burp Extender plugin that generates a sitemap of a website using Wayback Machine
Rank bm25
A Collection of BM25 Algorithms in Python
Openmatch
An Open-Source Package for Information Retrieval.
Neuralqa
NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT
K Nrm
K-NRM: End-to-End Neural Ad-hoc Ranking with Kernel Pooling
Bm25
A Python implementation of the BM25 ranking function.
Sf1r Lite
Search Formula-1: a distributed, high-performance massive data engine for enterprise/vertical search
Terrier Core
Terrier IR Platform
Pyserini
Python interface to the Anserini IR toolkit built on Lucene
Tutorial Utilizing Kg
Resources for Tutorial on "Utilizing Knowledge Graphs in Text-centric Information Retrieval"
Entityduetneuralranking
Entity-Duet Neural Ranking Model
Easyocr
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Rated Ranking Evaluator
Search Quality Evaluation Tool for Apache Solr & Elasticsearch search-based infrastructures
Foundry
The Cognitive Foundry is an open-source Java library for building intelligent systems using machine learning
Haystack
🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.
Pytrec eval
pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval.
Vtext
Simple NLP in Rust with Python bindings
Ds2i
A library of inverted index data structures
Flexneuart
Flexible classic and NeurAl Retrieval Toolkit
Forte
Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project: http://casl-project.ai/
Solrplugins
Dice Solr plugins from Simon Hughes (Dice.com)
Pyndri
pyndri is a Python interface to the Indri search engine.
Textrank Keyword Extraction
Keyword extraction using the TextRank algorithm after pre-processing the text with lemmatization, filtering of unwanted parts of speech, and other techniques.
Vectorsinsearch
Dice.com repo to accompany the Dice.com 'Vectors in Search' talk by Simon Hughes, from the Activate 2018 search conference, and the 'Searching with Vectors' talk from Haystack 2019 (US). Builds upon my conceptual search and semantic search work from 2015.
Wordtokenizers.jl
High performance tokenizers for natural language processing and other related tasks
Scdv
Text classification with Sparse Composite Document Vectors.
Domain discovery tool
This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better understand a domain (or topic) as it is represented on the Web.
Nprf
NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval
Pke
Python Keyphrase Extraction module
Date Info
API that lets users fetch the events that happen(ed) on a specific date
Drl4nlp.scratchpad
Notes on Deep Reinforcement Learning for Natural Language Processing papers
Fxt
A large scale feature extraction tool for text-based machine learning
Relevancyfeedback
Dice.com's relevancy feedback Solr plugin, created by Simon Hughes (Dice). Contains request handlers for MLT-style recommendations, conceptual search, semantic search, and personalized search.
Awesome Neural Models For Semantic Match
A curated list of papers dedicated to neural text (semantic) matching.
Talisman
Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
Anserini
A Lucene toolkit for replicable information retrieval research
Resin
Hardware-accelerated vector-based search engine. Available as an HTTP service or as an embedded library.
Deep Semantic Similarity Model
My Keras implementation of the Deep Semantic Similarity Model (DSSM)/Convolutional Latent Semantic Model (CLSM) described here: http://research.microsoft.com/pubs/226585/cikm2014_cdssm_final.pdf.
Cdqa
⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.
Pisa
PISA: Performant Indexes and Search for Academia