JPQ
CIKM'21: JPQ substantially improves the efficiency of Dense Retrieval with a 30x compression ratio, a 10x CPU speedup and a 2x GPU speedup.
Stars: ✭ 39 (-85.97%)
DRhard
SIGIR'21: Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on the TREC DL Track.
Stars: ✭ 93 (-66.55%)
Easyocr
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Stars: ✭ 13,379 (+4712.59%)
Hdltex
HDLTex: Hierarchical Deep Learning for Text Classification
Stars: ✭ 191 (-31.29%)
Vtext
Simple NLP in Rust with Python bindings
Stars: ✭ 108 (-61.15%)
Pyserini
Python interface to the Anserini IR toolkit built on Lucene
Stars: ✭ 148 (-46.76%)
Catalyst
Accelerated deep learning R&D
Stars: ✭ 2,804 (+908.63%)
patzilla
PatZilla is a modular patent information research platform and data integration toolkit with a modern user interface and access to multiple data sources.
Stars: ✭ 71 (-74.46%)
Neuralqa
NeuralQA: A Usable Library for Question Answering on Large Datasets with BERT
Stars: ✭ 185 (-33.45%)
Textrank Keyword Extraction
Keyword extraction using the TextRank algorithm after pre-processing the text with lemmatization, filtering of unwanted parts of speech, and other techniques.
Stars: ✭ 79 (-71.58%)
Gensim
Topic Modelling for Humans
Stars: ✭ 12,763 (+4491.01%)
Conceptualsearch
Train a Word2Vec model or LSA model, and implement Conceptual Search/Semantic Search in Solr/Lucene - Simon Hughes, Dice.com, Dice Tech Jobs
Stars: ✭ 245 (-11.87%)
Invoicenet
Deep neural network to extract intelligent information from invoice documents.
Stars: ✭ 1,886 (+578.42%)
query-wellformedness
25,100 queries from the Paralex corpus (Fader et al., 2013) annotated with human ratings of whether they are well-formed natural language questions.
Stars: ✭ 80 (-71.22%)
Foundry
The Cognitive Foundry is an open-source Java library for building intelligent systems using machine learning.
Stars: ✭ 124 (-55.4%)
Ranknet
My (slightly modified) Keras implementation of RankNet and PyTorch implementation of LambdaRank.
Stars: ✭ 211 (-24.1%)
Scilla
🏴‍☠️ Information Gathering tool 🏴‍☠️ DNS / Subdomains / Ports / Directories enumeration
Stars: ✭ 116 (-58.27%)
ConvDR
Code repo for SIGIR 2021 paper "Few-Shot Conversational Dense Retrieval"
Stars: ✭ 36 (-87.05%)
Sert
Semantic Entity Retrieval Toolkit
Stars: ✭ 100 (-64.03%)
Openmatch
An Open-Source Package for Information Retrieval.
Stars: ✭ 186 (-33.09%)
Solrplugins
Dice Solr Plugins from Simon Hughes, Dice.com
Stars: ✭ 86 (-69.06%)
FinBERT-QA
Financial Domain Question Answering with pre-trained BERT Language Model
Stars: ✭ 70 (-74.82%)
Wordtokenizers.jl
High performance tokenizers for natural language processing and other related tasks
Stars: ✭ 63 (-77.34%)
Ranking
Learning to Rank in TensorFlow
Stars: ✭ 2,362 (+749.64%)
Freediscovery
Web Service for E-Discovery Analytics
Stars: ✭ 59 (-78.78%)
Sf1r Lite
Search Formula-1: a distributed, high-performance massive data engine for enterprise/vertical search
Stars: ✭ 158 (-43.17%)
ComposeAE
Official code for WACV 2021 paper - Compositional Learning of Image-Text Query for Image Retrieval
Stars: ✭ 49 (-82.37%)
ImageRetrieval
Content-Based Image Retrieval Techniques (e.g. kNN, SVM using a MATLAB GUI)
Stars: ✭ 51 (-81.65%)
Tutorial Utilizing Kg
Resources for Tutorial on "Utilizing Knowledge Graphs in Text-centric Information Retrieval"
Stars: ✭ 148 (-46.76%)
Trinity
Trinity IR Infrastructure
Stars: ✭ 227 (-18.35%)
Rated Ranking Evaluator
Search Quality Evaluation Tool for Apache Solr & Elasticsearch search-based infrastructures
Stars: ✭ 134 (-51.8%)
Aquiladb
Drop-in solution for Decentralized Neural Information Retrieval. Index latent vectors along with JSON metadata and do efficient k-NN search.
Stars: ✭ 222 (-20.14%)
Dan Jurafsky Chris Manning Nlp
My solutions to the Natural Language Processing course taught by Dan Jurafsky and Chris Manning in Winter 2012.
Stars: ✭ 124 (-55.4%)
gpl
Powerful unsupervised domain adaptation method for dense retrieval. Requires only an unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
Stars: ✭ 216 (-22.3%)
Haystack
🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.
Stars: ✭ 3,409 (+1126.26%)
Pwnback
Burp Extender plugin that generates a sitemap of a website using the Wayback Machine
Stars: ✭ 203 (-26.98%)
Pytrec eval
pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval.
Stars: ✭ 114 (-58.99%)
LuceneTutorial
A simple tutorial of Lucene for LIS 501 Introduction to Text Mining students at the University of Wisconsin-Madison (Fall 2021).
Stars: ✭ 62 (-77.7%)
Ds2i
A library of inverted index data structures
Stars: ✭ 104 (-62.59%)
Rank bm25
A Collection of BM25 Algorithms in Python
Stars: ✭ 187 (-32.73%)
Flexneuart
Flexible classic and NeurAl Retrieval Toolkit
Stars: ✭ 99 (-64.39%)
pqlite
⚡ A fast embedded library for approximate nearest neighbor search
Stars: ✭ 141 (-49.28%)
Forte
Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project: http://casl-project.ai/
Stars: ✭ 89 (-67.99%)
Vec4ir
Word Embeddings for Information Retrieval
Stars: ✭ 188 (-32.37%)
Pyndri
pyndri is a Python interface to the Indri search engine.
Stars: ✭ 85 (-69.42%)
perke
A keyphrase extractor for Persian
Stars: ✭ 60 (-78.42%)
Vectorsinsearch
Dice.com repo to accompany the Dice.com 'Vectors in Search' talk by Simon Hughes, from the Activate 2018 search conference, and the 'Searching with Vectors' talk from Haystack 2019 (US). Builds upon my conceptual search and semantic search work from 2015.
Stars: ✭ 71 (-74.46%)
K Nrm
K-NRM: End-to-End Neural Ad-hoc Ranking with Kernel Pooling
Stars: ✭ 183 (-34.17%)
Gaanaapi
Unofficial Gaana API
Stars: ✭ 59 (-78.78%)
IR-exercises
Solutions to the various test exams of the Information Retrieval course
Stars: ✭ 28 (-89.93%)
Books
Books worth spreading
Stars: ✭ 161 (-42.09%)
beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
Stars: ✭ 738 (+165.47%)
solr
Apache Solr open-source search software
Stars: ✭ 651 (+134.17%)
crawlzone
Crawlzone is a fast asynchronous internet crawling framework for PHP.
Stars: ✭ 70 (-74.82%)