AdeDZY / K Nrm

License: BSD-3-Clause
K-NRM: End-to-End Neural Ad-hoc Ranking with Kernel Pooling

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to K Nrm

Vtext
Simple NLP in Rust with Python bindings
Stars: ✭ 108 (-40.98%)
Mutual labels:  information-retrieval
Easyocr
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Stars: ✭ 13,379 (+7210.93%)
Mutual labels:  information-retrieval
Gensim
Topic Modelling for Humans
Stars: ✭ 12,763 (+6874.32%)
Mutual labels:  information-retrieval
Scilla
🏴‍☠️ Information Gathering tool 🏴‍☠️ DNS / Subdomains / Ports / Directories enumeration
Stars: ✭ 116 (-36.61%)
Mutual labels:  information-retrieval
Foundry
The Cognitive Foundry is an open-source Java library for building intelligent systems using machine learning
Stars: ✭ 124 (-32.24%)
Mutual labels:  information-retrieval
Invoicenet
Deep neural network to extract intelligent information from invoice documents.
Stars: ✭ 1,886 (+930.6%)
Mutual labels:  information-retrieval
Sert
Semantic Entity Retrieval Toolkit
Stars: ✭ 100 (-45.36%)
Mutual labels:  information-retrieval
Books
Books worth spreading
Stars: ✭ 161 (-12.02%)
Mutual labels:  information-retrieval
Rated Ranking Evaluator
Search Quality Evaluation Tool for Apache Solr & Elasticsearch search-based infrastructures
Stars: ✭ 134 (-26.78%)
Mutual labels:  information-retrieval
Terrier Core
Terrier IR Platform
Stars: ✭ 156 (-14.75%)
Mutual labels:  information-retrieval
Haystack
🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.
Stars: ✭ 3,409 (+1762.84%)
Mutual labels:  information-retrieval
Dan Jurafsky Chris Manning Nlp
My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.
Stars: ✭ 124 (-32.24%)
Mutual labels:  information-retrieval
Tutorial Utilizing Kg
Resources for Tutorial on "Utilizing Knowledge Graphs in Text-centric Information Retrieval"
Stars: ✭ 148 (-19.13%)
Mutual labels:  information-retrieval
Pytrec eval
pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval.
Stars: ✭ 114 (-37.7%)
Mutual labels:  information-retrieval
Sf1r Lite
Search Formula-1——A distributed high performance massive data engine for enterprise/vertical search
Stars: ✭ 158 (-13.66%)
Mutual labels:  information-retrieval
Ds2i
A library of inverted index data structures
Stars: ✭ 104 (-43.17%)
Mutual labels:  information-retrieval
Entityduetneuralranking
Entity-Duet Neural Ranking Model
Stars: ✭ 137 (-25.14%)
Mutual labels:  information-retrieval
Ranking
Learning to Rank in TensorFlow
Stars: ✭ 2,362 (+1190.71%)
Mutual labels:  information-retrieval
Bm25
A Python implementation of the BM25 ranking function.
Stars: ✭ 159 (-13.11%)
Mutual labels:  information-retrieval
Pyserini
Python interface to the Anserini IR toolkit built on Lucene
Stars: ✭ 148 (-19.13%)
Mutual labels:  information-retrieval

K-NRM

This is the implementation of the Kernel-based Neural Ranking Model (K-NRM) from the paper End-to-End Neural Ad-hoc Ranking with Kernel Pooling.

If you use this code for your scientific work, please cite it as (BibTeX entry at the end of this README):

C. Xiong, Z. Dai, J. Callan, Z. Liu, and R. Power. End-to-end neural ad-hoc ranking with kernel pooling. 
In Proceedings of the 40th International ACM SIGIR Conference on Research & Development in Information Retrieval. 
ACM. 2017.

Requirements


  • TensorFlow 0.12
  • NumPy
  • traitlets

Coming soon: K-NRM with TensorFlow 1.0

Guide To Use


Configure: first, configure the model through the config file. Configurable parameters are listed in the Configurations section below.

sample.config
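For reference, a traitlets-style config file for this model might look like the sketch below. This is only a hedged illustration built from the parameter names documented in the Configurations section; the paths and the max_epochs value are placeholders, so consult the bundled sample.config for the authoritative format and defaults.

    # sketch of a traitlets-style config file (not the bundled sample.config);
    # numeric values are the documented defaults, paths are placeholders
    c = get_config()

    c.BaseNN.n_bins = 11                      # 1 exact-match kernel + 10 soft kernels
    c.BaseNN.embedding_size = 300
    c.BaseNN.max_q_len = 10
    c.BaseNN.max_d_len = 50
    c.BaseNN.vocabulary_size = 300000         # example value; match your term-id vocabulary
    c.BaseNN.batch_size = 16
    c.BaseNN.max_epochs = 10                  # no documented default; placeholder
    c.BaseNN.eval_frequency = 1000
    c.BaseNN.checkpoint_steps = 10000

    c.Knrm.lamb = 0.5                         # sigma = lamb * bin_size
    c.Knrm.learning_rate = 0.001
    c.Knrm.epsilon = 0.00001
    c.Knrm.emb_in = "./data/initial_embeddings.txt"   # placeholder path

    c.DataGenerator.max_q_len = 10
    c.DataGenerator.max_d_len = 50
    c.DataGenerator.vocabulary_size = 300000
    c.DataGenerator.min_score_diff = 0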

Training: pass the config file, training data, and validation data as

python ./knrm/model/model_knrm.py config-file\
    --train \
    --train_file: path to training data\
    --validation_file: path to validation data\
    --train_size: size of training data (number of training samples)\
    --checkpoint_dir: directory to store/load model checkpoints\
    --load_model: True or False. Start with a new model or continue training

sample-train.sh

Testing: pass the config file and testing data as

python ./knrm/model/model_knrm.py config-file\
    --test \
    --test_file: path to testing data\
    --test_size: size of testing data (number of testing samples)\
    --checkpoint_dir: directory to load trained model\
    --output_score_file: file to output document scores

Relevance scores will be written to output_score_file, one score per line, in the same order as test_file. We provide a script to convert the scores into TREC run format:

./knrm/tools/gen_trec_from_score.py
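The exact command-line interface of the bundled script is not documented here; the sketch below only illustrates what the conversion amounts to, assuming you keep a parallel tab-separated file of (query_id, doc_id) pairs, one per test line. The standard TREC run format is qid Q0 docid rank score run_name.

    # illustrative sketch only -- not the bundled gen_trec_from_score.py
    # id_file: one "query_id<TAB>doc_id" line per test sample (an assumed input)
    from collections import defaultdict

    def scores_to_trec(id_file, score_file, out_file, run_name="knrm"):
        runs = defaultdict(list)                       # query_id -> [(score, doc_id)]
        with open(id_file) as f_id, open(score_file) as f_score:
            for id_line, score in zip(f_id, f_score):
                qid, docid = id_line.strip().split("\t")
                runs[qid].append((float(score), docid))
        with open(out_file, "w") as out:
            for qid, docs in runs.items():
                # rank documents within each query by descending K-NRM score
                for rank, (score, docid) in enumerate(sorted(docs, reverse=True), 1):
                    out.write("%s Q0 %s %d %f %s\n" % (qid, docid, rank, score, run_name))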

Data Preparation


All queries and documents must be mapped into sequences of integer term ids. Term ids start at 1; -1 indicates an OOV term or non-existence. Term ids are separated by commas (,).

Training Data Format

Each training sample is a tuple of (query, positive document, negative document):

query \t positive_document \t negative_document \t score_difference

Example: 177,705,632 \t 177,705,632,-1,2452,6,98 \t 177,705,632,3,25,14,37,2,146,159,-1 \t 0.119048

If score_difference < 0, the data generator will swap the positive document and the negative document.

If score_difference < DataGenerator.min_score_diff, the training sample will be omitted.

We recommend shuffling the training samples to ease model convergence.
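To make the format concrete, the sketch below serializes one training sample into the line format described above; the helper names are ours, not part of the repository.

    # sketch: write one (query, positive doc, negative doc, score_diff) training line
    def to_ids(term_ids):
        # term ids start at 1; -1 marks OOV / missing terms
        return ",".join(str(t) for t in term_ids)

    def training_line(query_ids, pos_doc_ids, neg_doc_ids, score_diff):
        return "\t".join([to_ids(query_ids), to_ids(pos_doc_ids),
                          to_ids(neg_doc_ids), str(score_diff)])

    # reproduces the example above
    print(training_line([177, 705, 632],
                        [177, 705, 632, -1, 2452, 6, 98],
                        [177, 705, 632, 3, 25, 14, 37, 2, 146, 159, -1],
                        0.119048))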

Testing Data Format

Each testing sample is a tuple of (query, document)

query \t document

Example: 177,705,632 \t 177,705,632,-1,2452,6,98

Configurations


Model Configurations

  • BaseNN.n_bins: number of kernels (soft bins) (default: 11. One exact match kernel and 10 soft kernels)
  • Knrm.lamb: defines the Gaussian kernels' sigma value: sigma = lamb * bin_size (default: 0.5 -> sigma = 0.1; see the sketch after this list)
  • BaseNN.embedding_size: embedding dimension (default: 300)
  • BaseNN.max_q_len: max query length (default: 10)
  • BaseNN.max_d_len: max document length (default: 50)
  • DataGenerator.max_q_len: max query length. Should be the same as BaseNN.max_q_len (default: 10)
  • DataGenerator.max_d_len: max document length. Should be the same as BaseNN.max_d_len (default: 50)
  • BaseNN.vocabulary_size: vocabulary size.
  • DataGenerator.vocabulary_size: vocabulary size.
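To see how n_bins and lamb interact (referenced from the Knrm.lamb bullet), the sketch below computes kernel means and widths the way the paper describes: one exact-match kernel pinned at mu = 1 with a tiny sigma, and soft kernels spread evenly over the cosine-similarity range [-1, 1] with sigma = lamb * bin_size. Treat it as an illustration; the repository's exact values may differ slightly.

    # sketch: kernel centers (mu) and widths (sigma) for kernel pooling.
    # With n_bins = 11 and lamb = 0.5: bin_size = 2 / (11 - 1) = 0.2, so sigma = 0.1,
    # matching the documented default (lamb = 0.5 -> sigma = 0.1).
    def kernel_params(n_bins=11, lamb=0.5, exact_sigma=1e-3):
        bin_size = 2.0 / (n_bins - 1)      # cosine similarities live in [-1, 1]
        mus = [1.0]                        # exact-match kernel at mu = 1
        sigmas = [exact_sigma]             # near-delta kernel for exact matches
        for i in range(1, n_bins):
            mus.append(1.0 - bin_size / 2 - (i - 1) * bin_size)   # soft-kernel centers
            sigmas.append(lamb * bin_size)
        return mus, sigmas

    mus, sigmas = kernel_params()
    print(mus)      # [1.0, 0.9, 0.7, 0.5, ..., -0.9]
    print(sigmas)   # [0.001, 0.1, 0.1, ...]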

Data

  • Knrm.emb_in: initial embeddings
  • DataGenerator.min_score_diff: minimum score difference between positive and negative documents (default: 0)

Training Parameters

  • BaseNN.batch_size: batch size (default: 16)
  • BaseNN.max_epochs: max number of epochs to train
  • BaseNN.eval_frequency: evaluate the model on the validation set every this many steps (default: 1000)
  • BaseNN.checkpoint_steps: save a model checkpoint every this many steps (default: 10000)
  • Knrm.learning_rate: learning rate for the Adam optimizer (default: 0.001; see the sketch after this list)
  • Knrm.epsilon: epsilon for the Adam optimizer (default: 0.00001)
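For orientation, in TensorFlow 0.12 the last two parameters map onto the Adam optimizer roughly as in the sketch below (referenced from the Knrm.learning_rate bullet). This is not the repository's training code; loss here is just a stand-in for the model's pairwise ranking loss.

    import tensorflow as tf

    # sketch: how Knrm.learning_rate and Knrm.epsilon plug into Adam in TensorFlow 0.x
    w = tf.Variable(1.0)
    loss = tf.square(w - 3.0)              # stand-in for the pairwise ranking loss
    train_op = tf.train.AdamOptimizer(learning_rate=0.001,
                                      epsilon=0.00001).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(train_op)                 # one optimization step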

Efficiency

During training, it takes about 60ms to process one batch on a single-GPU machine with the following settings:

  • batch size: 16
  • max_q_len: 10
  • max_d_len: 50
  • vocabulary_size: 300K

A smaller vocabulary and shorter documents speed up training.

Click2Vec


We also provide the click2vec model as described in our paper.

  • ./knrm/click2vec/generate_click_term_pair.py: generate <query_term, clicked_title_term> pairs (see the sketch after this list)
  • ./knrm/click2vec/run_word2vec.sh: call Google's word2vec tool to train click2vec.
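The sketch below shows the kind of pair expansion we understand generate_click_term_pair.py to perform: every query term is paired with every term of the clicked title, and the emitted pairs can then be fed to word2vec as tiny two-word sentences. The input format assumed here (one tab-separated query / clicked-title pair per line on stdin) is our assumption, not the script's documented interface.

    # sketch: emit <query_term, clicked_title_term> pairs from (query, clicked title) lines
    # assumed input on stdin: one "query<TAB>clicked_title" per line
    import sys

    def click_term_pairs(lines):
        for line in lines:
            query, title = line.rstrip("\n").split("\t")
            for q_term in query.split():
                for t_term in title.split():
                    yield q_term, t_term

    if __name__ == "__main__":
        for q_term, t_term in click_term_pairs(sys.stdin):
            print("%s %s" % (q_term, t_term))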

Cite the paper


If you use this code for your scientific work, please cite it as:

C. Xiong, Z. Dai, J. Callan, Z. Liu, and R. Power. End-to-end neural ad-hoc ranking with kernel pooling. 
In Proceedings of the 40th International ACM SIGIR Conference on Research & Development in Information Retrieval. 
ACM. 2017.
@inproceedings{xiong2017neural,
  author          = {{Xiong}, Chenyan and {Dai}, Zhuyun and {Callan}, Jamie and {Liu}, Zhiyuan and {Power}, Russell},
  title           = "{End-to-End Neural Ad-hoc Ranking with Kernel Pooling}",
  booktitle       = {Proceedings of the 40th International ACM SIGIR Conference on Research \& Development in Information Retrieval},
  organization    = {ACM},
  year            = 2017,
}