
allenai / Bi Att Flow

License: apache-2.0
The Bi-directional Attention Flow (BiDAF) network is a multi-stage hierarchical process that represents the context at different levels of granularity and uses a bi-directional attention flow mechanism to obtain a query-aware context representation without early summarization.
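As a rough illustration of that attention flow (a minimal NumPy sketch with toy shapes and random stand-in weights, not the repository's actual code), the similarity matrix S couples every context position with every query position, and both attention directions are derived from it:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

T, J, d = 5, 3, 4           # context length, query length, hidden size (toy values)
H = np.random.randn(T, d)   # context encodings h_1..h_T
U = np.random.randn(J, d)   # query encodings u_1..u_J
w = np.random.randn(3 * d)  # similarity weights (trainable in the real model)

# Similarity: S[t, j] = w . [h_t; u_j; h_t * u_j]
S = np.array([[w @ np.concatenate([H[t], U[j], H[t] * U[j]])
               for j in range(J)] for t in range(T)])

A = softmax(S, axis=1)      # context-to-query attention, one distribution per row
U_att = A @ U               # attended query vector for each context position
b = softmax(S.max(axis=1))  # query-to-context attention over context positions
h_att = b @ H               # single attended context vector

# Query-aware context representation passed on to the modeling layer
G = np.concatenate([H, U_att, H * U_att, H * h_att[None, :]], axis=1)  # (T, 8d)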

Programming Languages

Python
139335 projects - #7 most used programming language
Jupyter Notebook
11667 projects
HTML
75241 projects
Shell
77523 projects

Projects that are alternatives to or similar to Bi Att Flow

question-answering
No description or website provided.
Stars: ✭ 32 (-97.83%)
Mutual labels:  question-answering, squad, bidaf
Question-Answering-based-on-SQuAD
Question Answering System using BiDAF Model on SQuAD v2.0
Stars: ✭ 20 (-98.64%)
Mutual labels:  question-answering, squad, bidaf
co-attention
Pytorch implementation of "Dynamic Coattention Networks For Question Answering"
Stars: ✭ 54 (-96.33%)
Mutual labels:  question-answering, squad
Haystack
🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.
Stars: ✭ 3,409 (+131.59%)
Mutual labels:  question-answering, squad
extractive rc by runtime mt
Code and datasets of "Multilingual Extractive Reading Comprehension by Runtime Machine Translation"
Stars: ✭ 36 (-97.55%)
Mutual labels:  question-answering, squad
PersianQA
Persian (Farsi) Question Answering Dataset (+ Models)
Stars: ✭ 114 (-92.26%)
Mutual labels:  question-answering, squad
SQUAD2.Q-Augmented-Dataset
Augmented version of SQUAD 2.0 for Questions
Stars: ✭ 31 (-97.89%)
Mutual labels:  question-answering, squad
Medi-CoQA
Conversational Question Answering on Clinical Text
Stars: ✭ 22 (-98.51%)
Mutual labels:  question-answering, squad
qa
TensorFlow Models for the Stanford Question Answering Dataset
Stars: ✭ 72 (-95.11%)
Mutual labels:  question-answering, squad
Awesome Qa
😎 A curated list of Question Answering (QA) resources
Stars: ✭ 596 (-59.51%)
Mutual labels:  question-answering, squad
Soqal
Arabic Open Domain Question Answering System using Neural Reading Comprehension
Stars: ✭ 72 (-95.11%)
Mutual labels:  question-answering
Chinesenlp
Datasets and SOTA results for every field of Chinese NLP
Stars: ✭ 1,206 (-18.07%)
Mutual labels:  question-answering
Happy Transformer
A package built on top of Hugging Face's Transformers library that makes it easy to utilize state-of-the-art NLP models
Stars: ✭ 97 (-93.41%)
Mutual labels:  question-answering
Reading Comprehension Question Answering Papers
Survey on Machine Reading Comprehension
Stars: ✭ 101 (-93.14%)
Mutual labels:  question-answering
Farm
🏡 Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
Stars: ✭ 1,140 (-22.55%)
Mutual labels:  question-answering
Sentence Similarity
PyTorch implementations of various deep learning models for paraphrase detection, semantic similarity, and textual entailment
Stars: ✭ 96 (-93.48%)
Mutual labels:  question-answering
Wsdm2018 hyperqa
Reference Implementation for WSDM 2018 Paper "Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering"
Stars: ✭ 66 (-95.52%)
Mutual labels:  question-answering
Medical Question Answer Data
Medical question and answer dataset gathered from the web.
Stars: ✭ 65 (-95.58%)
Mutual labels:  question-answering
Php Interview Best Practices In China
📙 A compilation of PHP interview knowledge points
Stars: ✭ 1,133 (-23.03%)
Mutual labels:  question-answering
Chatbot
A Russian-language chatbot
Stars: ✭ 106 (-92.8%)
Mutual labels:  question-answering

Bi-directional Attention Flow for Machine Comprehension

0. Requirements

General

  • Python (verified on 3.5.2; issues have been reported with Python 2!)
  • unzip, wget (for running download.sh only)

Python Packages

  • tensorflow (deep learning library, only works on r0.11)
  • nltk (NLP tools, verified on 3.2.1)
  • tqdm (progress bar, verified on 4.7.4)
  • jinja2 (for visualization; not needed if you only train and test)
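The pure-Python dependencies can be installed with pip, pinned to the versions listed above (an assumed invocation; TensorFlow r0.11 predates current pip packages, so it may need a historical wheel for your platform):

pip install nltk==3.2.1 tqdm==4.7.4 jinja2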

1. Pre-processing

First, prepare the data. Download the SQuAD data, GloVe vectors, and nltk corpus (~850 MB; this will download files to $HOME/data):

chmod +x download.sh; ./download.sh

Second, preprocess the Stanford QA dataset (along with the GloVe vectors) and save the results in $PWD/data/squad (~5 minutes):

python -m squad.prepro
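For reference, GloVe files are plain text, one token per line followed by its vector components. A minimal loader sketch (illustrative only; squad.prepro performs its own preprocessing, and load_glove is a hypothetical helper):

import numpy as np

def load_glove(path):
    # Build a dict {token: vector} from a plain-text GloVe file.
    vectors = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

# e.g. glove = load_glove('data/glove/glove.6B.100d.txt')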

2. Training

The model has ~2.5M parameters and was trained on an NVIDIA Titan X (Pascal architecture, 2016). It requires at least 12GB of GPU RAM; if your GPU has less, you can either decrease the batch size (performance might degrade) or use multiple GPUs (see below). Training converges at ~18k steps and took ~4s per step (i.e. ~20 hours).

Before training, it is recommended to first run the following command to verify that everything is okay and memory is sufficient:

python -m basic.cli --mode train --noload --debug

Then to fully train, run:

python -m basic.cli --mode train --noload

You can speed up the training process with optimization flags:

python -m basic.cli --mode train --noload --len_opt --cluster

You can still omit them, but training will be much slower.

Note that the EM and F1 scores from the occasional evaluation during training do not match those from the official SQuAD evaluation script; the printed scores are not official (our scoring scheme is a bit harsher). To obtain the official numbers, use the official evaluator (copied into the squad folder as squad/evaluate-v1.1.py). For more information, see Section 3 (Test).

3. Test

To test, run:

python -m basic.cli

As with training, you can pass the optimization flags to speed up testing (~5 minutes on the dev data):

python -m basic.cli --len_opt --cluster

This command loads the most recently saved model from training and begins testing on the test data. When the process ends, it prints the F1 and EM scores and outputs a JSON file ($PWD/out/basic/00/answer/test-####.json, where #### is the step at which the model was saved). Note that the printed scores are not official (our scoring scheme is a bit harsher). To obtain the official numbers, run the official evaluator (copied into the squad folder) on the output JSON file:

python squad/evaluate-v1.1.py $HOME/data/squad/dev-v1.1.json out/basic/00/answer/test-####.json
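For intuition, the official metrics are exact match (EM) and token-overlap F1, both computed after normalizing answers and taking the maximum over the provided ground-truth answers. A condensed sketch of that logic (simplified from the published evaluate-v1.1.py; not a drop-in replacement):

import re
import string
from collections import Counter

def normalize(s):
    # Lowercase, strip punctuation and articles, collapse whitespace.
    s = s.lower()
    s = ''.join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r'\b(a|an|the)\b', ' ', s)
    return ' '.join(s.split())

def exact_match(prediction, truth):
    return float(normalize(prediction) == normalize(truth))

def f1(prediction, truth):
    pred_toks, true_toks = normalize(prediction).split(), normalize(truth).split()
    common = Counter(pred_toks) & Counter(true_toks)  # multiset token overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred_toks), overlap / len(true_toks)
    return 2 * precision * recall / (precision + recall)

# Several ground truths are provided per question; the best one is scored:
# score = max(f1(pred, t) for t in truths)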

3.1 Loading from pre-trained weights

Instead of training the model yourself, you can use the pre-trained weights that were used for the SQuAD Leaderboard submission. Refer to this worksheet on CodaLab to reproduce the results. If you are unfamiliar with CodaLab, follow these steps (assuming you have met all the prerequisites above):

  1. Download save.zip from the worksheet and unzip it in the current directory.
  2. Copy glove.6B.100d.txt from your glove data folder ($HOME/data/glove/) to the current directory.
  3. To reproduce the single model:
basic/run_single.sh $HOME/data/squad/dev-v1.1.json single.json

This writes the answers to single.json in the current directory. You can then use the official evaluator to obtain the EM and F1 scores. If you want to run on a GPU (~5 minutes), change the value of the batch_size flag in the shell file to a higher number (60 for 12GB of GPU RAM).

  4. Similarly, to reproduce the ensemble method:

basic/run_ensemble.sh $HOME/data/squad/dev-v1.1.json ensemble.json 

If you want to run on a GPU, either run the scripts sequentially by removing '&' in the for loop, or specify a different GPU for each iteration of the loop.

Results

Dev Data

Note that these scores are from the official evaluator (copied into the squad folder as squad/evaluate-v1.1.py); for more information, see Section 3 (Test). The scores printed during training may be lower than those from the official evaluator.

           EM (%)   F1 (%)
single     67.7     77.3
ensemble   72.6     80.7

Test Data

           EM (%)   F1 (%)
single     68.0     77.3
ensemble   73.3     81.1

Refer to our paper for more details, and see the SQuAD Leaderboard to compare with other models.

Multi-GPU Training & Testing

Our model supports multi-GPU training. We follow the parallelization paradigm described in the TensorFlow Tutorial. In short, if you want to use a batch size of 60 (the default) but have 3 GPUs with 4GB of RAM each, you initialize each GPU with a batch size of 20 and combine the gradients on the CPU. This can be done by running:

python -m basic.cli --mode train --noload --num_gpus 3 --batch_size 20

Similarly, you can speed up testing with:

python -m basic.cli --num_gpus 3 --batch_size 20 
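Under the hood, each GPU ("tower") computes gradients on its own slice of the batch, and the per-variable gradients are averaged on the CPU before the parameter update. A minimal NumPy sketch of that averaging step (illustrative only, not the repository's TensorFlow code; average_tower_gradients is a hypothetical helper):

import numpy as np

def average_tower_gradients(tower_grads):
    # tower_grads: one list of per-variable gradient arrays per GPU tower.
    # Group each variable's gradients across towers and average them.
    return [np.mean(np.stack(grads_for_var), axis=0)
            for grads_for_var in zip(*tower_grads)]

# Toy example: 3 towers (a batch of 60 split into 20 per GPU), 2 variables each.
g = lambda *shape: np.random.randn(*shape)
tower_grads = [[g(4, 4), g(4)] for _ in range(3)]
avg_grads = average_tower_gradients(tower_grads)  # same shapes as one tower's list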

Demo

For now, please refer to the demo branch of this repository.
