
mega002 / DocQN

Licence: other
Author implementation of "Learning to Search in Long Documents Using Document Structure" (Mor Geva and Jonathan Berant, 2018)

Programming Languages

python
python

Projects that are alternatives of or similar to DocQN

GAR
Code and resources for papers "Generation-Augmented Retrieval for Open-Domain Question Answering" and "Reader-Guided Passage Reranking for Open-Domain Question Answering", ACL 2021
Stars: ✭ 38 (+80.95%)
Mutual labels:  question-answering
mrqa
Code for EMNLP-IJCNLP 2019 MRQA Workshop Paper: "Domain-agnostic Question-Answering with Adversarial Training"
Stars: ✭ 35 (+66.67%)
Mutual labels:  question-answering
squadgym
Environment that can be used to evaluate reasoning capabilities of artificial agents
Stars: ✭ 27 (+28.57%)
Mutual labels:  question-answering
denspi
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index (DenSPI)
Stars: ✭ 188 (+795.24%)
Mutual labels:  question-answering
PororoQA
PororoQA, https://arxiv.org/abs/1707.00836
Stars: ✭ 26 (+23.81%)
Mutual labels:  question-answering
unanswerable qa
The official implementation for ACL 2021 "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval".
Stars: ✭ 21 (+0%)
Mutual labels:  question-answering
patrick-wechat
⭐️🐟 A WeChat questionnaire mini-program built with taro, taro-ui and heart.
Stars: ✭ 74 (+252.38%)
Mutual labels:  question-answering
cherche
📑 Neural Search
Stars: ✭ 196 (+833.33%)
Mutual labels:  question-answering
SQUAD2.Q-Augmented-Dataset
Augmented version of SQUAD 2.0 for Questions
Stars: ✭ 31 (+47.62%)
Mutual labels:  question-answering
WikiTableQuestions
A dataset of complex questions on semi-structured Wikipedia tables
Stars: ✭ 81 (+285.71%)
Mutual labels:  question-answering
PersianQA
Persian (Farsi) Question Answering Dataset (+ Models)
Stars: ✭ 114 (+442.86%)
Mutual labels:  question-answering
deformer
[ACL 2020] DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Stars: ✭ 111 (+428.57%)
Mutual labels:  question-answering
head-qa
HEAD-QA: A Healthcare Dataset for Complex Reasoning
Stars: ✭ 20 (-4.76%)
Mutual labels:  question-answering
QA4IE
Original implementation of QA4IE
Stars: ✭ 24 (+14.29%)
Mutual labels:  question-answering
CompareModels TRECQA
Compare six baseline deep learning models on TrecQA
Stars: ✭ 61 (+190.48%)
Mutual labels:  question-answering
iPerceive
Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | Python3 | PyTorch | CNNs | Causality | Reasoning | LSTMs | Transformers | Multi-Head Self Attention | Published in IEEE Winter Conference on Applications of Computer Vision (WACV) 2021
Stars: ✭ 52 (+147.62%)
Mutual labels:  question-answering
KrantikariQA
An information-gain-based question answering system over knowledge graphs.
Stars: ✭ 54 (+157.14%)
Mutual labels:  question-answering
MSMARCO
Machine Comprehension Train on MSMARCO with S-NET Extraction Modification
Stars: ✭ 31 (+47.62%)
Mutual labels:  question-answering
Stargraph
StarGraph (aka *graph) is a graph database to query large Knowledge Graphs. Playing with Knowledge Graphs can be useful if you are developing AI applications or doing data analysis over complex domains.
Stars: ✭ 24 (+14.29%)
Mutual labels:  question-answering
Medi-CoQA
Conversational Question Answering on Clinical Text
Stars: ✭ 22 (+4.76%)
Mutual labels:  question-answering

Learning to Search in Long Documents Using Document Structure

This repo contains the code used in our paper https://arxiv.org/abs/1806.03529. The code includes a framework for training and evaluating the DocQN and DQN models on TriviaQA-NoP, our version of TriviaQA in which documents are represented as tree objects. The data is available for download here.
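
For intuition, each evidence document in TriviaQA-NoP is a tree that follows the document structure, with text attached to its nodes. Below is a minimal, hypothetical illustration of such a structure; the actual tree objects shipped with the data may use different classes and fields:

from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of a tree-structured document; the tree objects
# released with TriviaQA-NoP may use different classes and fields.
@dataclass
class DocNode:
    label: str                       # e.g. "document", "section", "paragraph"
    text: str = ""                   # leaf text (empty for internal nodes)
    children: List["DocNode"] = field(default_factory=list)

doc = DocNode("document", children=[
    DocNode("section", children=[
        DocNode("paragraph", text="First paragraph of the section."),
        DocNode("paragraph", text="Second paragraph of the section."),
    ]),
])

A navigation model such as DocQN walks over trees like this one, moving between nodes until it selects a paragraph to answer the question.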

Two code versions are included, one for each of the two baselines in the paper: full and coupled. The full models leverage RaSoR predictions during navigation, while the coupled models do not. All files ending with _c.py belong to the coupled version.

Setup

The code requires Python >= 3.5, TensorFlow 1.3, and several other supporting libraries. TensorFlow should be installed separately, following its docs. To install the other dependencies, use:

$ pip install -r requirements.txt

Once the environment is set, you can download and extract the data by running the setup script:

$ python setup.py

Loading the data into memory requires at least 34GB of RAM, and training requires an additional amount that depends on the replay memory size. To allow memory-efficient execution that supports multiple executions in parallel, we run an RPC server that holds a single copy of the data in memory. Running the RPC server is a requirement for the full models and an option for the coupled models. To use it, RabbitMQ must be installed.
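
For readers unfamiliar with the pattern, the sketch below shows the general idea of such a server over RabbitMQ, using the pika client's standard RPC recipe (pika >= 1.0 signatures). It is an illustration only, not the repo's run_rpc_server.py; the queue name and the evidence lookup are placeholders:

import pika

# One process loads the evidence data once and answers lookup requests
# from training/evaluation processes. Placeholder data and queue name.
EVIDENCE = {"qid-1": "tokenized evidence ..."}

def on_request(ch, method, props, body):
    reply = EVIDENCE.get(body.decode(), "")
    ch.basic_publish(exchange="",
                     routing_key=props.reply_to,
                     properties=pika.BasicProperties(correlation_id=props.correlation_id),
                     body=reply.encode())
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="evidence_rpc")
channel.basic_consume(queue="evidence_rpc", on_message_callback=on_request)
channel.start_consuming()  # runs until interrupted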

The code can run both on GPU and CPU devices.

Data

TriviaQA-NoP comprises dataset files and preprocessed files that are needed for code execution. Running the setup script, as described above, downloads and extracts all files into the data folder.

TriviaQA-NoP dataset

The raw data is compressed in the triviaqa-nop.gz file, which contains the raw evidence files without the preface section, along with their corresponding tree objects. It also includes the train, dev, and test sets of TriviaQA (json files), updated to match the evidence files in TriviaQA-NoP.

Preprocessed files

These include a vocabulary and word embeddings based on GloVe, per-paragraph RaSoR predictions, and an "evidence dictionary" of question-evidence pairs that holds the data (tree objects, tokenized evidence files, etc.) to be loaded into memory during training and evaluation. The .exp.pkl files under data/qa are an expanded version of the datasets (json files), where each sample of a question with multiple evidences is broken into multiple question-evidence pairs.
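
To make the expansion concrete, here is a hedged sketch of the transformation; the field names follow the public TriviaQA json schema (Data, Question, QuestionId, EntityPages, SearchResults, Filename), and the actual .exp.pkl records may carry additional preprocessed fields:

import json

# Break each question with multiple evidence documents into one record
# per question-evidence pair. Field names assume the public TriviaQA
# json schema; the repo's pickled records may differ.
def expand(dataset_path):
    with open(dataset_path) as f:
        samples = json.load(f)["Data"]
    pairs = []
    for sample in samples:
        evidences = sample.get("EntityPages", []) + sample.get("SearchResults", [])
        for evidence in evidences:
            pairs.append({
                "QuestionId": sample["QuestionId"],
                "Question": sample["Question"],
                "EvidenceFile": evidence["Filename"],
            })
    return pairs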

Running the RPC server

This step is required for training and evaluation of DocQN/DQN models that use RaSoR predictions during navigation (i.e., the full models). For the coupled models, it is optional. To run the RPC server, execute the following command:

$ python run_rpc_server.py

It will start the server, which keeps running until it is shut down (with Ctrl+C).

Training

Use run_model[_c].py for training as follows:

$ PYTHONHASHSEED=[seed] python run_model[_c].py --train

where [seed] is an integer to which Python's hash seed will be fixed. We set the PYTHONHASHSEED environment variable this way because the code uses Python's hash function; fixing PYTHONHASHSEED guarantees a consistent hash function across different executions and machines.
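
The effect is easy to see: string hashes are randomized per process unless the seed is fixed, so, for example:

$ python -c 'print(hash("navigation"))'                   # differs between runs
$ PYTHONHASHSEED=7 python -c 'print(hash("navigation"))'  # identical on every run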

To use the RPC server in the coupled version, add the flag --use_rpc. There are plenty of other configuration options, which can be listed with --help. One important argument is --train_protocol, which controls the tree sampling method during training. Specifically, to train DocQN, run:

$ PYTHONHASHSEED=[seed] python run_model[_c].py --train --train_protocol combined_ans_radius

and to train DQN, run:

$ PYTHONHASHSEED=[seed] python run_model[_c].py --train --train_protocol sequential

During training, navigation performance metrics are output, including navigation accuracy ('avg_acc'). Model checkpoints and logs are stored under the models and logs folders, respectively, and a unique id is generated for every model.

It is possible to resume training by using the --resume argument, together with --model_id and --model_step. Notice that the replay memory will be re-initialized in this case.

Evaluation

Use run_model[_c].py for evaluation as follows:

$ PYTHONHASHSEED=[seed] python run_model[_c].py --evaluate --model_id [id] --model_best

For evaluation of a specific checkpoint, use --model_step [step] instead of --model_best. This evaluates the model on the development set of TriviaQA-NoP and outputs two files:

  • logs/[model_id]_[model_step]_dev_output.json - the selected paragraph for every question-evidence pair, in SQuAD format, which can be given as input to RaSoR (or any other reading comprehension model); see the reading sketch after this list.
  • logs/[model_id]_[model_step]_dev_dbg.log - a full navigation log, describing the steps performed by the model for every question-evidence pair.
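
Since the first file follows the standard SQuAD json layout ({"data": [{"paragraphs": [{"context": ..., "qas": [...]}]}]}), it can be inspected with a few lines of Python; the path below keeps the placeholder names from above:

import json

# Print the question id and the beginning of the selected paragraph for
# every question-evidence pair; assumes the standard SQuAD json layout.
with open("logs/[model_id]_[model_step]_dev_output.json") as f:
    output = json.load(f)

for article in output["data"]:
    for paragraph in article["paragraphs"]:
        for qa in paragraph["qas"]:
            print(qa["id"], "->", paragraph["context"][:80])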

To obtain predictions for the test set, run:

$ PYTHONHASHSEED=[seed] python run_model[_c].py --test --model_id [id] --model_best

or:

$ PYTHONHASHSEED=[seed] python run_model[_c].py --test --model_id [id] --model_step [step]

Final answer predictions per question were obtained by running a version of this implementation of RaSoR on the model's output and aggregating the predictions across each question's multiple evidences. We are currently not publishing this version of RaSoR.
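
The aggregation step itself admits simple schemes; one option (a sketch, not necessarily the exact rule used in the paper) keeps the highest-scoring answer across a question's evidences:

# Keep the best-scoring answer per question across its evidence documents.
# A sketch of one aggregation scheme, not necessarily the paper's rule.
def aggregate(predictions):
    """predictions: iterable of (question_id, answer_text, score) triples,
    one per question-evidence pair."""
    best = {}
    for qid, answer, score in predictions:
        if qid not in best or score > best[qid][1]:
            best[qid] = (answer, score)
    return {qid: answer for qid, (answer, _) in best.items()}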

Please feel welcome to contact us for further details and resources.

Pre-Trained Models

We release four pre-trained models:

  • DocQN - 1524410969.1593015
  • DQN - 1524411144.512193
  • DocQN coupled - 1517547376.1149364
  • DQN coupled - 1518010594.2258544

The models can be downloaded from this link, and should be extracted to the models folder in the root directory. Training and evaluation of these models were initiated with PYTHONHASHSEED=1618033988.
