
eladsegal / strategyqa

License: MIT
The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies".

Programming Languages

Python
139335 projects - #7 most used programming language
Jsonnet
166 projects

Projects that are alternatives of or similar to strategyqa

XORQA
This is the official repository for NAACL 2021, "XOR QA: Cross-lingual Open-Retrieval Question Answering".
Stars: ✭ 61 (+125.93%)
Mutual labels:  question-answering, open-domain-qa
GAR
Code and resources for papers "Generation-Augmented Retrieval for Open-Domain Question Answering" and "Reader-Guided Passage Reranking for Open-Domain Question Answering", ACL 2021
Stars: ✭ 38 (+40.74%)
Mutual labels:  question-answering, open-domain-qa
CONVEX
As far as we know, CONVEX is the first unsupervised method for conversational question answering over knowledge graphs. A demo and our benchmark (and more) can be found at
Stars: ✭ 24 (-11.11%)
Mutual labels:  question-answering
KitanaQA
KitanaQA: Adversarial training and data augmentation for neural question-answering models
Stars: ✭ 58 (+114.81%)
Mutual labels:  question-answering
FlowQA
Implementation of conversational QA model: FlowQA (with slight improvement)
Stars: ✭ 197 (+629.63%)
Mutual labels:  question-answering
ProQA
Progressively Pretrained Dense Corpus Index for Open-Domain QA and Information Retrieval
Stars: ✭ 44 (+62.96%)
Mutual labels:  question-answering
hf-experiments
Experiments with Hugging Face 🔬 🤗
Stars: ✭ 37 (+37.04%)
Mutual labels:  question-answering
lets-quiz
A quiz website for organizing online quizzes and tests. It's built using the Python/Django and Bootstrap 4 frameworks. 🤖
Stars: ✭ 165 (+511.11%)
Mutual labels:  question-answering
Instahelp
Instahelp is a Q&A portal website similar to Quora
Stars: ✭ 21 (-22.22%)
Mutual labels:  question-answering
golang-interview-questions
A collection of Golang interview questions
Stars: ✭ 42 (+55.56%)
Mutual labels:  question-answering
GrailQA
No description or website provided.
Stars: ✭ 72 (+166.67%)
Mutual labels:  question-answering
NS-CQA
NS-CQA: the model of the JWS paper 'Less is More: Data-Efficient Complex Question Answering over Knowledge Bases.' This work has been accepted by JWS 2020.
Stars: ✭ 19 (-29.63%)
Mutual labels:  question-answering
semanticilp
Question Answering as Global Reasoning over Semantic Abstractions (AAAI-18)
Stars: ✭ 33 (+22.22%)
Mutual labels:  question-answering
django-simple-forum
full featured forum, easy to integrate and use.
Stars: ✭ 65 (+140.74%)
Mutual labels:  question-answering
COVID19-IRQA
No description or website provided.
Stars: ✭ 32 (+18.52%)
Mutual labels:  question-answering
pair2vec
pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference
Stars: ✭ 62 (+129.63%)
Mutual labels:  question-answering
Question-Answering-System
A factoid based question answering system | Python | Flask | NLP
Stars: ✭ 0 (-100%)
Mutual labels:  question-answering
exams-qa
A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering
Stars: ✭ 25 (-7.41%)
Mutual labels:  question-answering
HAR
Code for WWW2019 paper "A Hierarchical Attention Retrieval Model for Healthcare Question Answering"
Stars: ✭ 22 (-18.52%)
Mutual labels:  question-answering
Dynamic-Coattention-Network-for-SQuAD
Tensorflow implementation of DCN for question answering on the Stanford Question Answering Dataset (SQuAD)
Stars: ✭ 14 (-48.15%)
Mutual labels:  question-answering

Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies

This repository contains the official code of the paper: "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies", accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2021.

Citation

@article{geva2021strategyqa,
  title = {{Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies}},
  author = {Geva, Mor and Khashabi, Daniel and Segal, Elad and Khot, Tushar and Roth, Dan and Berant, Jonathan},
  journal = {Transactions of the Association for Computational Linguistics (TACL)},
  year = {2021},
}

The following are instructions to reproduce the experiments reported in the paper on the StrategyQA dataset.


Quick Links

  1. Setup
  2. Training
  3. Prediction and Evaluation
  4. Download Links to Our Trained Models

Setup

Requirements

Our experiments were conducted in a Python 3.7 environment. To clone the repository and set up the environment, please run the following commands:

git clone https://github.com/eladsegal/strategyqa.git
cd strategyqa
pip install -r requirements.txt

StrategyQA dataset files

The official StrategyQA dataset files with a detailed description of their format can be found on the dataset page.
To train our baseline models, we created a 90%/10% random split of the official train set to get an unofficial train/dev split: data/strategyqa/[train/dev].json.
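
For reference, a split of this kind takes only a few lines of Python. The snippet below is a minimal sketch and not necessarily how the released split was generated; the random seed and the input file name are assumptions.

import json
import random

# Minimal sketch of a 90%/10% random train/dev split. The seed and the
# input file name are assumptions, not the official preprocessing script.
with open("data/strategyqa/strategyqa_train.json") as f:
    examples = json.load(f)  # assumed: the official train set is a JSON list of records

random.seed(0)
random.shuffle(examples)

cut = int(0.9 * len(examples))
for name, subset in [("train", examples[:cut]), ("dev", examples[cut:])]:
    with open(f"data/strategyqa/{name}.json", "w") as f:
        json.dump(subset, f)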

(Optional) Creating an Elasticsearch index of our corpus

A download link to our full corpus of Wikipedia paragraphs is available on the dataset page. A script for indexing the paragraphs into Elasticsearch is available here.
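
For a rough idea of what the indexing step involves, below is an illustrative sketch using the official elasticsearch Python client. The index name, corpus file path, and corpus layout are assumptions; the linked script is the authoritative version.

import json
from elasticsearch import Elasticsearch, helpers

# Illustrative sketch only: index name, corpus path, and corpus layout are
# assumptions; see the linked indexing script for the actual details.
es = Elasticsearch("http://localhost:9200")

def actions(corpus_path, index_name="strategyqa_corpus"):
    with open(corpus_path) as f:
        corpus = json.load(f)  # assumed layout: paragraph ID -> paragraph record
    for para_id, para in corpus.items():
        yield {"_index": index_name, "_id": para_id, "_source": para}

helpers.bulk(es, actions("data/corpus.json"))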


Training

  • In the commands below, replace GPU with a single GPU ID, a comma-separated list of GPU IDs, or -1 to run on CPU.

  • Download links to our trained models are provided in Download Links to Our Trained Models.

RoBERTa*

RoBERTa* is a RoBERTa model, fine-tuned on auxiliary datasets, which we used as our base model when fine-tuning on StrategyQA. We trained RoBERTa* as follows:

  1. Download the twentyquestions dataset and extract it to data/, so that you have data/twentyquestions/twentyquestions-[train/dev].jsonl.

  2. Download the BoolQ dataset and extract it to data/, so that you have data/boolq/[train/dev].jsonl.

  3.  python run_scripts/train_RoBERTa_STAR.py -s OUTPUT_DIR -g "GPU"
    

    A trained RoBERTa* model can be found here.

Question Answering Models

The directory configs/strategy_qa contains configuration files for the question answering models described in the paper. To train a question answering model with a specific configuration, run the train.py script as follows:

python run_scripts/train.py --config-file configs/strategy_qa/CONFIG_NAME.jsonnet -s OUTPUT_DIR -g "GPU" -w [path to a RoBERTa* model (.tar.gz file)]

A trained model for each configuration can be found at https://storage.googleapis.com/ai2i/strategyqa/models/CONFIG_NAME.tar.gz,
and its evaluation scores on the dev set we used (see Setup) at https://storage.googleapis.com/ai2i/strategyqa/models/CONFIG_NAME.json.
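
For example, the following snippet fetches a trained model and prints its dev scores via the URL patterns above (5_STAR_IR-ORA-D is one of the configurations mentioned in the notes below):

import json
import urllib.request

# Fetch a trained model and its dev-set scores using the URL patterns above.
config_name = "5_STAR_IR-ORA-D"  # any configuration name from configs/strategy_qa
base = "https://storage.googleapis.com/ai2i/strategyqa/models"

urllib.request.urlretrieve(f"{base}/{config_name}.tar.gz", f"{config_name}.tar.gz")
with urllib.request.urlopen(f"{base}/{config_name}.json") as response:
    print(json.load(response))  # evaluation scores on the dev set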

Figures depicting the resource dependency of the training procedures can be found here.

Notes:

  • Configurations with "base" in their name are not runnable on their own.

  • Models that query the Elasticsearch server cannot get results for queries that are not already in data/queries_cache.json, unless an Elasticsearch server is set up and referred to in src/data/dataset_readers/utils/elasticsearch_utils.py (a sketch of this caching pattern appears after these notes). See Setup for details on creating an Elasticsearch index.

  • The config 4_STAR_IR-D.jsonnet is not trainable; it is used only for evaluating 5_STAR_IR-ORA-D.jsonnet with decompositions generated by BART-Decomp.
    It requires data/strategyqa/generated/bart_decomp_dev_predictions.jsonl; see Question Decomposition Model (BART-Decomp) to learn how to generate it. A dependency graph can be found here.
    To create an AllenNLP model archive for it, run the following:

    python tools/tar_to_tar.py [path to a 5_STAR_IR-ORA-D model (.tar.gz file)] configs/4_STAR_IR-D.jsonnet 4_STAR_IR-D.tar.gz
    
  • The config 8_STAR_ORA-P-D-last-step.jsonnet requires data/strategyqa/transformer_qa_ORA-P_[train/dev]_no_placeholders.json; see Iterative Answering of Decompositions to learn how to generate them. A dependency graph can be found here.
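
To make the caching note above concrete, here is a sketch of the pattern it describes. The cache layout (query string → retrieval results) and the index/field names are assumptions made for illustration; the repository's actual logic lives in src/data/dataset_readers/utils/elasticsearch_utils.py.

import json
from elasticsearch import Elasticsearch

CACHE_PATH = "data/queries_cache.json"

def cached_search(query, es=None):
    # Cache layout (query string -> list of hits) is an assumption.
    with open(CACHE_PATH) as f:
        cache = json.load(f)
    if query in cache:
        return cache[query]
    if es is None:
        raise RuntimeError("query is not cached and no Elasticsearch server is configured")
    # Index and field names below are assumptions for illustration.
    results = es.search(index="strategyqa_corpus", query={"match": {"content": query}})
    cache[query] = results["hits"]["hits"]
    with open(CACHE_PATH, "w") as f:
        json.dump(cache, f)
    return cache[query]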

Question Decomposition Model (BART-Decomp)

  1. Train the model:

    python run_scripts/train.py --config-file configs/decomposition/bart_decomp_strategy_qa.jsonnet -s OUTPUT_DIR -g "GPU"
    

    A trained model can be found here.

  2. Output predictions:

    python run_scripts/predict.py --model [path to a BART-Decomp model (.tar.gz file)] --data data/strategyqa/dev.json -g "GPU" --output-file data/strategyqa/generated/bart_decomp_dev_predictions.jsonl
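
The resulting bart_decomp_dev_predictions.jsonl is the file expected by 4_STAR_IR-D.jsonnet (see the notes above). A quick way to inspect it is sketched below; the field name used is an assumption about the prediction format, so check the actual JSONL keys first.

import json

# Peek at the first few generated decompositions. The field name
# "predicted_decomposition" is an assumption; inspect the JSONL keys first.
with open("data/strategyqa/generated/bart_decomp_dev_predictions.jsonl") as f:
    for line in list(f)[:5]:
        prediction = json.loads(line)
        print(prediction.get("predicted_decomposition", prediction))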
    

Iterative Answering of Decompositions

  1. Download the BoolQ dataset and extract it to data/, so that you have data/boolq/[train/dev].jsonl.

  2. Download the SQuAD 2.0 dataset and extract it to data/, so that you have data/squad_v2/[train/dev]-v2.0.json.

  3. Append BoolQ to SQuAD (see the sketch after this list for what --append-to effectively does):

    python -m tools.squadify_boolq data/boolq/train.jsonl data/squad_v2/squad_v2_boolq_dataset_train.json --append-to data/squad_v2/train-v2.0.json
    
    python -m tools.squadify_boolq data/boolq/dev.jsonl data/squad_v2/squad_v2_boolq_dataset_dev.json --append-to data/squad_v2/dev-v2.0.json
    
  4. Train a RoBERTa Extractive QA model on SQuAD and BoolQ:

    python run_scripts/train.py --config-file configs/squad/transformer_qa_large.jsonnet -s OUTPUT_DIR -g "GPU"
    

    A trained model can be found here.

  5. Replace the placeholders in the gold decompositions:

    python -m src.models.iterative.run_model -g [GPU (single only)] --qa-model-path [path to a transformer_qa_large model (.tar.gz file)] --paragraphs-source ORA-P --data data/strategyqa/train.json --output-predictions-file data/strategyqa/generated/transformer_qa_ORA-P_train_no_placeholders.json
    
    python -m src.models.iterative.run_model -g [GPU (single only)] --qa-model-path [path to a transformer_qa_large model (.tar.gz file)] --paragraphs-source ORA-P --data data/strategyqa/dev.json --output-predictions-file data/strategyqa/generated/transformer_qa_ORA-P_dev_no_placeholders.json
    

    This script supports different paragraph sources (IR-Q/ORA-P/IR-ORA-D/IR-D) and can also work on generated decompositions instead of the gold ones (use --generated-decompositions-paths).
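
As for what the --append-to flag in step 3 effectively does: both files are SQuAD-format JSON, so appending amounts to extending the "data" list of the target file. Below is a rough sketch under that assumption; the actual behavior is defined by tools/squadify_boolq.

import json

# Rough sketch of appending one SQuAD-format file to another; the actual
# behavior of --append-to is defined by tools/squadify_boolq.
def append_squad(addition_path, target_path):
    with open(addition_path) as f:
        addition = json.load(f)
    with open(target_path) as f:
        target = json.load(f)
    target["data"].extend(addition["data"])  # SQuAD files keep articles under "data"
    with open(target_path, "w") as f:
        json.dump(target, f)

append_squad("data/squad_v2/squad_v2_boolq_dataset_train.json",
             "data/squad_v2/train-v2.0.json")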


Prediction and Evaluation

The StrategyQA leaderboard is available here.

The official evaluation script can be found here.

Question Answering

  • Evaluate accuracy:
    python run_scripts/evaluate.py --model [path to a QA model (.tar.gz file)] --data DATA_PATH -g "GPU"
    
  • Output predictions:
    python run_scripts/predict.py --model [path to a QA model (.tar.gz file)] --data DATA_PATH -g "GPU" --output-file OUTPUT_PATH.jsonl
    

Notes:

  • The model created with the config 8_STAR_ORA-P-D-last-step.jsonnet should be run with data/strategyqa/transformer_qa_ORA-P_dev_no_placeholders.json as DATA_PATH, and not with data/strategyqa/dev.json like the other models, because it depends on using the last decomposition step without placeholders.

Recall@10

  1. Output the retrieved paragraphs for the configuration.
    The output format is a dictionary mapping each "qid" to a list of paragraph IDs.

    python ir_evaluation/get_paragraphs_by_config.py --config-file configs/CONFIG_NAME.jsonnet --output-file OUTPUT_PATH --data DATA_PATH
    
  2. python ir_evaluation/recall@10.py --data DATA_PATH --retrieved-paragraphs [OUTPUT_PATH from the previous step]
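
Given the retrieved-paragraphs file from step 1 (a dictionary mapping each "qid" to a list of paragraph IDs), the metric itself is straightforward. Below is a sketch that assumes the gold evidence has already been flattened into a qid → set-of-paragraph-IDs mapping; the script above handles the real evidence format.

import json

def recall_at_10(retrieved_path, gold):
    # `gold` is assumed to map qid -> set of gold paragraph IDs; flattening
    # StrategyQA's evidence annotations into this form is not shown here.
    with open(retrieved_path) as f:
        retrieved = json.load(f)  # qid -> list of paragraph IDs (step 1 output)
    scores = [
        len(set(paragraphs[:10]) & gold[qid]) / len(gold[qid])
        for qid, paragraphs in retrieved.items()
        if gold.get(qid)
    ]
    return sum(scores) / len(scores)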
    

Download Links to Our Trained Models

Trained models and their evaluation scores are available per configuration at https://storage.googleapis.com/ai2i/strategyqa/models/CONFIG_NAME.tar.gz and https://storage.googleapis.com/ai2i/strategyqa/models/CONFIG_NAME.json, respectively (see Training). Trained RoBERTa*, BART-Decomp, and RoBERTa Extractive QA models are linked in their respective sections above.