
morningmoni / GAR

Licence: other
Code and resources for the papers "Generation-Augmented Retrieval for Open-Domain Question Answering" (ACL 2021) and "Reader-Guided Passage Reranking for Open-Domain Question Answering" (Findings of ACL 2021)

Programming Languages

python

Projects that are alternatives of or similar to GAR

strategyqa
The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies".
Stars: ✭ 27 (-28.95%)
Mutual labels:  question-answering, open-domain-qa
XORQA
This is the official repository for NAACL 2021, "XOR QA: Cross-lingual Open-Retrieval Question Answering".
Stars: ✭ 61 (+60.53%)
Mutual labels:  question-answering, open-domain-qa
BERT-for-Chinese-Question-Answering
No description or website provided.
Stars: ✭ 75 (+97.37%)
Mutual labels:  question-answering
ODSQA
ODSQA: Open-Domain Spoken Question Answering Dataset
Stars: ✭ 43 (+13.16%)
Mutual labels:  question-answering
keras-chatbot-web-api
Simple keras chat bot using seq2seq model with Flask serving web
Stars: ✭ 51 (+34.21%)
Mutual labels:  seq2seq-model
calcipher
Calculates the best possible answer for multiple-choice questions using techniques to maximize accuracy without any other outside resources or knowledge.
Stars: ✭ 15 (-60.53%)
Mutual labels:  question-answering
extractive rc by runtime mt
Code and datasets of "Multilingual Extractive Reading Comprehension by Runtime Machine Translation"
Stars: ✭ 36 (-5.26%)
Mutual labels:  question-answering
SoTu
Bag of Visual Features with Hamming Embedding, Reranking
Stars: ✭ 48 (+26.32%)
Mutual labels:  reranking
patrick-wechat
⭐️🐟 Questionnaire WeChat mini program built with taro, taro-ui, and heart.
Stars: ✭ 74 (+94.74%)
Mutual labels:  question-answering
explicit memory tracker
[ACL 2020] Explicit Memory Tracker with Coarse-to-Fine Reasoning for Conversational Machine Reading
Stars: ✭ 35 (-7.89%)
Mutual labels:  question-answering
NCE-CNN-Torch
Noise-Contrastive Estimation for Question Answering with Convolutional Neural Networks (Rao et al. CIKM 2016)
Stars: ✭ 54 (+42.11%)
Mutual labels:  question-answering
verseagility
Ramp up your custom natural language processing (NLP) task, allowing you to bring your own data, use your preferred frameworks and bring models into production.
Stars: ✭ 23 (-39.47%)
Mutual labels:  question-answering
Shukongdashi
An expert system for fault diagnosis in the CNC (computer numerical control) domain, built in Python using knowledge graphs, natural language processing, and convolutional neural networks.
Stars: ✭ 109 (+186.84%)
Mutual labels:  question-answering
QA HRDE LTC
TensorFlow implementation of "Learning to Rank Question-Answer Pairs using Hierarchical Recurrent Encoder with Latent Topic Clustering," NAACL-18
Stars: ✭ 29 (-23.68%)
Mutual labels:  question-answering
FreebaseQA
The release of the FreebaseQA data set (NAACL 2019).
Stars: ✭ 55 (+44.74%)
Mutual labels:  question-answering
question-answering
No description or website provided.
Stars: ✭ 32 (-15.79%)
Mutual labels:  question-answering
TransTQA
Author: Wenhao Yu ([email protected]). EMNLP'20. Transfer Learning for Technical Question Answering.
Stars: ✭ 12 (-68.42%)
Mutual labels:  question-answering
QANet
A TensorFlow implementation of "QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension"
Stars: ✭ 31 (-18.42%)
Mutual labels:  question-answering
gated-attention-reader
Tensorflow/Pytorch implementation of Gated Attention Reader
Stars: ✭ 37 (-2.63%)
Mutual labels:  question-answering
iPerceive
Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | Python3 | PyTorch | CNNs | Causality | Reasoning | LSTMs | Transformers | Multi-Head Self Attention | Published in IEEE Winter Conference on Applications of Computer Vision (WACV) 2021
Stars: ✭ 52 (+36.84%)
Mutual labels:  question-answering

This repo provides the code and resources of the following papers:

GAR


"Generation-Augmented Retrieval for Open-domain Question Answering", ACL 2021

TLDR: GAR augments a question with relevant contexts generated by seq2seq learning: given the question as input, the generator produces targets such as the answer, the sentence containing the answer, and the title of a passage that contains the answer. With the generated contexts appended to the original question, GAR achieves state-of-the-art OpenQA performance with a simple BM25 retriever.
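Conceptually, the augmentation step amounts to string concatenation. The sketch below is a minimal illustration, not the repo's actual code; the `augment_query` helper and the space-joined format are assumptions.

```python
def augment_query(question, generated_contexts):
    """Append generated contexts (e.g., a predicted answer, the sentence
    containing it, and a passage title) to the question, giving a lexical
    retriever such as BM25 more terms to match on."""
    return " ".join([question] + list(generated_contexts))

question = "who wrote the declaration of independence"
contexts = [
    "thomas jefferson",  # generated answer
    "Thomas Jefferson was the principal author of the Declaration.",  # sentence
    "United States Declaration of Independence",  # passage title
]
augmented = augment_query(question, contexts)
```

The augmented query keeps the original question at the front, so a retriever that did well on the plain question is unlikely to do worse.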

RIDER


"Reader-Guided Passage Reranking for Open-Domain Question Answering", Findings of ACL 2021.

TLDR: RIDER is a simple and effective passage reranker that reorders retrieved passages according to reader predictions, without any training. RIDER achieves gains of 10-20 points in top-1 retrieval accuracy and 1-4 points in Exact Match (EM), and even outperforms supervised transformer-based rerankers.
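The core idea can be sketched in a few lines. This is an illustration in the spirit of RIDER, not the implementation in rider/rider.py; the lowercase substring matching is a simplification assumed here.

```python
def rider_rerank(passages, reader_predictions):
    """Training-free reranking: passages containing any of the reader's
    top predicted answers are promoted to the front; relative order is
    preserved within each group."""
    preds = [p.lower() for p in reader_predictions]

    def contains_pred(passage):
        return any(answer in passage.lower() for answer in preds)

    hits = [p for p in passages if contains_pred(p)]
    misses = [p for p in passages if not contains_pred(p)]
    return hits + misses

passages = [
    "The Eiffel Tower is in Paris.",
    "Gustave Eiffel designed the tower.",
    "Paris is the capital of France.",
]
reranked = rider_rerank(passages, ["gustave eiffel"])
```

Because the reranker only consults the reader's own predictions, it needs no training signal beyond what the reader already provides.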

Data & Model Output

[update 2022-02] The retrieval results of GAR on TriviaQA have been uploaded.

[update 2022-01] The data for GAR training/testing on TriviaQA has been uploaded.

[update 2021-08] We provide the data for GAR training/testing on NaturalQuestions (NQ).

We now provide the generation-augmented queries (in case you wonder what they look like and/or want to perform retrieval yourself) and the retrieval results of GAR and GAR+ (same format as DPR, easily replaceable) on the test set of NaturalQuestions. You may see performance improvements by simply replacing the retrieval results of DPR with those of GAR/GAR+ during inference. If not, you may need to re-train the reader on GAR/GAR+ results as well. More outputs will be released in the future.

For seq2seq learning, use {train/val/test}.source as the input and {train/val/test}.target as the output, where each line is one example. We provide the data files in such formats for GAR training/testing above.
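A minimal loader for this line-aligned format might look like the following; the helper name and the temp-dir demonstration are illustrative, not part of the repo.

```python
import tempfile
from pathlib import Path

def load_seq2seq_split(data_dir, split):
    """Read line-aligned {split}.source / {split}.target files and
    return (input, output) pairs, one pair per line."""
    sources = Path(data_dir, f"{split}.source").read_text().splitlines()
    targets = Path(data_dir, f"{split}.target").read_text().splitlines()
    assert len(sources) == len(targets), "source/target files must be line-aligned"
    return list(zip(sources, targets))

# Tiny demonstration dataset written to a temp dir.
data_dir = tempfile.mkdtemp()
Path(data_dir, "train.source").write_text("who wrote hamlet\n")
Path(data_dir, "train.target").write_text("william shakespeare\n")
pairs = load_seq2seq_split(data_dir, "train")
```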

In the same folder, save the list of ground-truth answers as {val/test}.target.json if you want to evaluate EM during training (of the generative reader).
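EM is typically computed with SQuAD-style answer normalization; a sketch of that convention follows. The normalization details mirror common practice rather than this repo's exact evaluation code.

```python
import re
import string

def normalize_answer(s):
    """SQuAD-style normalization: lowercase, drop punctuation,
    articles, and extra whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, gold_answers):
    """Return 1 if the normalized prediction matches any acceptable
    gold answer, else 0."""
    norm = normalize_answer(prediction)
    return int(any(norm == normalize_answer(g) for g in gold_answers))
```

Keeping a list of acceptable answers per question (rather than a single string) matters because OpenQA datasets often admit several valid surface forms.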

Please refer to DPR for the original dataset downloading.

Code

Generation

The codebase of the seq2seq models is based on the huggingface/transformers (version 2.11.0) examples.

See train_gen.yml for the package requirements and example commands to run the models.

train_generator.py: training of seq2seq models.

conf.py: configurations for train_generator.py. Some default parameters are provided, but it may be easier to set options such as --data_dir and --output_dir directly.

test_generator.py: testing of seq2seq models (if not already done in train_generator.py).

Retrieval

We use pyserini for BM25 retrieval. Please refer to its documentation for indexing and searching wiki passages (the wiki passages can be downloaded here). Alternatively, you may take a look at its effort to reproduce DPR results, which gives more detailed instructions and incorporates the passage-level span voting used in GAR.
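BM25 itself is easy to sketch. The toy scorer below (Okapi BM25 over whitespace tokens, not pyserini's Lucene-backed implementation) shows why the extra terms in a generation-augmented query help a lexical retriever.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with Okapi BM25 over
    lowercase whitespace tokens."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    df = Counter()  # document frequency of each term
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "thomas jefferson wrote the declaration of independence",
    "the cat sat on the mat",
]
scores = bm25_scores("jefferson declaration", docs)
```

Every generated context appended to the question adds query terms, each of which can contribute to the sum above; that is the whole retrieval-side mechanism of GAR.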

Reranking

Please see the instructions in rider/rider.py.

Reading

We experiment with one extractive reader and one generative reader.

For the extractive reader, we take the one used by dense passage retrieval. Please refer to DPR for more details.

For the generative reader, we reuse the codebase from the generation stage above, with [question; top-retrieved passages] as the source input and one ground-truth answer as the target output. An example script is provided in train_gen.yml.
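A source line for the generative reader can be assembled along these lines; the separator and truncation here are illustrative assumptions, so check the provided data files for the exact format.

```python
def build_reader_source(question, passages, max_passages=5):
    """Concatenate the question with its top-retrieved passages to form
    one source line for the generative reader."""
    return " ".join([question] + passages[:max_passages])

source_line = build_reader_source(
    "who wrote the declaration of independence",
    [
        "Thomas Jefferson was the principal author of the Declaration.",
        "The Declaration was adopted in 1776.",
    ],
)
```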

Citation

Please cite our papers if you find them useful.

@inproceedings{mao-etal-2021-generation,
    title = "Generation-Augmented Retrieval for Open-Domain Question Answering",
    author = "Mao, Yuning  and
      He, Pengcheng  and
      Liu, Xiaodong  and
      Shen, Yelong  and
      Gao, Jianfeng  and
      Han, Jiawei  and
      Chen, Weizhu",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.316",
    doi = "10.18653/v1/2021.acl-long.316",
    pages = "4089--4100",
}


@inproceedings{mao-etal-2021-reader,
    title = "Reader-Guided Passage Reranking for Open-Domain Question Answering",
    author = "Mao, Yuning  and
      He, Pengcheng  and
      Liu, Xiaodong  and
      Shen, Yelong  and
      Gao, Jianfeng  and
      Han, Jiawei  and
      Chen, Weizhu",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.29",
    doi = "10.18653/v1/2021.findings-acl.29",
    pages = "344--350",
}

