
AkariAsai / XORQA

License: MIT
This is the official repository for the NAACL 2021 paper "XOR QA: Cross-lingual Open-Retrieval Question Answering".

Programming Languages

python, shell

Projects that are alternatives of or similar to XORQA

extractive rc by runtime mt
Code and datasets of "Multilingual Extractive Reading Comprehension by Runtime Machine Translation"
Stars: ✭ 36 (-40.98%)
Mutual labels:  multilingual, question-answering
strategyqa
The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies".
Stars: ✭ 27 (-55.74%)
Mutual labels:  question-answering, open-domain-qa
exams-qa
A Multi-subject High School Examinations Dataset for Cross-lingual and Multilingual Question Answering
Stars: ✭ 25 (-59.02%)
Mutual labels:  multilingual, question-answering
GAR
Code and resources for papers "Generation-Augmented Retrieval for Open-Domain Question Answering" and "Reader-Guided Passage Reranking for Open-Domain Question Answering", ACL 2021
Stars: ✭ 38 (-37.7%)
Mutual labels:  question-answering, open-domain-qa
CompareModels TRECQA
Compare six baseline deep learning models on TrecQA
Stars: ✭ 61 (+0%)
Mutual labels:  question-answering
SQUAD2.Q-Augmented-Dataset
Augmented version of SQUAD 2.0 for Questions
Stars: ✭ 31 (-49.18%)
Mutual labels:  question-answering
deformer
[ACL 2020] DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Stars: ✭ 111 (+81.97%)
Mutual labels:  question-answering
finance-qa-spider
Text data collection/crawling from financial Q&A platforms; data sources include the Shanghai Stock Exchange (上交所), the Shenzhen Stock Exchange (深交所), p5w.net (全景网), and Sina Guba stock forums (新浪股吧)
Stars: ✭ 33 (-45.9%)
Mutual labels:  question-answering
boxshop
Laravel ecommerce platform
Stars: ✭ 78 (+27.87%)
Mutual labels:  multilingual
DocQN
Author implementation of "Learning to Search in Long Documents Using Document Structure" (Mor Geva and Jonathan Berant, 2018)
Stars: ✭ 21 (-65.57%)
Mutual labels:  question-answering
WikiTableQuestions
A dataset of complex questions on semi-structured Wikipedia tables
Stars: ✭ 81 (+32.79%)
Mutual labels:  question-answering
mrqa
Code for EMNLP-IJCNLP 2019 MRQA Workshop Paper: "Domain-agnostic Question-Answering with Adversarial Training"
Stars: ✭ 35 (-42.62%)
Mutual labels:  question-answering
Stargraph
StarGraph (aka *graph) is a graph database to query large Knowledge Graphs. Playing with Knowledge Graphs can be useful if you are developing AI applications or doing data analysis over complex domains.
Stars: ✭ 24 (-60.66%)
Mutual labels:  question-answering
PororoQA
PororoQA, https://arxiv.org/abs/1707.00836
Stars: ✭ 26 (-57.38%)
Mutual labels:  question-answering
TOEFL-QA
A question answering dataset for machine comprehension of spoken content
Stars: ✭ 61 (+0%)
Mutual labels:  question-answering
Zoombraco
⚡ A lean boilerplate for rapidly developing strong-typed Umbraco websites.
Stars: ✭ 37 (-39.34%)
Mutual labels:  multilingual
Medi-CoQA
Conversational Question Answering on Clinical Text
Stars: ✭ 22 (-63.93%)
Mutual labels:  question-answering
MSMARCO
Machine Comprehension Train on MSMARCO with S-NET Extraction Modification
Stars: ✭ 31 (-49.18%)
Mutual labels:  question-answering
head-qa
HEAD-QA: A Healthcare Dataset for Complex Reasoning
Stars: ✭ 20 (-67.21%)
Mutual labels:  question-answering
unanswerable qa
The official implementation for ACL 2021 "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval".
Stars: ✭ 21 (-65.57%)
Mutual labels:  question-answering

XOR QA: Cross-lingual Open-Retrieval Question Answering

Tasks | Download | Baselines | Evaluation | Website and Leaderboard | Paper | Updates

Introduction

XOR-TyDi QA brings together for the first time information-seeking questions, open-retrieval QA, and multilingual QA to create a multilingual open-retrieval QA dataset that enables cross-lingual answer retrieval. It consists of questions written by information-seeking native speakers in 7 typologically diverse languages and answer annotations that are retrieved from multilingual document collections.

The Tasks

There are three sub-tasks: XOR-Retrieve, XOR-EnglishSpan, and XOR-Full.

[Figure: overview of the three XOR-TyDi QA tasks]

XOR-Retrieve

XOR-Retrieve is a cross-lingual retrieval task where a question is written in the target language (e.g., Japanese) and a system is required to retrieve English documents that answer the question.

XOR-EnglishSpan

XOR-English Span is a cross-lingual retrieval task where a question is written in the target language (e.g., Japanese) and a system is required to output a short answer in English.

XOR-Full

XOR-Full is a cross-lingual retrieval task where a question is written in the target language (e.g., Japanese) and a system is required to output a short answer in the target language.

Download the Dataset

You can download the data at the following URLs.

The datasets below include question and short answer information only. If you need the long answer information for supervised training of retrievers or readers, please download the Gold Paragraph data.

We also ask you to use the Wikipedia 2019-02-01 dump, which can be downloaded via the link in TyDi QA's source data list, for the 7 languages + English.

Note (April 12, 2021): Please note that we modified the XOR-TyDi QA data and released a new version, XOR-TyDi v1.1. All of the data you can download here is v1.1, and the leaderboard results are based on v1.1.

Data for XOR tasks

For XOR-Retrieve and XOR-English Span:

For XOR-Full:

Additional resources

Gold Paragraph Data

  • Gold Paragraph data (similar to TyDi GP): The gold paragraph data includes annotated passage answers (gold paragraphs) and short answers in the same format as SQuAD. As in the TyDi QA Gold Passage Task, you can directly reuse your SQuAD QA code (see the loading sketch after this list).

  • Reading Comprehension data with no answer and yes/no: In a format slightly different from the original SQuAD format, the data includes all "long answer only" and "yes/no" questions, as in the NQ and TyDi QA full tasks.
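
To illustrate how the SQuAD-format Gold Paragraph data can be consumed, here is a minimal loading sketch; the file name below is a hypothetical placeholder for whichever split you downloaded, and the field layout assumes the standard SQuAD schema described above.

import json

# Load Gold Paragraph data (standard SQuAD schema: data -> paragraphs -> qas).
# "xor_gp_train.json" is a placeholder file name.
with open("xor_gp_train.json", encoding="utf-8") as f:
    squad_like = json.load(f)

for article in squad_like["data"]:
    for paragraph in article["paragraphs"]:
        context = paragraph["context"]  # the annotated gold paragraph
        for qa in paragraph["qas"]:
            question = qa["question"]
            answers = [a["text"] for a in qa["answers"]]
            print(qa["id"], question, answers[:1])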

Question translation data

We also make the 30k human-annotated question translations publicly available. As the translation data was only used for annotation and we do not expect systems to use these oracle translations, we release the translations for the training data only.

The translation data for each language pair (English-{Arabic, Bengali, Finnish, Japanese, Korean, Russian, Telugu}) is provided as a pair of text files, in which the n-th line of one file contains a question and the n-th line of the other contains its translated English question, following the convention of common MT corpora.
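
As a minimal sketch of reading this line-aligned format (the file names below are hypothetical placeholders for one language pair):

# Read a line-aligned parallel file pair; file names are placeholders.
with open("ja_questions.txt", encoding="utf-8") as src, \
        open("en_translations.txt", encoding="utf-8") as tgt:
    for ja_line, en_line in zip(src, tgt):
        ja_question = ja_line.strip()  # original question in the source language
        en_question = en_line.strip()  # its human-translated English counterpart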

The list of the links to parallel corpora (L_i-to-English) is below:

Building a baseline system

Our baselines include Dense Passage Retriever (Karpukhin et al., 2020), Path Retriever (Asai et al., 2020), and BM25 (implementations are based on Elasticsearch) + multilingual QA models.

Please see baselines/README.md for more information.

Evaluation

To evaluate your model's predictions on the development data, please run the commands below. Please see the details of the prediction file format and make sure your prediction results follow that format.

You also need to install MeCab and NLTK before running the evaluation -- they are used for tokenization in XOR-Retrieve and for evaluation on Japanese answers.

pip install mecab-python3
pip install unidic-lite
pip install nltk
  • XOR Retrieve (metrics: R@2kt, R@5kt)
python3 evals/eval_xor_retrieve.py \
    --data_file <path_to_input_data> \
    --pred_file <path_to_predictions>
  • XOR-English Span (metrics: F1, EM)
python3 evals/eval_xor_engspan.py \
    --data_file <path_to_input_data> \
    --pred_file <path_to_predictions>
  • XOR-Full (metrics: F1, EM, BLEU)
python3 evals/eval_xor_full.py \
    --data_file <path_to_input_data> \
    --pred_file <path_to_predictions>

Prediction file format

To evaluate your model's predictions, you need to format your predicted results as described below.

XOR-Retrieve

Note: Our evaluation script checks whether the correct answers are included in the first 2,000 tokens and the first 5,000 tokens of the retrieved documents for R@2kt and R@5kt, respectively. Please make sure the total number of tokens in your retrieved documents is larger than those thresholds; otherwise your scores might be underestimated. See the detailed definition of these metrics in our paper.

The XOR-Retrieve file should be output as follows:

["id": 12345, "lang": "ja, ctxs": ["Tokyo (東京) is the capital and most populous prefecture of Japan.", "Located at the head of Tokyo Bay, the prefecture forms part of the Kantō region on the central Pacific coast of Japan's main island, Honshu. " ... ]
]
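
For intuition only, here is a simplified sketch of the R@kt idea -- checking whether a gold answer string appears within the first k tokens of the concatenated retrieved passages. The official metric is implemented in evals/eval_xor_retrieve.py; details such as tokenization (e.g., of Japanese answers) and answer matching differ there, so use the official script for reported numbers.

def recall_at_k_tokens(retrieved_ctxs, gold_answers, k=2000):
    # Simplified illustration: does any gold answer string appear in the
    # first k whitespace tokens of the concatenated retrieved passages?
    first_k = " ".join(" ".join(retrieved_ctxs).split()[:k])
    return any(answer in first_k for answer in gold_answers)

# Illustrative values only; real predictions follow the format shown above.
pred = {"id": "12345", "lang": "ja", "ctxs": ["Tokyo (東京) is the capital and most populous prefecture of Japan."]}
print(recall_at_k_tokens(pred["ctxs"], ["Tokyo"], k=2000))  # True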

XOR-Full, XOR-English Span

For those two tasks, your prediction file should be a JSON file containing a single dictionary whose keys are question IDs and whose values are the predicted short answers.

e.g.,

{"12345": "東京", "67890": "Москва", ...}

Submission guide

If you want to submit to our leaderboard, please create the prediction files on our test data for your target task and email them to Akari Asai (akari[at]cs.washington.edu).

Please make sure you include the following information in the email.

  • test prediction file in our prediction file format
  • the task name
  • the name of the model
  • whether you use an external black-box API (e.g., Google Translate / Google Custom Search / Bing Search) whose results cannot be reproduced. To promote open research, we encourage you to use reproducible components; systems with such external (unreproducible) components will be listed below the entries without APIs.
  • [optional] your institution. You can update this information later.
  • [optional] a link to the paper and code. You can update this information later.

Notes

  • The models will be sorted by F1 score for XOR-English Span and XOR-Full, and by R@5kt for XOR-Retrieve.
  • Please perform any model selection or ablation necessary to improve model performance on the dev set only. We limit the number of submissions to 20 per year and 3 per month.
  • If you plan to have your model officially evaluated, please plan one week in advance to allow sufficient time for your model results to appear on the leaderboard.

Updates

  • [October 23, 2020]: We released the initial version of TyDi-XOR dataset and preprint paper.
  • [January 18, 2021]: We released codes and trained models. Please see the details at baselines.
  • [April 13, 2021]: Our paper's camera-ready version is now available on arXiv. We also made some minor changes to XOR-TyDi's evaluation data and released a new version of TyDi-XOR as TyDi-XOR v1.1. The changes are: (1) we filtered out a few yes/no answer annotations (e.g., yes/no answers to factoid questions) that were potentially incorrect, and (2) we added some answer translations that were not appropriately included in the previous XOR-Full evaluations due to postprocessing issues.

Citation and Contact

If you find this codebase useful or use the data in your work, please cite our paper.

@inproceedings{xorqa,
    title   = {{XOR} {QA}: Cross-lingual Open-Retrieval Question Answering},
    author  = {Akari Asai and Jungo Kasai and Jonathan H. Clark and Kenton Lee and Eunsol Choi and Hannaneh Hajishirzi},
    booktitle={NAACL-HLT},
    year    = {2021}
}

If you use the TyDi-XOR QA data, please also make sure to cite the original TyDi QA paper, which TyDi-XOR is built on:

@article{tydiqa,
    title   = {TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages},
    author  = {Jonathan H. Clark and Eunsol Choi and Michael Collins and Dan Garrette and Tom Kwiatkowski and Vitaly Nikolaev and Jennimaria Palomaki},
    journal = {TACL},
    year    = {2020}
}

Please contact Akari Asai (@AkariAsai, akari[at]cs.washington.edu) for questions and suggestions.
