
cognitiveailab / tg2021task

Licence: MIT License
Participant Kit for the TextGraphs-15 Shared Task on Explanation Regeneration

Programming Languages

Python
Makefile

Projects that are alternatives of or similar to tg2021task

vf3lib
VF3 Algorithm - The fastest algorithm to solve subgraph isomorphism on large and dense graphs
Stars: ✭ 58 (+222.22%)
Mutual labels:  graphs
query-focused-sum
Official code repository for "Exploring Neural Models for Query-Focused Summarization".
Stars: ✭ 17 (-5.56%)
Mutual labels:  question-answering
graffeine
Simple, modular graphs for iOS.
Stars: ✭ 22 (+22.22%)
Mutual labels:  graphs
dlaudio
Master thesis: Structured Auto-Encoder with application to Music Genre Recognition (code)
Stars: ✭ 14 (-22.22%)
Mutual labels:  graphs
Giveme5W
Extraction of the five journalistic W-questions (5W) from news articles
Stars: ✭ 16 (-11.11%)
Mutual labels:  question-answering
WordMat
WordMat is an add-in for Microsoft Word enabling math functionality
Stars: ✭ 25 (+38.89%)
Mutual labels:  graphs
sr graph
A simple, one-file, header-only, C++ utility for graphs, curves and histograms.
Stars: ✭ 67 (+272.22%)
Mutual labels:  graphs
Graphs.jl
An optimized graphs package for the Julia programming language
Stars: ✭ 197 (+994.44%)
Mutual labels:  graphs
silky-charts
A silky smooth D3/React library
Stars: ✭ 38 (+111.11%)
Mutual labels:  graphs
WikiQA
Very Simple Question Answer System using Chinese Wikipedia Data
Stars: ✭ 24 (+33.33%)
Mutual labels:  question-answering
kaggle-quora-question-pairs
My solution to Kaggle Quora Question Pairs competition (Top 2%, Private LB log loss 0.13497).
Stars: ✭ 104 (+477.78%)
Mutual labels:  competition
TOEFL-QA
A question answering dataset for machine comprehension of spoken content
Stars: ✭ 61 (+238.89%)
Mutual labels:  question-answering
text2text
Text2Text: Cross-lingual natural language processing and generation toolkit
Stars: ✭ 188 (+944.44%)
Mutual labels:  question-answering
DocQN
Author implementation of "Learning to Search in Long Documents Using Document Structure" (Mor Geva and Jonathan Berant, 2018)
Stars: ✭ 21 (+16.67%)
Mutual labels:  question-answering
DVQA dataset
DVQA Dataset: A Bar chart question answering dataset presented at CVPR 2018
Stars: ✭ 20 (+11.11%)
Mutual labels:  question-answering
MSMARCO
Machine Comprehension Train on MSMARCO with S-NET Extraction Modification
Stars: ✭ 31 (+72.22%)
Mutual labels:  question-answering
XORQA
This is the official repository for NAACL 2021, "XOR QA: Cross-lingual Open-Retrieval Question Answering".
Stars: ✭ 61 (+238.89%)
Mutual labels:  question-answering
qa
TensorFlow Models for the Stanford Question Answering Dataset
Stars: ✭ 72 (+300%)
Mutual labels:  question-answering
Algorithms
Free hands-on course with the implementation (in Python) and description of several computational, mathematical and statistical algorithms.
Stars: ✭ 117 (+550%)
Mutual labels:  graphs
fermor
Fast, powerful, general-purpose graph traversal and modelling tools plus a performant immutable in-memory graph database.
Stars: ✭ 22 (+22.22%)
Mutual labels:  graphs

TextGraphs-15 Shared Task on Multi-Hop Inference Explanation Regeneration

We invite participation in the 3rd Shared Task on Explanation Regeneration associated with the 15th Workshop on Graph-Based Natural Language Processing (TextGraphs 2021).

All systems participating in the shared task will be invited to submit system description papers. Each system description paper will be peer-reviewed by two other participating teams and will be presented as a poster at the main workshop: https://sites.google.com/view/textgraphs2021.


Overview

Multi-hop inference is the task of combining more than one piece of information to solve an inference task, such as question answering. This can take many forms, from combining free-text sentences read from books or the web, to combining linked facts from a structured knowledge base. The Shared Task on Explanation Regeneration asks participants to develop methods that reconstruct large explanations for science questions, using a corpus of gold explanations that provides supervision and instrumentation for this multi-hop inference task. Each explanation is represented as an "explanation graph", a set of atomic facts (between 1 and 16 per explanation, drawn from a knowledge base of approximately 9,000 facts) that, together, form a detailed explanation of the reasoning required to answer and explain a question.

Explanation Regeneration is a stepping stone towards general multi-hop inference over language. In this shared task we frame explanation regeneration as a ranking task, where the inputs to a given system consist of questions and their correct answers. Participating systems must then rank atomic facts from a provided semi-structured knowledge base such that the combination of top-ranked facts provides a detailed explanation for the answer. This requires combining scientific and common-sense/world knowledge with compositional inference.
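
As a concrete illustration of this framing (not part of the participant kit), the sketch below scores every knowledge-base fact against a question and its correct answer, then returns fact identifiers in descending order of relevance. The toy word-overlap scorer and the data structures are assumptions made only for this example.

def lexical_overlap(query_tokens, fact_tokens):
    # Toy relevance score: number of shared tokens (illustration only).
    return len(set(query_tokens) & set(fact_tokens))

def rank_facts(question, answer, facts):
    # facts: dict mapping fact_id -> fact text; returns fact IDs, best first.
    query_tokens = (question + " " + answer).lower().split()
    scored = ((lexical_overlap(query_tokens, text.lower().split()), fact_id)
              for fact_id, text in facts.items())
    return [fact_id for _, fact_id in sorted(scored, reverse=True)]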

While large language models (BERT, ERNIE) achieved the highest performance in the 2019 and 2020 shared tasks, substantially advancing the state-of-the-art over previous methods, absolute performance remains modest, highlighting the difficulty of generating detailed explanations through multi-hop reasoning.

New for 2021

Many-fact multi-hop inference is challenging because there are often multiple ways of assembling a good explanation for a given question. This 2021 instantiation of the shared task focuses on the theme of determining relevance versus completeness in large multi-hop explanations. To this end, this year we include a very large dataset of approximately 250,000 expert-annotated relevancy ratings for facts ranked highly by baseline language models from previous years (e.g. BERT, RoBERTa).

Submissions using a variety of methods (graph-based or otherwise) are encouraged. Submissions that evaluate how well existing models designed for 2-hop multi-hop question answering datasets (e.g. HotPotQA, QASC, etc.) perform at many-fact multi-hop explanation regeneration are also welcome.

Example explanation graph

Important Dates

  • 2021-02-15: Training data release
  • 2021-03-10: Test data release; Evaluation start
  • 2021-03-24: Evaluation end
  • 2021-04-01: System description paper deadline
  • 2021-04-13: Deadline for reviews of system description papers
  • 2021-04-15: Author notifications
  • 2021-04-26: Camera-ready description paper deadline
  • 2021-06-11: TextGraphs-15 workshop

Dates are specified in the ISO 8601 format.

Data

The data used in this shared task contains approximately 5,100 science exam questions drawn from the AI2 Reasoning Challenge (ARC) dataset (Clark et al., 2018), together with multi-fact explanations for their correct answers drawn from the WorldTree V2.1 explanation corpus (Xie et al., 2020, Jansen et al., 2018). For 2021, this has been augmented with a new set of approximately 250k pre-release expert-generated relevancy ratings.

The knowledge base supporting these questions and their explanations contains approximately 9,000 facts. To encourage a variety of solving methods, the knowledge base is available both as plain-text sentences (unstructured) as well as semi-structured tables. Facts are a combination of scientific knowledge as well as common-sense/world knowledge.
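
For orientation, one way to load the semi-structured tables into a fact_id -> sentence mapping is sketched below. The data/tables path appears in the commands later in this README, but the TSV header convention assumed here (a column whose name contains "UID" holding the fact identifier) is an assumption for illustration; the loader bundled with the kit is authoritative.

import csv
import glob
import os

def load_facts(tables_dir="data/tables"):
    # Hedged sketch: read each TSV table and join its content cells into a sentence.
    facts = {}
    for path in glob.glob(os.path.join(tables_dir, "*.tsv")):
        with open(path, encoding="utf-8") as handle:
            reader = csv.reader(handle, delimiter="\t")
            header = next(reader)
            uid_col = next((i for i, name in enumerate(header) if "UID" in name), None)
            if uid_col is None:
                continue  # skip files without an identifier column
            for row in reader:
                text = " ".join(cell for i, cell in enumerate(row)
                                if i != uid_col and cell)
                facts[row[uid_col]] = text
    return facts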

The full dataset (WorldTree V2.1 Relevancy Ratings) can be downloaded at the following links:

More information about the WorldTree V2.1 corpus, including a book of explanation graphs, can be found here.

Baselines

The shared task data distribution includes a baseline that uses a term frequency-inverse document frequency (tf.idf) model to rank how likely table row sentences are to be part of a given explanation. The performance of this baseline on the development partition is 0.513 NDCG.
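
As a rough illustration of this kind of model (not the code of the bundled baseline_tfidf.py), a tf.idf ranker can be assembled with scikit-learn; the function and variable names below are assumptions made for the sketch.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_by_tfidf(question_and_answer, facts):
    # facts: dict fact_id -> sentence; returns fact IDs, most relevant first.
    fact_ids = list(facts)
    vectorizer = TfidfVectorizer(stop_words="english")
    fact_matrix = vectorizer.fit_transform(facts[fid] for fid in fact_ids)
    query_vector = vectorizer.transform([question_and_answer])
    scores = cosine_similarity(query_vector, fact_matrix).ravel()
    ranked = sorted(zip(scores, fact_ids), key=lambda pair: pair[0], reverse=True)
    return [fid for _, fid in ranked]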

Python

First, get the data using make dataset or the following sequence of commands:

$ wget cognitiveai.org/dist/tg2021-alldata-practice.zip
$ unzip tg2021-alldata-practice.zip

Then, run the baseline:

$ ./baseline_tfidf.py data/tables data/wt-expert-ratings.dev.json > predict.txt

The format of the predict.txt file is questionID<TAB>explanationID without a header; the order of rows matters. When tqdm is installed, baseline_tfidf.py will show a nice-looking progress bar.
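
A minimal way to write such a file from per-question rankings is sketched below; the write_predictions helper and the example IDs are illustrative, not part of the kit.

def write_predictions(path, rankings):
    # rankings: dict question_id -> list of fact IDs, most relevant first.
    with open(path, "w", encoding="utf-8") as handle:
        for question_id, fact_ids in rankings.items():
            for fact_id in fact_ids:
                handle.write(f"{question_id}\t{fact_id}\n")

# Example: one question with three ranked facts (IDs are made up).
write_predictions("predict.txt", {"Q_1": ["FACT_3", "FACT_1", "FACT_2"]})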

To compute the NDCG for the model, run the following command:

$ ./evaluate.py --gold data/wt-expert-ratings.dev.json predict.txt
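
For intuition, NDCG for a single question can be sketched as follows. The flat fact_id -> rating mapping assumed here is a simplification of the gold file's actual schema; evaluate.py remains the authoritative scorer.

import math

def ndcg(ranked_fact_ids, gold_relevance):
    # gold_relevance: dict fact_id -> graded relevancy rating (higher = more relevant).
    dcg = sum(gold_relevance.get(fid, 0) / math.log2(rank + 2)
              for rank, fid in enumerate(ranked_fact_ids))
    ideal = sorted(gold_relevance.values(), reverse=True)
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0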

If you want to run the evaluation script without tqdm, use the following command:

$ ./evaluate.py --no-tqdm --gold data/wt-expert-ratings.dev.json predict.txt

In order to prepare a submission file for CodaLab, create a ZIP file containing your predict.txt.
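
For example (the archive name below is illustrative; the Makefile targets described next produce correctly named archives for you):

import zipfile

# Place predict.txt at the root of the archive for upload to CodaLab.
with zipfile.ZipFile("predict.zip", "w", zipfile.ZIP_DEFLATED) as archive:
    archive.write("predict.txt", arcname="predict.txt")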

A sample submission file for the Practice phase can be created with make predict-tfidf-dev.zip (using the dev dataset), while one for the Evaluation phase can be created with make predict-tfidf-test.zip (using the test dataset).

Submission

Please submit your solutions via CodaLab: https://competitions.codalab.org/competitions/29228.

Contacts

This shared task is organized within the 15th workshop on graph-based natural language processing, TextGraphs-15: https://sites.google.com/view/textgraphs2021.

We welcome questions and answers on the shared task on CodaLab Forums: https://competitions.codalab.org/forums/25924/.

To contact the task organizers directly, please send an email to [email protected].

Terms and Conditions

By submitting results to this competition, you consent to the public release of your scores at the TextGraphs-15 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.

You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.

You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.

You agree not to use or redistribute the shared task data except in the manner prescribed by its licence.

To encourage transparency and replicability, all teams must publish their code, tuning procedures, and instructions for running their models with their submission of shared task papers.

References

@inproceedings{Thayaparan:21,
  author    = {Thayaparan, Mokanarangan and Valentino, Marco and Jansen, Peter and Ustalov, Dmitry},
  title     = {{TextGraphs~2021 Shared Task on Multi-Hop Inference for Explanation Regeneration}},
  year      = {2021},
  booktitle = {Proceedings of the Fifteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-15)},
  pages     = {156--165},
  address   = {Mexico City, Mexico},
  publisher = {Association for Computational Linguistics},
  doi       = {10.18653/v1/2021.textgraphs-1.17},
  isbn      = {978-1-954085-38-1},
  language  = {english},
}