
aleSuglia / squadgym

License: Apache-2.0
Environment that can be used to evaluate reasoning capabilities of artificial agents

Programming Languages

python
139,335 projects - #7 most used programming language

Projects that are alternatives of or similar to squadgym

patrick-wechat
⭐️🐟 Questionnaire WeChat app built with taro, taro-ui and heart. (WeChat questionnaire mini-program)
Stars: ✭ 74 (+174.07%)
Mutual labels:  question-answering
deformer
[ACL 2020] DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Stars: ✭ 111 (+311.11%)
Mutual labels:  question-answering
KrantikariQA
An information-gain-based question answering system over knowledge graphs.
Stars: ✭ 54 (+100%)
Mutual labels:  question-answering
GAR
Code and resources for papers "Generation-Augmented Retrieval for Open-Domain Question Answering" and "Reader-Guided Passage Reranking for Open-Domain Question Answering", ACL 2021
Stars: ✭ 38 (+40.74%)
Mutual labels:  question-answering
finance-qa-spider
Text data collection/crawling for financial Q&A platforms; data sources include the Shanghai Stock Exchange, the Shenzhen Stock Exchange, Quanjing, and Sina stock forums.
Stars: ✭ 33 (+22.22%)
Mutual labels:  question-answering
PororoQA
PororoQA, https://arxiv.org/abs/1707.00836
Stars: ✭ 26 (-3.7%)
Mutual labels:  question-answering
question-answering
No description or website provided.
Stars: ✭ 32 (+18.52%)
Mutual labels:  question-answering
Medi-CoQA
Conversational Question Answering on Clinical Text
Stars: ✭ 22 (-18.52%)
Mutual labels:  question-answering
safe-grid-agents
Training (hopefully) safe agents in gridworlds
Stars: ✭ 25 (-7.41%)
Mutual labels:  gym
mrqa
Code for EMNLP-IJCNLP 2019 MRQA Workshop Paper: "Domain-agnostic Question-Answering with Adversarial Training"
Stars: ✭ 35 (+29.63%)
Mutual labels:  question-answering
QA4IE
Original implementation of QA4IE
Stars: ✭ 24 (-11.11%)
Mutual labels:  question-answering
PersianQA
Persian (Farsi) Question Answering Dataset (+ Models)
Stars: ✭ 114 (+322.22%)
Mutual labels:  question-answering
rSoccer
🎳 Environments for Reinforcement Learning
Stars: ✭ 26 (-3.7%)
Mutual labels:  gym
iPerceive
Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | Python3 | PyTorch | CNNs | Causality | Reasoning | LSTMs | Transformers | Multi-Head Self Attention | Published in IEEE Winter Conference on Applications of Computer Vision (WACV) 2021
Stars: ✭ 52 (+92.59%)
Mutual labels:  question-answering
unanswerable qa
The official implementation for ACL 2021 "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval".
Stars: ✭ 21 (-22.22%)
Mutual labels:  question-answering
MLH-Quizzet
This is a smart Quiz Generator that generates a dynamic quiz from any uploaded text/PDF document using NLP. This can be used for self-analysis, question paper generation, and evaluation, thus reducing human effort.
Stars: ✭ 23 (-14.81%)
Mutual labels:  question-answering
gym-management
Gym Management System provides an easy to use interface for the users and a database for the admin to maintain the records of gym members.
Stars: ✭ 27 (+0%)
Mutual labels:  gym
WikiTableQuestions
A dataset of complex questions on semi-structured Wikipedia tables
Stars: ✭ 81 (+200%)
Mutual labels:  question-answering
head-qa
HEAD-QA: A Healthcare Dataset for Complex Reasoning
Stars: ✭ 20 (-25.93%)
Mutual labels:  question-answering
SQUAD2.Q-Augmented-Dataset
Augmented version of SQUAD 2.0 for Questions
Stars: ✭ 31 (+14.81%)
Mutual labels:  question-answering

SQuAD-Gym

Introduction

Recently, the Stanford Question Answering Dataset (SQuAD) has gained a lot of attention from practitioners and researchers due to its appealing properties for evaluating agents that answer open-domain questions. In this dataset, given a reference context and a question, the agent should generate an answer that may be composed of multiple tokens present in the given context. Due to its high quality, it represents a relevant benchmark for intelligent agents that must grasp, from a given context, the relevant evidence required to generate the answer.

The SQuAD dataset contains questions extracted from Wikipedia and related to specific entities. If the agent is able to extract from the related context the sequence of tokens that composes the answer, we may legitimately state that the system demonstrates sound reasoning capabilities. Obviously, the system should not exploit supplementary features associated with the question or the context; it should be able to "read" the correct answer from the context text.

Project

SQuAD-Gym is a language game in which the agent receives multiple context-question pairs taken from the SQuAD dataset and, for each of them, must generate an answer composed of multiple tokens. Based on the generated answer, the agent receives a question score, which is added to the other scores obtained during the game. At the end of the game, a cumulative score is produced; this is the score that the agent should learn to maximize in the long run. It is worth noting that we do not require the agent to produce fixed responses: it should generate a response by composing tokens together.
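Before looking at a concrete SQuAD example, the following self-contained toy sketch shows the intended interaction: an episode iterates over context-question pairs, the agent emits answer tokens, and the per-question rewards accumulate into the episode score. SquadToyEnv and its exact-match reward are illustrative stand-ins, not the project's actual API.

class SquadToyEnv:
    """Illustrative stand-in for the real environment; not the project's API."""

    def __init__(self, qa_pairs):
        self.qa_pairs = qa_pairs  # list of (context, question, answer) triples
        self.index = 0

    def reset(self):
        """Start a new episode and return the first (context, question) pair."""
        self.index = 0
        context, question, _ = self.qa_pairs[0]
        return {"context": context, "question": question}

    def step(self, answer_tokens):
        """Score the answer for the current question and advance to the next one."""
        _, _, target = self.qa_pairs[self.index]
        reward = 1.0 if answer_tokens == target.split() else 0.0  # stand-in for BLEU
        self.index += 1
        done = self.index >= len(self.qa_pairs)
        observation = None
        if not done:
            context, question, _ = self.qa_pairs[self.index]
            observation = {"context": context, "question": question}
        return observation, reward, done, {}


qa_pairs = [
    ("Paris is the capital of France.", "What is the capital of France?", "Paris"),
]
env = SquadToyEnv(qa_pairs)
observation, cumulative_score, done = env.reset(), 0.0, False
while not done:
    answer_tokens = ["Paris"]  # a perfect agent for this toy example
    observation, reward, done, _ = env.step(answer_tokens)
    cumulative_score += reward  # per-question scores add up to the episode score
print("episode score:", cumulative_score)  # 1.0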

For instance, suppose that, in a given round of the game, the agent receives the following context-question pair:

context

Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.

question

Which NFL team represented the NFC at Super Bowl 50?

We expect that the system, by reading the text in the given context, generates Carolina Panthers, which is the correct answer to the given question according to the context text. After that, the system receives a score computed as the sentence-level BLEU score between the generated sequence and the possible target sequences present in the SQuAD dataset.
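The README does not pin down the exact BLEU configuration, so the snippet below is only a plausible sketch of the reward computation using NLTK's sentence-level BLEU; the bigram-only weights and the smoothing function are my assumptions, chosen because extractive answers are typically very short.

# Plausible per-question reward: sentence-level BLEU against all target answers.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [["Carolina", "Panthers"]]  # target answer(s) from SQuAD
hypothesis = ["Carolina", "Panthers"]    # tokens generated by the agent

score = sentence_bleu(
    references,
    hypothesis,
    weights=(0.5, 0.5),  # assumed: uni- and bigrams only, since answers are short
    smoothing_function=SmoothingFunction().method1,  # avoids zero scores on short answers
)
print(score)  # 1.0 for an exact match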

Setup

Before you can use the environment, you need to download the SQuAD dataset in JSON format from the official website and run the build_env_data script as follows:

python3 build_env_data.py squad_data.json env_data.pkl

The script will generate a pickle file containing all the data required by the environment. After that, you can install the package using Python setuptools if you want to use it in your project, or you can try it by executing the env_test.py script, specifying the pickle environment data generated by the build_env_data script:

python3 env_test.py env_data.pkl

Future work

This project represents a playground for artificial agents through which it will be possible to find an answer to the following question:

Is it possible to develop artificial agents able to answer open-domain questions which require different capabilities (e.g. ability to see, ability to hear, etc.)?

At the moment, the most important aspects still to be designed are the following:

  1. Extend the game to provide the agent with different scores throughout the game, not just the BLEU score between the generated answer and the target one (for instance, the token-level F1 sketched after this list);
  2. Design a possible curriculum learning strategy by which the agent receives questions of increasing complexity as it plays the game;
  3. Design a multi-modal game: the agent should be able to answer questions about textual data, images, songs, or videos (as in the Italian game show Rischiatutto).
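As an illustration of item 1, one alternative per-question score could be the token-level F1 overlap used by the official SQuAD evaluation. The helper below is a hypothetical sketch of such a reward, not part of the current environment.

# Hypothetical alternative reward: token-level F1, as in the official SQuAD metric.
from collections import Counter

def token_f1(prediction, reference):
    """F1 overlap between predicted and reference token lists."""
    common = Counter(prediction) & Counter(reference)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(prediction)
    recall = num_same / len(reference)
    return 2 * precision * recall / (precision + recall)

print(token_f1(["Carolina", "Panthers"], ["Carolina", "Panthers"]))  # 1.0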

The project is a work in progress. However, it is useful to share it with the community in order to obtain valuable feedback that can be leveraged to enhance and improve it. When the environment is complete, it would be interesting to evaluate different kinds of QA models, just like in the official SQuAD challenge.

Contributions

The project is in its early stages, so all contributions are very welcome. Feel free to open an issue if you find something wrong, or create a pull request if you want to extend SQuAD-Gym.

Authors

Alessandro Suglia -- my name dot my surname at yahoo dot com
