ankit-ai / SQUAD2.Q-Augmented-Dataset

Licence: other
Augmented version of SQUAD 2.0 for Questions


Projects that are alternatives to or similar to SQUAD2.Q-Augmented-Dataset

Medi-CoQA
Conversational Question Answering on Clinical Text
Stars: ✭ 22 (-29.03%)
Mutual labels:  question-answering, squad, bert
Haystack
🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.
Stars: ✭ 3,409 (+10896.77%)
Mutual labels:  question-answering, squad, bert
FinBERT-QA
Financial Domain Question Answering with pre-trained BERT Language Model
Stars: ✭ 70 (+125.81%)
Mutual labels:  question-answering, bert
cmrc2019
A Sentence Cloze Dataset for Chinese Machine Reading Comprehension (CMRC 2019)
Stars: ✭ 118 (+280.65%)
Mutual labels:  question-answering, bert
backprop
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Stars: ✭ 229 (+638.71%)
Mutual labels:  question-answering, bert
Bi Att Flow
Bi-directional Attention Flow (BiDAF) network is a multi-stage hierarchical process that represents context at different levels of granularity and uses a bi-directional attention flow mechanism to achieve a query-aware context representation without early summarization.
Stars: ✭ 1,472 (+4648.39%)
Mutual labels:  question-answering, squad
Rust Bert
Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
Stars: ✭ 510 (+1545.16%)
Mutual labels:  translation, question-answering
Transformer-QG-on-SQuAD
Implement Question Generator with SOTA pre-trained Language Models (RoBERTa, BERT, GPT, BART, T5, etc.)
Stars: ✭ 28 (-9.68%)
Mutual labels:  squad, bert
iamQA
Chinese Wikipedia QA reading-comprehension system, using an NER model trained on CCKS2016 data and a reading-comprehension model trained on CMRC2018, plus W2V word-vector search; deployed with TorchServe
Stars: ✭ 46 (+48.39%)
Mutual labels:  question-answering, bert
BERT-for-Chinese-Question-Answering
No description or website provided.
Stars: ✭ 75 (+141.94%)
Mutual labels:  question-answering, bert
Question-Answering-based-on-SQuAD
Question Answering System using BiDAF Model on SQuAD v2.0
Stars: ✭ 20 (-35.48%)
Mutual labels:  question-answering, squad
extractive rc by runtime mt
Code and datasets of "Multilingual Extractive Reading Comprehension by Runtime Machine Translation"
Stars: ✭ 36 (+16.13%)
Mutual labels:  question-answering, squad
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+21370.97%)
Mutual labels:  question-answering, bert
DrFAQ
DrFAQ is a plug-and-play question answering NLP chatbot that can be generally applied to any organisation's text corpora.
Stars: ✭ 29 (-6.45%)
Mutual labels:  question-answering, bert
Awesome Qa
😎 A curated list of the Question Answering (QA)
Stars: ✭ 596 (+1822.58%)
Mutual labels:  question-answering, squad
TriB-QA
We're serious about boasting
Stars: ✭ 45 (+45.16%)
Mutual labels:  question-answering, bert
question-answering
No description or website provided.
Stars: ✭ 32 (+3.23%)
Mutual labels:  question-answering, squad
cdQA-ui
⛔ [NOT MAINTAINED] A web interface for cdQA and other question answering systems.
Stars: ✭ 19 (-38.71%)
Mutual labels:  question-answering, bert
mcQA
🔮 Answering multiple choice questions with Language Models.
Stars: ✭ 23 (-25.81%)
Mutual labels:  question-answering, bert
KitanaQA
KitanaQA: Adversarial training and data augmentation for neural question-answering models
Stars: ✭ 58 (+87.1%)
Mutual labels:  question-answering, bert

SQuAD 2.Q - Augmented-Dataset

Developers - Ankit Chadha ([email protected]) and Rewa Sood ([email protected])


This is a release of an augmented dataset we produced on top of the Stanford Question Answering Dataset (SQuAD) 2.0.

The repository is called SQuAD 2.Q because only the questions from the SQuAD 2.0 dataset have been augmented, using back translation. The work can easily be extended to context paragraphs using the Python script (augment.py).



Why just Questions?

SQuAD 2.0 is a dataset where the contexts come from Wikipedia paragraphs and the questions are written by crowd workers. Questions written by crowd workers inherently reflect the syntactic variance and grammar usage of individual human writers. The idea here is to help the network generalize to this syntactic variance in the questions, so that it becomes better at:

  1. Understanding Questions
  2. Understanding interactions between Question and Context (Attention)


How does SQuAD 2.Q help?

We present our model, BertQA: Attention on Steroids, where training on SQuAD 2.Q50 helped the same model achieve a 2-point F1 improvement over training on SQuAD 2.0.

[Figure: BertQA]
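For reference, the F1 metric cited above is the SQuAD-style token-overlap F1 between a predicted answer and a gold answer. A simplified sketch (the official evaluation script additionally normalizes punctuation and articles before comparing tokens):

```python
from collections import Counter

def squad_f1(prediction, ground_truth):
    """Simplified SQuAD-style token-overlap F1 between two answer strings."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(round(squad_f1("the Norman conquest", "Norman conquest of England"), 3))
# -> 0.571  (2 shared tokens; precision 2/3, recall 2/4)
```

A "2-point" improvement means this score, averaged over the dev set and scaled to 0-100, rose by about 2.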


Release notes:

  1. SQuAD 2.Q (100% augmented: for every question in the dataset there is an augmented counterpart)
  2. SQuAD 2.Q50 (50% augmented)
  3. SQuAD 2.Q35 (35% augmented)
  4. augment.py (Python script that uses the Google Cloud API to augment the dataset)
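The released files follow the standard SQuAD 2.0 JSON layout, so the questions can be walked with the usual nested loop. A minimal sketch using an inline record (the record contents here are illustrative, not taken from the release; field names are the standard SQuAD ones):

```python
import json

# In practice you would load one of the released files, e.g.:
#   with open(path_to_release_file) as f:
#       dataset = json.load(f)
# Illustrative inline record in the standard SQuAD 2.0 layout:
dataset = {
    "data": [{
        "title": "Normandy",
        "paragraphs": [{
            "context": "Normandy is a region in France.",
            "qas": [{
                "id": "q1",  # illustrative id
                "question": "In what country is Normandy located?",
                "is_impossible": False,
                "answers": [{"text": "France", "answer_start": 24}],
            }],
        }],
    }]
}

# Collect every question across all articles and paragraphs.
questions = [
    qa["question"]
    for article in dataset["data"]
    for paragraph in article["paragraphs"]
    for qa in paragraph["qas"]
]
print(questions)
# -> ['In what country is Normandy located?']
```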

Usage:

python augment.py

Back Translation:

[Figure: Backtranslation]
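Back translation round-trips each question through a pivot language (e.g. English → French → English), so the wording varies while the meaning is preserved. A minimal sketch of the round-trip logic; the `translate` function below is a toy phrasebook stand-in for a real translation service (augment.py uses the Google Cloud API instead):

```python
# Toy phrasebook standing in for a real translation service such as the
# Google Cloud API used by augment.py. Maps a few phrases for illustration.
PHRASEBOOK = {
    ("en", "fr"): {"Where is Normandy located?": "Où se trouve la Normandie ?"},
    ("fr", "en"): {"Où se trouve la Normandie ?": "Where is Normandy situated?"},
}

def translate(text, source, target):
    """Stub translator: look the phrase up, fall back to the input unchanged."""
    return PHRASEBOOK[(source, target)].get(text, text)

def back_translate(question, pivot="fr"):
    """English -> pivot -> English round trip; the result is a paraphrase."""
    pivoted = translate(question, "en", pivot)
    return translate(pivoted, pivot, "en")

print(back_translate("Where is Normandy located?"))
# -> Where is Normandy situated?
```

The paraphrase ("situated" vs. "located") is exactly the kind of syntactic variance the augmented questions add.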

Read more: http://ankit-ai.blogspot.com/2019/03/future-of-natural-language-processing.html
