ankit-ai / SQUAD2.Q-Augmented-Dataset

Licence: other
Augmented version of SQUAD 2.0 for Questions


Projects that are alternatives to or similar to SQUAD2.Q-Augmented-Dataset

Medi-CoQA
Conversational Question Answering on Clinical Text
Stars: ✭ 22 (-29.03%)
Mutual labels:  question-answering, squad, bert
Haystack
🔍 Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.
Stars: ✭ 3,409 (+10896.77%)
Mutual labels:  question-answering, squad, bert
FinBERT-QA
Financial Domain Question Answering with pre-trained BERT Language Model
Stars: ✭ 70 (+125.81%)
Mutual labels:  question-answering, bert
cmrc2019
A Sentence Cloze Dataset for Chinese Machine Reading Comprehension (CMRC 2019)
Stars: ✭ 118 (+280.65%)
Mutual labels:  question-answering, bert
backprop
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Stars: ✭ 229 (+638.71%)
Mutual labels:  question-answering, bert
Bi Att Flow
Bi-directional Attention Flow (BiDAF) network is a multi-stage hierarchical process that represents context at different levels of granularity and uses a bi-directional attention flow mechanism to achieve a query-aware context representation without early summarization.
Stars: ✭ 1,472 (+4648.39%)
Mutual labels:  question-answering, squad
Rust Bert
Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
Stars: ✭ 510 (+1545.16%)
Mutual labels:  translation, question-answering
Transformer-QG-on-SQuAD
Implement Question Generator with SOTA pre-trained Language Models (RoBERTa, BERT, GPT, BART, T5, etc.)
Stars: ✭ 28 (-9.68%)
Mutual labels:  squad, bert
iamQA
Chinese Wikipedia QA reading-comprehension system, using an NER model trained on CCKS2016 data and a reading-comprehension model trained on CMRC2018, plus W2V word-vector search; deployed with TorchServe
Stars: ✭ 46 (+48.39%)
Mutual labels:  question-answering, bert
BERT-for-Chinese-Question-Answering
No description or website provided.
Stars: ✭ 75 (+141.94%)
Mutual labels:  question-answering, bert
Question-Answering-based-on-SQuAD
Question Answering System using BiDAF Model on SQuAD v2.0
Stars: ✭ 20 (-35.48%)
Mutual labels:  question-answering, squad
extractive rc by runtime mt
Code and datasets of "Multilingual Extractive Reading Comprehension by Runtime Machine Translation"
Stars: ✭ 36 (+16.13%)
Mutual labels:  question-answering, squad
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+21370.97%)
Mutual labels:  question-answering, bert
DrFAQ
DrFAQ is a plug-and-play question answering NLP chatbot that can be generally applied to any organisation's text corpora.
Stars: ✭ 29 (-6.45%)
Mutual labels:  question-answering, bert
Awesome Qa
😎 A curated list of the Question Answering (QA)
Stars: ✭ 596 (+1822.58%)
Mutual labels:  question-answering, squad
TriB-QA
We're serious about boasting
Stars: ✭ 45 (+45.16%)
Mutual labels:  question-answering, bert
question-answering
No description or website provided.
Stars: ✭ 32 (+3.23%)
Mutual labels:  question-answering, squad
cdQA-ui
⛔ [NOT MAINTAINED] A web interface for cdQA and other question answering systems.
Stars: ✭ 19 (-38.71%)
Mutual labels:  question-answering, bert
mcQA
🔮 Answering multiple choice questions with Language Models.
Stars: ✭ 23 (-25.81%)
Mutual labels:  question-answering, bert
KitanaQA
KitanaQA: Adversarial training and data augmentation for neural question-answering models
Stars: ✭ 58 (+87.1%)
Mutual labels:  question-answering, bert

SQuAD 2.Q - Augmented-Dataset

Developers - Ankit Chadha ([email protected]) and Rewa Sood ([email protected])


This is a release of an augmented dataset we produced on top of the Stanford Question Answering Dataset (SQuAD) 2.0.

The repository is called SQuAD 2.Q because only the questions from the SQuAD 2.0 dataset have been augmented, using back translation. The work can easily be extended to context paragraphs using the Python script (augment.py).



Why just Questions?

SQuAD 2.0 is a dataset where the contexts come from Wikipedia paragraphs and the questions are written by crowd workers. Questions written by crowd workers inherently reflect the syntactic variance and grammar usage of individual human writers. The idea here is to help the network generalize to this syntactic variance in the questions, so that it becomes better at:

  1. Understanding Questions
  2. Understanding interactions between Question and Context (Attention)


How does SQuAD 2.Q help?

We present our model, BertQA: Attention on Steroids, where training on SQuAD 2.Q50 helped the same model achieve a 2-point F1 improvement over training on SQuAD 2.0.

[Figure: BertQA]
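For reference, the F1 metric cited above is the SQuAD-style token-overlap F1 between a predicted answer and a gold answer. A simplified sketch (the official evaluation script additionally normalizes punctuation and articles before comparing tokens):

```python
from collections import Counter

def squad_f1(prediction, ground_truth):
    """Simplified SQuAD-style token-overlap F1 between two answer strings."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(round(squad_f1("the Norman conquest", "Norman conquest of England"), 3))
# -> 0.571  (2 shared tokens; precision 2/3, recall 2/4)
```

A "2-point" improvement means this score, averaged over the dev set and scaled to 0-100, rose by about 2.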


Release notes:

  1. SQuAD 2.Q (100% augmented: for every question in the dataset there is an augmented counterpart)
  2. SQuAD 2.Q50 (50% augmented)
  3. SQuAD 2.Q35 (35% augmented)
  4. augment.py (Python script that uses the Google Cloud API to augment the dataset)
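The released files follow the standard SQuAD 2.0 JSON layout, so the questions can be walked with the usual nested loop. A minimal sketch using an inline record (the record contents here are illustrative, not taken from the release; field names are the standard SQuAD ones):

```python
import json

# In practice you would load one of the released files, e.g.:
#   with open(path_to_release_file) as f:
#       dataset = json.load(f)
# Illustrative inline record in the standard SQuAD 2.0 layout:
dataset = {
    "data": [{
        "title": "Normandy",
        "paragraphs": [{
            "context": "Normandy is a region in France.",
            "qas": [{
                "id": "q1",  # illustrative id
                "question": "In what country is Normandy located?",
                "is_impossible": False,
                "answers": [{"text": "France", "answer_start": 24}],
            }],
        }],
    }]
}

# Collect every question across all articles and paragraphs.
questions = [
    qa["question"]
    for article in dataset["data"]
    for paragraph in article["paragraphs"]
    for qa in paragraph["qas"]
]
print(questions)
# -> ['In what country is Normandy located?']
```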

Usage:

python augment.py

Back Translation:

[Figure: Backtranslation]
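Back translation round-trips each question through a pivot language (e.g. English → French → English), so the wording varies while the meaning is preserved. A minimal sketch of the round-trip logic; the `translate` function below is a toy phrasebook stand-in for a real translation service (augment.py uses the Google Cloud API instead):

```python
# Toy phrasebook standing in for a real translation service such as the
# Google Cloud API used by augment.py. Maps a few phrases for illustration.
PHRASEBOOK = {
    ("en", "fr"): {"Where is Normandy located?": "Où se trouve la Normandie ?"},
    ("fr", "en"): {"Où se trouve la Normandie ?": "Where is Normandy situated?"},
}

def translate(text, source, target):
    """Stub translator: look the phrase up, fall back to the input unchanged."""
    return PHRASEBOOK[(source, target)].get(text, text)

def back_translate(question, pivot="fr"):
    """English -> pivot -> English round trip; the result is a paraphrase."""
    pivoted = translate(question, "en", pivot)
    return translate(pivoted, pivot, "en")

print(back_translate("Where is Normandy located?"))
# -> Where is Normandy situated?
```

The paraphrase ("situated" vs. "located") is exactly the kind of syntactic variance the augmented questions add.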

Read more: http://ankit-ai.blogspot.com/2019/03/future-of-natural-language-processing.html
