awslabs / unsupervised-qa

License: Apache-2.0

Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering

Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering

Code and synthetic data from our ACL 2020 paper

Abstract

Question Answering (QA) is in increasing demand as the amount of information available online and the desire for quick access to this content grows. A common approach to QA has been to fine-tune a pretrained language model on a task-specific labeled dataset. This paradigm, however, relies on scarce, and costly to obtain, large-scale human-labeled data. We propose an unsupervised approach to training QA models with generated pseudo-training data. We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance by allowing the model to learn more complex context-question relationships. Training a QA model on this data gives a relative improvement over a previous unsupervised model in F1 score on the SQuAD dataset by about 14%, and 20% when the answer is a named entity, achieving state-of-the-art performance on SQuAD for unsupervised QA.
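The idea in the abstract, building a question from a retrieved sentence rather than from the context sentence itself, can be illustrated with a minimal sketch. This is not the code in spark_scripts/: the function names, the NER-label-to-question-word mapping, and the "Wh + B + A + ?" ordering (the retrieved sentence split as A + answer + B) are assumptions made for illustration only.

```python
# Hypothetical mapping from NER label to question word. The labels follow
# common NER tag sets and are an assumption, not taken from the repository.
WH_WORDS = {
    "PERSON": "Who",
    "ORG": "Who",
    "GPE": "Where",
    "LOC": "Where",
    "DATE": "When",
    "CARDINAL": "How many",
}


def make_template_question(retrieved_sentence, answer, ner_label):
    """Build a question with the assumed template 'Wh + B + A + ?'.

    The retrieved sentence is split around the answer as A + answer + B.
    Returns None when the answer does not occur in the sentence.
    """
    start = retrieved_sentence.find(answer)
    if start == -1:
        return None
    before = retrieved_sentence[:start].strip(" ,.")                # A
    after = retrieved_sentence[start + len(answer):].strip(" ,.")   # B
    wh = WH_WORDS.get(ner_label, "What")
    return " ".join(part for part in (wh, after, before) if part) + "?"


def make_training_example(context, retrieved_sentence, answer, ner_label):
    """Pair the original context with a question built from another sentence."""
    question = make_template_question(retrieved_sentence, answer, ner_label)
    answer_start = context.find(answer)
    if question is None or answer_start == -1:
        return None
    return {
        "context": context,
        "question": question,
        "answer": answer,
        "answer_start": answer_start,
    }


# Invented sentences, used only to exercise the functions.
context = "Marie Curie won the Nobel Prize in Physics in 1903."
retrieved = "In 1903, Marie Curie shared the prize with Pierre Curie."
example = make_training_example(context, retrieved, "Marie Curie", "PERSON")
print(example["question"])
```

Because the question words come from a different sentence than the context, a QA model trained on such pairs cannot rely on near-exact string overlap between question and context, which is the effect the abstract attributes to retrieval.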

Synthetic data

The synthetic data generated for the publication is located under enwiki_synthetic/.

Requirements

PySpark
ElasticSearch 6

Instructions for generating retrieval-based synthetic data

Tokenize and perform NER:

spark-submit --master local[90] --driver-memory 200G spark_scripts/tokenize_and_ner_inputs.py --corpus=enwiki/clean/*/*.raw  --output outputs/sent-tok-rollup

Then write the tokenized sentences to an ElasticSearch index. This step uses the AES_HOSTS environment variable.

spark-submit --master local[90] --driver-memory 4G spark_scripts/write_sentence_level_es_index.py --corpus=outputs/sent-tok-rollup/rollup/ --es-index uqa-es-index --output outputs/write-es

Create the synthetic QA dataset:

spark-submit --master local[90] --driver-memory 300G spark_scripts/create_ds_synthetic_dataset.py --corpus=outputs/sent-tok-rollup/rollup/ --output outputs/synthetic-uqa-auxqs1awc1 --aux-qs=1 --aux-awc=1 --ulim-count=500000
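The three steps above differ only in script, driver memory, and script arguments, so they can be assembled from one small helper when adapting them to a machine with fewer cores or less memory. The sketch below only prints the commands. The names spark_submit, ROLLUP, and PIPELINE are hypothetical; every flag and path is copied from the commands above.

```python
import shlex


def spark_submit(script, script_args, cores=90, driver_memory="4G"):
    """Return the argument list for a local spark-submit run (hypothetical helper)."""
    return [
        "spark-submit",
        "--master", f"local[{cores}]",
        "--driver-memory", driver_memory,
        script,
        *script_args,
    ]


# Output of step 1, reused as the corpus for steps 2 and 3.
ROLLUP = "outputs/sent-tok-rollup/rollup/"

PIPELINE = [
    # 1. Tokenize and perform NER.
    spark_submit(
        "spark_scripts/tokenize_and_ner_inputs.py",
        ["--corpus=enwiki/clean/*/*.raw", "--output", "outputs/sent-tok-rollup"],
        driver_memory="200G",
    ),
    # 2. Write the sentence-level ElasticSearch index (uses AES_HOSTS).
    spark_submit(
        "spark_scripts/write_sentence_level_es_index.py",
        [f"--corpus={ROLLUP}", "--es-index", "uqa-es-index",
         "--output", "outputs/write-es"],
        driver_memory="4G",
    ),
    # 3. Create the synthetic QA dataset.
    spark_submit(
        "spark_scripts/create_ds_synthetic_dataset.py",
        [f"--corpus={ROLLUP}", "--output", "outputs/synthetic-uqa-auxqs1awc1",
         "--aux-qs=1", "--aux-awc=1", "--ulim-count=500000"],
        driver_memory="300G",
    ),
]

for command in PIPELINE:
    # shlex.join quotes arguments such as local[90] so they are safe to paste.
    print(shlex.join(command))
```

Passing a smaller cores or driver_memory value changes only the Spark resources; the script arguments stay as given above.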

Citation

You can cite our paper:

@inproceedings{fabbri-etal-2020-template,
    title = "Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering",
    author = "Fabbri, Alexander  and
      Ng, Patrick  and
      Wang, Zhiguo  and
      Nallapati, Ramesh  and
      Xiang, Bing",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.413",
    doi = "10.18653/v1/2020.acl-main.413",
    pages = "4508--4513",
    abstract = "Question Answering (QA) is in increasing demand as the amount of information available online and the desire for quick access to this content grows. A common approach to QA has been to fine-tune a pretrained language model on a task-specific labeled dataset. This paradigm, however, relies on scarce, and costly to obtain, large-scale human-labeled data. We propose an unsupervised approach to training QA models with generated pseudo-training data. We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance by allowing the model to learn more complex context-question relationships. Training a QA model on this data gives a relative improvement over a previous unsupervised model in F1 score on the SQuAD dataset by about 14{\%}, and 20{\%} when the answer is a named entity, achieving state-of-the-art performance on SQuAD for unsupervised QA.",
}

License

This project is licensed under the Apache-2.0 License.
