
andrejonasson / Dynamic Coattention Network Plus

Dynamic Coattention Network Plus (DCN+) TensorFlow implementation. Question answering using Deep NLP.

Programming Languages

python

Projects that are alternatives of or similar to Dynamic Coattention Network Plus

Insuranceqa Corpus Zh
🚁 Insurance industry corpus and chatbot
Stars: ✭ 821 (+601.71%)
Mutual labels:  question-answering, natural-language-processing
Conversational Ai
Conversational AI Reading Materials
Stars: ✭ 34 (-70.94%)
Mutual labels:  question-answering, natural-language-processing
Knowledge Graphs
A collection of research on knowledge graphs
Stars: ✭ 845 (+622.22%)
Mutual labels:  question-answering, natural-language-processing
Cdqa
⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.
Stars: ✭ 500 (+327.35%)
Mutual labels:  question-answering, natural-language-processing
Neural kbqa
Knowledge Base Question Answering using memory networks
Stars: ✭ 87 (-25.64%)
Mutual labels:  question-answering, natural-language-processing
Paper Reading
Paper reading list in natural language processing, including dialogue systems and text generation related topics.
Stars: ✭ 508 (+334.19%)
Mutual labels:  question-answering, natural-language-processing
Acl18 results
Code to reproduce results in our ACL 2018 paper "Did the Model Understand the Question?"
Stars: ✭ 31 (-73.5%)
Mutual labels:  question-answering, natural-language-processing
Cmrc2018
A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018)
Stars: ✭ 238 (+103.42%)
Mutual labels:  question-answering, natural-language-processing
Turkish Bert Nlp Pipeline
Bert-base NLP pipeline for Turkish, Ner, Sentiment Analysis, Question Answering etc.
Stars: ✭ 85 (-27.35%)
Mutual labels:  question-answering, natural-language-processing
Bidaf Keras
Bidirectional Attention Flow for Machine Comprehension implemented in Keras 2
Stars: ✭ 60 (-48.72%)
Mutual labels:  question-answering, natural-language-processing
Gnn4nlp Papers
A list of recent papers about Graph Neural Network methods applied in NLP areas.
Stars: ✭ 405 (+246.15%)
Mutual labels:  question-answering, natural-language-processing
Neuronblocks
NLP DNN Toolkit - Building Your NLP DNN Models Like Playing Lego
Stars: ✭ 1,356 (+1058.97%)
Mutual labels:  question-answering, natural-language-processing
Adam qas
ADAM - A Question Answering System. Inspired from IBM Watson
Stars: ✭ 330 (+182.05%)
Mutual labels:  question-answering, natural-language-processing
Chat
A chatbot based on natural language understanding and machine learning, supporting concurrent multi-user sessions and customizable multi-turn dialogue
Stars: ✭ 516 (+341.03%)
Mutual labels:  question-answering, natural-language-processing
Jack
Jack the Reader
Stars: ✭ 242 (+106.84%)
Mutual labels:  question-answering, natural-language-processing
Spago
Self-contained Machine Learning and Natural Language Processing library in Go
Stars: ✭ 854 (+629.91%)
Mutual labels:  question-answering, natural-language-processing
Medquad
Medical Question Answering Dataset of 47,457 QA pairs created from 12 NIH websites
Stars: ✭ 129 (+10.26%)
Mutual labels:  question-answering, natural-language-processing
Pytorch Question Answering
Important paper implementations for Question Answering using PyTorch
Stars: ✭ 154 (+31.62%)
Mutual labels:  question-answering, natural-language-processing
Cdqa Annotator
⛔ [NOT MAINTAINED] A web-based annotator for closed-domain question answering datasets with SQuAD format.
Stars: ✭ 48 (-58.97%)
Mutual labels:  question-answering, natural-language-processing
Sentence Similarity
PyTorch implementations of various deep learning models for paraphrase detection, semantic similarity, and textual entailment
Stars: ✭ 96 (-17.95%)
Mutual labels:  question-answering, natural-language-processing

Dynamic Coattention Network Plus - Question Answering

Introduction

SQuAD (the Stanford Question Answering Dataset) [3][4] formulates a machine learning problem in which the model receives a question and a passage and is tasked with answering the question using the passage. Answers are limited to spans of text, and the training data consists of (question, paragraph, answer span) triplets. Due to the nature of the task, combining the information contained in the passage with the question posed is paramount to achieving good performance (see the references for more information). Recurrent neural networks that combine the information from the question and paragraph using coattention mechanisms, such as the Dynamic Coattention Network [1] and its deeper, improved version [2], have achieved the best results on the task so far.

Networks

Baseline model

Baseline model (BiLSTM + DCN-like Coattention + Naive decoder).
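
The naive decoder simply predicts the start and end of the answer span with two independent softmax classifiers over the passage positions. A minimal TensorFlow 1.x sketch of the idea (an illustration only, not the project's actual code; all names and shapes are assumptions):

import tensorflow as tf

def naive_decoder(encoding, passage_mask):
    """Predict span endpoints with two independent linear + softmax heads.

    encoding:     [batch, passage_len, hidden] passage encoding
    passage_mask: [batch, passage_len], 1.0 for real tokens, 0.0 for padding
    """
    # One scalar logit per passage position for the span start ...
    start_logits = tf.squeeze(tf.layers.dense(encoding, 1), axis=-1)
    # ... and another, independently, for the span end.
    end_logits = tf.squeeze(tf.layers.dense(encoding, 1), axis=-1)
    # Push padding positions towards zero probability before the softmax.
    penalty = -1e30 * (1.0 - passage_mask)
    start_probs = tf.nn.softmax(start_logits + penalty)
    end_probs = tf.nn.softmax(end_logits + penalty)
    return start_probs, end_probs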

Starting point for hyperparameters

Steps = ~20000/40000
Word embedding size = 100
Hidden state size = 100
Optimizer = Adam
Batch size = 64
RNN Input Dropout = 15%
Learning Rate = 0.005/0.001

Achieves roughly F1 0.620 / EM 0.452 on the dev set

Dynamic Coattention Network (DCN)

Implemented using components from DCN+, but with single-layer attention and a unidirectional initial LSTM encoder.

Dynamic Coattention Network Plus (DCN+)

The DCN+ encoder combines the question and passage using a dot-product-based coattention mechanism, similar to the attention in the Transformer, Vaswani et al. (2017). The decoder is application specific: built for finding an answer span within a passage, it uses an iterative mechanism to recover from local minima. Instead of the paper's mixed objective, this implementation uses cross entropy, like the vanilla DCN.

For the implementation, see networks.dcn_plus. An effort has been made to document each component. Each component of the encoder (coattention layer, affinity softmax masking, sentinel vectors and encoder units) and certain parts of the decoder are modular and can easily be reused with other networks.
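
To give a flavour of the core computation, here is a minimal sketch of a single dot-product coattention layer (affinity masking and sentinel vectors omitted; an illustration only, not the code in networks.dcn_plus):

import tensorflow as tf

def coattention(passage, question):
    """Single dot-product coattention layer over encoded sequences.

    passage:  [batch, p_len, hidden] passage encoding
    question: [batch, q_len, hidden] question encoding
    Returns a question-aware passage representation [batch, p_len, 2*hidden].
    """
    # Affinity between every passage word and every question word.
    affinity = tf.matmul(passage, question, transpose_b=True)  # [b, p_len, q_len]
    # For each passage word, a distribution over question words ...
    attn_q = tf.nn.softmax(affinity, axis=2)
    # ... and for each question word, a distribution over passage words.
    attn_p = tf.nn.softmax(affinity, axis=1)
    # Question summary for each passage word.
    summary_q = tf.matmul(attn_q, question)                    # [b, p_len, hidden]
    # Passage summary for each question word.
    summary_p = tf.matmul(attn_p, passage, transpose_a=True)   # [b, q_len, hidden]
    # Second-level coattention: map passage summaries back to passage positions.
    context = tf.matmul(attn_q, summary_p)                     # [b, p_len, hidden]
    return tf.concat([summary_q, context], axis=2)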

Todos

  • Character embeddings (Char-CNN as in BiDAF [5])
  • Sparse mixture of experts

Additional Modules

Linear time maximum probability product answer span

The project contains an implementation of the dynamic programming approach for finding the answer span that maximizes the product of the start and end probabilities, as used by BiDAF and others. The implementation is inspired by bi-att-flow, except that it operates on batches instead of single examples and is written in TensorFlow.
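
The idea, sketched per example in NumPy (the project's version is batched and written in TensorFlow): for each end position, keep track of the best start probability seen so far, which gives the maximum product in linear time.

import numpy as np

def max_product_span(p_start, p_end):
    """Find the span (i, j), i <= j, maximizing p_start[i] * p_end[j]."""
    best_prob, best_span = -1.0, (0, 0)
    best_start_prob, best_start = -1.0, 0
    for j, pe in enumerate(p_end):
        # Extend the candidate starts to include position j itself.
        if p_start[j] > best_start_prob:
            best_start_prob, best_start = p_start[j], j
        # Best span ending exactly at position j.
        if best_start_prob * pe > best_prob:
            best_prob = best_start_prob * pe
            best_span = (best_start, j)
    return best_span, best_prob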

Instructions

Dependencies

The project requires Python 3.6 with TensorFlow 1.10. Support for prior versions will not be added.

Getting started

Change to the project folder (the one containing README.md).

  1. Install the requirements (you may want to create and activate a virtualenv)
$ pip install -r requirements.txt
  2. Download punkt if needed
$ python -m nltk.downloader punkt

then download and preprocess SQuAD using

$ python question_answering/preprocessing/squad_preprocess.py

While the preprocessing is running, you can continue with Step 3 in another terminal.

  3. Issue the command
$ python question_answering/preprocessing/dwr.py <GLOVE_SOURCE>

to download and extract GloVe embeddings, where <GLOVE_SOURCE> is either wiki for the 100/200/300-dimensional Wikipedia GloVe word embeddings (~800 MB) or crawl_ci/crawl_cs for the 300-dimensional Common Crawl GloVe word embeddings (~1.8-2.2 GB), crawl_ci being the case-insensitive version. Note that at a later step the Common Crawl embeddings require at least 4 hours of processing, while the 100-dimensional Wikipedia GloVe finishes in about half an hour.

  4. When Steps 2 and 3 are complete, change directory to the folder containing the code (main.py etc.) and run
$ python preprocessing/qa_data.py --glove_dim <EMBEDDINGS_DIMENSIONS> --glove_source <GLOVE_SOURCE>

replacing <EMBEDDINGS_DIMENSIONS> with the word embedding size you want (100, 200 or 300) and <GLOVE_SOURCE> with the embedding source chosen above. qa_data.py will process the embeddings and create a 95-5 split of the training data, with 95% used as the training set and the rest as a development set.
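
Conceptually, processing the embeddings means pairing the dataset vocabulary with the corresponding GloVe vectors. A simplified sketch, assuming the standard plain-text GloVe format and a hypothetical word-to-id dict vocab (qa_data.py's actual logic may differ):

import numpy as np

def load_glove(path, vocab):
    """Build an embedding matrix indexed by vocab id from a GloVe text file,
    where each line is a word followed by its space-separated float values."""
    matrix = None
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            word, vector = parts[0], np.asarray(parts[1:], dtype=np.float32)
            if matrix is None:
                # Allocate once we know the embedding dimension.
                matrix = np.zeros((len(vocab), vector.shape[0]), dtype=np.float32)
            idx = vocab.get(word)
            if idx is not None:
                matrix[idx] = vector
    return matrix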

Usage

The default mode of main.py is to train a DCN+ network. Run

$ python main.py --embedding_size <EMBEDDINGS_DIMENSIONS>

to begin training. See the source code for all the arguments and modes that main.py supports. Checkpoints and logs are placed under a timestamped folder inside ../checkpoints by default.

Interactive Shell

To see a trained model in action, load your model in shell mode and ask it questions about passages.

TensorBoard

For TensorBoard, run

$ tensorboard --logdir checkpoints

from the project folder and navigate to localhost:6006. The gradient norm and learning rate appear among other metrics, and the computational graph can also be viewed.

Acknowledgements

The project uses code from Stanford's CS224n to read and transform the original SQuAD dataset together with the GloVe vectors to an appropriate format for model development. These files or functions have been annotated with "CS224n" at the beginning.

References

[1] Dynamic Coattention Networks For Question Answering, Xiong et al, https://arxiv.org/abs/1611.01604, 2016

[2] DCN+: Mixed Objective and Deep Residual Coattention for Question Answering, Xiong et al, https://arxiv.org/abs/1711.00106, 2017

[3] SQuAD: 100,000+ Questions for Machine Comprehension of Text, Rajpurkar et al, https://arxiv.org/abs/1606.05250, 2016

[4] https://rajpurkar.github.io/SQuAD-explorer/

[5] Bidirectional Attention Flow for Machine Comprehension, Seo et al, https://arxiv.org/abs/1611.01603, 2016

Author

André Jonasson / @andrejonasson
