All Projects → ymcui → Eval On Nn Of Rc

ymcui / Eval On Nn Of Rc

Licence: cc-by-sa-4.0
Empirical Evaluation on Current Neural Networks on Cloze-style Reading Comprehension

Projects that are alternatives of or similar to Eval On Nn Of Rc

Evalai
☁️ 🚀 📊 📈 Evaluating state of the art in AI
Stars: ✭ 1,087 (+1194.05%)
Mutual labels:  evaluation
Evalne
Source code for EvalNE, a Python library for evaluating Network Embedding methods.
Stars: ✭ 67 (-20.24%)
Mutual labels:  evaluation
Machine Learning
My Attempt(s) In The World Of ML/DL....
Stars: ✭ 78 (-7.14%)
Mutual labels:  attention
Attention Over Attention Tf Qa
论文“Attention-over-Attention Neural Networks for Reading Comprehension”中AoA模型实现
Stars: ✭ 58 (-30.95%)
Mutual labels:  attention
Deeplearning Nlp Models
A small, interpretable codebase containing the re-implementation of a few "deep" NLP models in PyTorch. Colab notebooks to run with GPUs. Models: word2vec, CNNs, transformer, gpt.
Stars: ✭ 64 (-23.81%)
Mutual labels:  attention
Drepl
A REPL for D
Stars: ✭ 70 (-16.67%)
Mutual labels:  evaluation
Yolov4 Pytorch
This is a pytorch repository of YOLOv4, attentive YOLOv4 and mobilenet YOLOv4 with PASCAL VOC and COCO
Stars: ✭ 1,070 (+1173.81%)
Mutual labels:  attention
Attend infer repeat
A Tensorfflow implementation of Attend, Infer, Repeat
Stars: ✭ 82 (-2.38%)
Mutual labels:  attention
Pert
A simple command line (bash/shell) utility to estimate tasks using PERT [Program Evaluation and Review Technique]
Stars: ✭ 66 (-21.43%)
Mutual labels:  evaluation
Nlp Tutorial
Natural Language Processing Tutorial for Deep Learning Researchers
Stars: ✭ 9,895 (+11679.76%)
Mutual labels:  attention
Expr
Fast and lightweight math expression evaluator in C99
Stars: ✭ 61 (-27.38%)
Mutual labels:  evaluation
Global Self Attention Network
A Pytorch implementation of Global Self-Attention Network, a fully-attention backbone for vision tasks
Stars: ✭ 64 (-23.81%)
Mutual labels:  attention
Absa Pytorch
Aspect Based Sentiment Analysis, PyTorch Implementations. 基于方面的情感分析,使用PyTorch实现。
Stars: ✭ 1,181 (+1305.95%)
Mutual labels:  attention
View Finding Network
A deep ranking network that learns to find good compositions in a photograph.
Stars: ✭ 57 (-32.14%)
Mutual labels:  evaluation
Attention Transfer
Improving Convolutional Networks via Attention Transfer (ICLR 2017)
Stars: ✭ 1,231 (+1365.48%)
Mutual labels:  attention
Pycm
Multi-class confusion matrix library in Python
Stars: ✭ 1,076 (+1180.95%)
Mutual labels:  evaluation
Enjoy Hamburger
[ICLR 2021] Is Attention Better Than Matrix Decomposition?
Stars: ✭ 69 (-17.86%)
Mutual labels:  attention
Self Attention Classification
document classification using LSTM + self attention
Stars: ✭ 84 (+0%)
Mutual labels:  attention
Vidvrd Helper
To keep updates with VRU Grand Challenge, please use https://github.com/NExTplusplus/VidVRD-helper
Stars: ✭ 81 (-3.57%)
Mutual labels:  evaluation
Hatn
Hierarchical Attention Transfer Network for Cross-domain Sentiment Classification (AAAI'18)
Stars: ✭ 73 (-13.1%)
Mutual labels:  attention

Empirical Evaluation on Current Neural Networks on Cloze-style Reading Comprehension

Note that, this repository will be updated irregularly.

Statement

If you find this repository helpful, please press the STAR button. Moreover, if you would like to use or repost the content in this repository, please indicate the orignal author and source link.

Introduction

This repository contains an empirical comparison on current neural networks on Cloze-style Reading Comprehension. The content only represent PERSONAL views on these works. Any discussions will be welcome. (Please go to Issue Tab)

For those who are not familiar with this task, please read related papers, such as Hermann et al., 2015.

The Way of Generating Large-scale Training Data

Several Cloze-style reading comprehension datasets are readily available. Some general comparisons are given below.

Dataset Language Genre Blank
Type
Document Query Content
CNN/Daily Mail
(Hermann et al., 2015)
English news NE news article summary w/
a blank
train+valid
+test
Children's Book Test (CBTest)
(Hill et al., 2015)
English story NE, CN,
V, P
20 consecutive
sentences
21th sentence w/
a blank
train+valid
+test
People Daily
(Cui et al., 2016a)
Chinese news NE, CN news article w/
a blank
the sentence that
blank belongs to
train+valid
+test
Children's Fairy Tale (CFT)
(Cui et al., 2016a)
Chinese story NE, CN story article w/
a blank
the sentence that
blank belongs to
test only1
Who did what
(Onishi et al., 2016)
English news NE news article deleting an NE from the first sentence of the question article N/A2
BookTest
(Bajgar et al., 2016)
English news NE,CN 20 consecutive
sentences
21th sentence w/
a blank
train+valid
+test

1 contains one human evaluated test set


2 as this 'dataset' is made from LDC corpus (Gigaword), it is not available to the public unless licenced by LDC
>**Abbreviation** NE: Named Entity, CN: Common Noun, V: Verb, P: Preposition

Neural Architectures

In this section, the models will be compared with each other. As illustrated, the comment only represents my own views. Also note that, the evaluation DO NOT take the model performance into account, and it just shows the characteristics of the model itself.

Rating will be in 1 ~ 5 stars, where 5★ means best, and 1★ means worst.
Take Kadlec et al., 2016 as an example,
in Model efficiency, I give a 5★, meaning the training speed is very fast;
in Implementation complexity, I give a 4★, meaning it is relatively easy to implement such a neural network;
in Hyper-parameter, I give a 4★, meaning there are fewer hyper-params that should be defined by human. Note that, here the hyper-parameters are only indicating the params in neural network itself, not including such as layer dimension or something else.

System Core
Architecture
Prediction
Range1
Model
Efficiency
Implementation
Complexity
Hyper-
parameter
Other
tricks
Hermann et al., 2015 Attention-
based RNN
All words in
document
★★★ ★★★ ★★★★ -
Hill et al., 2015 Memory
Network
All words in
document
★★★ ★★ ★★★ self-
supervision
Kadlec et al., 2016 Attention-
based RNN
All words in
document
★★★★★ ★★★★ ★★★★ -
Cui et al., 2016a Attention-
based RNN
All words in
document
★★★★ ★★★★ ★★★★ -
Chen et al., 2016 Attention-
based RNN
Only Entities in
document
★★★ ★★★★ ★★★★ -
Dhingra et al., 2016 Attention-
based RNN
- ★★★ ★★★ ★★★ -
Trischler et al., 2016 Attention-
based RNN
- ★★ ★★ ★★ -
Sordoni et al., 2016 Attention-
based RNN
All words in
document
★★★ ★★ ★★★ -
Cui et al., 2016b Attention-
based RNN
All words in
document
★★★★ ★★★ ★★★★ -
Weissenborn, 2016 Attention-
based RNN
Only Entities in
document
★★★ ★★ ★★★ -
Li et al., 2016 Attention-
based RNN
+CRF
- ★★★ ★★ ★★★ -
Shen et al., 2016
Munkhdalai and Yu, 2016
Mihaylov and Frank, 2018
Ghaeini et al., 2018

1 only shows the CNN/Daily Mail condition. In CBTest condition, all models only considers 10 candidates given in the datasets.

Training Details

Here, I will have a brief comparison on the training details of each model, and hopefully you will have a better concept in parameter tuning.

System Embed
Init
Hidden
Type
Gradient
Clip
Optimi-
zation
Batch
Size
Initial
LR
Dropout Imple-
mentation
Hermann et al., 2015 rand LSTM - RmsProp 32 0.00005 yes -
Hill et al., 2015 rand LSTM 5 or 10 SGD ? RANGE yes -
Kadlec et al., 2016 rand GRU 10 ADAM 32 0.001 no TH+B
Cui et al., 2016a rand GRU 10 ADAM 32 0.0005 yes TH+K
Chen et al., 2016 GloVe LSTM 10 SGD 32 0.1 yes -
Dhingra et al., 2016 word2vec GRU 10 ADAM 32 0.0005 yes TH+L
Trischler et al., 2016 rand GRU - ADAM 32 0.001 no TH+K
Sordoni et al., 2016 rand GRU 5 ADAM 32 0.001 yes TH+K
Cui et al., 2016b rand GRU 5 ADAM 32 0.001 yes TH+K
Weissenborn, 2016 GloVe GRU - ADAM 32 or 128 0.001 yes TF
Shen et al., 2016 GloVe GRU - ADAM 64 0.0005 yes -
Li et al., 2016 NNLM LSTM - RmsProp 120 - - -
Bajgar et al., 2016 rand GRU 10 ADAM 128 0.0005 no TH+B
Munkhdalai and Yu, 2016 GloVe LSTM 15 ADAM 32 0.001 yes -
Mihaylov and Frank, 2018 GloVe GRU ? ? 64 0.001 ? -
Ghaeini et al., 2018 GloVe GRU ? ADAM 32 0.0005 0.4 -

Abbreviation TH: Theano, B: Blocks, K: Keras, L: Lasagne, TF: TensorFlow

Training Tips

I show several tips in training these neural network models, FYI.

  1. Use adaptive learning algorithm, if you are not expertise in optimizing vanilla SGD. Use learning rate decay, even if you use adaptive learning algorithm, such as ADAM, RmsProp etc.

  2. Reshuffle training data in every epoch (general tips)

  3. Any neural network models that are based on AS Reader (Kadlec et al., 2016) should pay attention to the overfitting problem. Add dropout or l2_regularization to solve this problem. Also, it should be noted that using regularization will slow down the convergence speed.

  4. Do not use big batch size. Typically, 32 is a good start. Note that, though bigger batch size lead to faster training, the convergence speed may not be also fast (general tips)

  5. GRU typically shares the comparable performance with LSTM, so feel free to use either of them.

  6. Use gradient clipping 5~10, when using LSTM or GRU. Gradient Exploding can sometimes happen, which is more dangerous than gradient vanishing!

Leaderboard

In this part, I only show the results on CNN/Daily Mail and Children's book story Named Entity and Common Noun (short for CBT-NE and CBT-CN) datasets. Only single model evaluation is showed here. For more results, such as ensemble performance, please directly refer to the related papers.

The best result in each category is marked with bold face.

System CNN News
Valid
CNN News
Test
Daily Mail
Valid
Daily Mail
Test
CBT-NE
Valid
CBT-NE
Test
CBT-CN
Valid
CBT-CN
Test
Deep LSTM Reader1 55.0 57.0 63.3 62.2 - - - -
Attentive Reader1 61.6 63.0 70.5 69.0 - - - -
Impatient Reader1 61.8 63.8 69.0 68.0 - - - -
Human (ctx+query)2 - - - - - 81.6 - 81.6
MemNN(+self-sup)2 63.4 66.8 - - 70.4 66.6 64.2 63.0
AS Reader3 68.6 69.5 75.0 73.9 73.8 68.6 68.8 63.4
CAS Reader8 68.2 70.0 - - 74.2 69.2 68.2 65.7
Stanford AR4 72.4 72.4 76.9 75.8 - - - -
GA Reader5 73.0 73.8 76.7 75.7 74.9 69.0 69.0 63.9
EpiReader7 73.4 74.0 - - 75.3 69.7 71.5 67.4
Iterative Attention6 72.6 73.3 - - 75.2 68.6 72.1 69.2
QANN10 - 73.7 - 77.2 - 70.6 - -
ReasoNet13 72.9 74.7 77.6 76.6 - - - -
NSE Adp. Com. 15 - - - - 78.2 73.2 74.2 71.4
Bi-ATT Flow16 76.3 76.9 80.3 79.6 - - - -
FG gate17 - - - - 79.1 75.0 75.3 72.0
AoA Reader9 73.1 74.4 - - 77.8 72.0 72.2 69.4
AoA Reader + rerank9 - - - - 79.6 74.0 75.7 73.1
KnReader18 - - - - 77.4 71.4 71.8 67.6
DGR19 - - - - - 75.4 - 70.7

The following results are trained on BookTest dataset, which is substantially bigger than CBT.

System CNN News
Valid
CNN News
Test
Daily Mail
Valid
Daily Mail
Test
CBT-NE
Valid
CBT-NE
Test
CBT-CN
Valid
CBT-CN
Test
AS Reader14
- - - - 80.5 76.2 83.2 80.8

Related Papers

You can either download the related papers from this repository or in the following links.

  1. (Hermann et al., 2015) Teaching Machines to Read and Comprehend
    http://arxiv.org/abs/1506.03340
  1. (Hill et al., 2015) The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations
    http://arxiv.org/abs/1511.02301
  1. (Kadlec et al., 2016) Text Understanding with the Attention Sum Reader Network
    http://arxiv.org/abs/1603.01547
  1. (Chen et al., 2016) A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task
    https://arxiv.org/abs/1606.02858
  1. (Dhingra et al., 2016) Gated-Attention Readers for Text Comprehension
    https://arxiv.org/abs/1606.01549
  1. (Sordoni et al., 2016) Iterative Alternating Neural Attention for Machine Reading
    https://arxiv.org/abs/1606.02245
  1. (Trischler et al., 2016) Natural Language Comprehension with the EpiReader
    https://arxiv.org/abs/1606.02270
  1. (Cui et al., 2016a) Consensus Attention-based Neural Networks for Chinese Reading Comprehension
    https://arxiv.org/abs/1607.02250
  1. (Cui et al., 2016b) Attention-over-Attention Neural Networks for Reading Comprehension
    https://arxiv.org/abs/1607.04423
  1. (Dirk Weissenborn, 2016) Separating Answers from Queries for Neural Reading Comprehension
    http://arxiv.org/abs/1607.03316
  1. (Li et al., 2016) Dataset and Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering
    https://arxiv.org/abs/1607.06275
  1. (Onishi et al., 2016) Who did What: A Large-Scale Person-Centered Cloze Dataset
    http://arxiv.org/abs/1608.05457
  1. (Shen et al., 2016) ReasoNet: Learning to Stop Reading in Machine Comprehension
    https://arxiv.org/abs/1609.05284
  1. (Bajgar et al., 2016) Embracing data abundance: BookTest Dataset for Reading Comprehension
    https://arxiv.org/abs/1610.00956
  1. (Munkhdalai and Yu, 2016) Reasoning with Memory Augmented Neural Networks for Language Comprehension
    https://arxiv.org/abs/1610.06454
  1. (Seo et al., 2016) Bi-directional attention flow for machine comprehension
    http://arxiv.org/abs/1611.01603
  1. (Yang et al., 2016) Words or Characters? Fine-grained Gating for Reading Comprehension
    http://arxiv.org/abs/1611.01724
  1. (Mihaylov and Frank, 2018) Knowledgeable Reader: Enhancing Cloze-Style Reading Comprehension with External Commonsense Knowledge
    http://arxiv.org/abs/1805.07858
  1. (Ghaeini et al., 2018) Dependent Gated Reading for Cloze-Style Question Answering
    http://arxiv.org/abs/1805.10528

Contact

For any problems, please leave a message in the Github Issues.

Disclaimer

Any subjective comments in this repository only represents the idea of the owner (ymcui), and does not represent the claims of any organizations.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].