NLPLearn / Qanet

License: MIT
A TensorFlow implementation of QANet for machine reading comprehension

Programming Languages

Python

Projects that are alternatives to or similar to Qanet

Sai
SDK for TEE AI Stick (includes model training script, inference library, examples)
Stars: ✭ 28 (-97.19%)
Mutual labels:  cnn
Action Recognition Using 3d Resnet
Use 3D ResNet to extract features of UCF101 and HMDB51 and then classify them.
Stars: ✭ 32 (-96.79%)
Mutual labels:  cnn
Twitter Sentiment Analysis
Sentiment analysis on tweets using Naive Bayes, SVM, CNN, LSTM, etc.
Stars: ✭ 978 (-1.81%)
Mutual labels:  cnn
Fusionnet
My implementation of the FusionNet for machine comprehension
Stars: ✭ 29 (-97.09%)
Mutual labels:  squad
Smsop
Code for Statistically-motivated Second-order Pooling, ECCV2018
Stars: ✭ 32 (-96.79%)
Mutual labels:  cnn
Deepmodels
TensorFlow Implementation of state-of-the-art models since 2012
Stars: ✭ 33 (-96.69%)
Mutual labels:  cnn
Servenet
Service Classification based on Service Description
Stars: ✭ 21 (-97.89%)
Mutual labels:  cnn
Mc Cnn Python
a python implementation of MC-CNN
Stars: ✭ 38 (-96.18%)
Mutual labels:  cnn
Facerecognition
OpenCV 3 & Keras implementation of face recognition for specific people.
Stars: ✭ 32 (-96.79%)
Mutual labels:  cnn
Covidaid
COVID-19 Detection Using Chest X-Ray
Stars: ✭ 35 (-96.49%)
Mutual labels:  cnn
Kaggle Web Traffic Time Series Forecasting
Solution to Kaggle - Web Traffic Time Series Forecasting
Stars: ✭ 29 (-97.09%)
Mutual labels:  cnn
Rnn Theano
Some RNN code implemented in Theano, including the basic RNN, LSTM, and several attention models such as the one from the MLSTM paper
Stars: ✭ 31 (-96.89%)
Mutual labels:  cnn
Dl Colab Notebooks
Try out deep learning models online on Google Colab
Stars: ✭ 969 (-2.71%)
Mutual labels:  cnn
Cnn Question Classification Keras
Chinese Question Classifier (Keras Implementation) on BQuLD
Stars: ✭ 28 (-97.19%)
Mutual labels:  cnn
Reading comprehension tf
Machine Reading Comprehension in Tensorflow
Stars: ✭ 37 (-96.29%)
Mutual labels:  squad
Mxnet Ir
Image Retrieval Experiment Using Triplet Loss
Stars: ✭ 27 (-97.29%)
Mutual labels:  cnn
Gaze Estimation
A deep learning based gaze estimation framework implemented with PyTorch
Stars: ✭ 33 (-96.69%)
Mutual labels:  cnn
Newsapi
News API without any API KEY
Stars: ✭ 39 (-96.08%)
Mutual labels:  cnn
Ijjs
a lightweight JS runtime for IoT (the Internet of Things)
Stars: ✭ 38 (-96.18%)
Mutual labels:  cnn
Neural Networks
All about Neural Networks!
Stars: ✭ 34 (-96.59%)
Mutual labels:  cnn

QANet

A TensorFlow implementation of Google's QANet (previously known as Fast Reading Comprehension, or FRC) from ICLR 2018. (Note: this is not an official implementation from the authors of the paper.)

I wrote a blog post about implementing QANet. Check it out here for more information!

The training and preprocessing pipeline has been adopted from R-Net by HKUST-KnowComp. Demo mode is working: after training, just run python config.py --mode demo to start an interactive demo server.

Due to memory constraints, single-head dot-product attention is used instead of the 8-head multi-head attention from the original paper. The hidden size is also reduced from 128 to 96, because training was done on a GTX 1080 rather than the P100 used in the paper (8 GB of GPU memory is insufficient; if you have a 12 GB GPU, please share your training results with us).
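
For illustration only, here is a minimal sketch of single-head scaled dot-product attention in TF 1.x; the function name, arguments, and mask handling are assumptions and not this repository's actual code.

# Illustrative sketch of single-head scaled dot-product attention (TF 1.x).
import tensorflow as tf

def dot_product_attention(queries, keys, values, mask=None, keep_prob=0.9):
    # queries/keys/values: [batch, length, hidden]; mask: [batch, length], 1 = real token
    hidden = queries.get_shape().as_list()[-1]
    # similarity scores scaled by sqrt(hidden): [batch, len_q, len_k]
    logits = tf.matmul(queries, keys, transpose_b=True) / (hidden ** 0.5)
    if mask is not None:
        # push padded key positions toward -inf so softmax ignores them
        logits += (1.0 - tf.expand_dims(tf.cast(mask, tf.float32), 1)) * -1e30
    weights = tf.nn.dropout(tf.nn.softmax(logits), keep_prob=keep_prob)
    return tf.matmul(weights, values)  # [batch, len_q, hidden]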

Currently, the best model reaches EM/F1 = 70.8/80.1 in 60k steps (6~8 hours). Detailed results are listed below.


Dataset

The dataset used for this task is the Stanford Question Answering Dataset (SQuAD). Pretrained GloVe word embeddings trained on Common Crawl (840B tokens) are used.
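
As an illustration only (this is not the repository's preprocessing code), the word embedding matrix could be built from the GloVe file roughly as follows; the file name and the word-index dictionary are assumptions.

# Hypothetical helper: fill an embedding matrix from a GloVe text file.
# Words missing from the file keep a zero vector.
import io
import numpy as np

def load_glove(path, word2idx, dim=300):
    matrix = np.zeros((len(word2idx), dim), dtype=np.float32)
    with io.open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vec = parts[0], parts[1:]
            if word in word2idx and len(vec) == dim:
                matrix[word2idx[word]] = np.asarray(vec, dtype=np.float32)
    return matrix

# e.g. word_mat = load_glove("glove.840B.300d.txt", word2idx)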

Requirements

  • Python>=2.7
  • NumPy
  • tqdm
  • TensorFlow>=1.5
  • spacy==2.0.9
  • bottle (only for demo)

Usage

To download and preprocess the data, run

# download SQuAD and GloVe
sh download.sh
# preprocess the data
python config.py --mode prepro

Just like R-Net by HKUST-KnowComp, hyperparameters are stored in config.py. To debug/train/test/demo, run

python config.py --mode debug/train/test/demo

To evaluate the model with the official code, run

python evaluate-v1.1.py ~/data/squad/dev-v1.1.json train/{model_name}/answer/answer.json

The default directory for the TensorBoard log files is train/{model_name}/event.

Run in Docker container (optional)

To build the Docker image (requires nvidia-docker), run

nvidia-docker build -t tensorflow/qanet .

Set volume mount paths and port mappings (for demo mode)

export QANETPATH={/path/to/cloned/QANet}
export CONTAINERWORKDIR=/home/QANet
export HOSTPORT=8080
export CONTAINERPORT=8080

Bash into the container

nvidia-docker run -v $QANETPATH:$CONTAINERWORKDIR -p $HOSTPORT:$CONTAINERPORT -it --rm tensorflow/qanet bash

Once inside the container, follow the commands provided above, starting with downloading the SQuAD and GloVe datasets.

Pretrained Model

Pretrained model weights are temporarily not available.

Detailed Implementation

  • The model adopts character-level convolution - max pooling - highway network for input representations, similar to this paper by Yoon Kim.
  • The encoder consists of a positional encoding - depthwise separable convolution - self-attention - feed-forward structure, with layer normalization in between.
  • Although the original paper uses a character dimension of 200, we observe that a smaller character dimension leads to better generalization.
  • For regularization, a dropout of 0.1 is used every 2 sub-layers and 2 blocks.
  • Stochastic depth dropout is used to drop residual connections with a probability that grows with network depth, since this model relies heavily on residual connections (see the first sketch after this list).
  • Query-to-Context attention is used along with Context-to-Query attention, and it seems to improve performance more than the paper reported. This may be because single-head self-attention (as opposed to 8 heads) lacks diversity and carries repetitive information that the Query-to-Context attention complements.
  • The learning rate increases from 0.0 to 0.001 on an inverse exponential scale over the first 1,000 steps and is then fixed at 0.001 (see the second sketch after this list).
  • At inference time, the model uses the shadow variables maintained by an exponential moving average of all global variables (see the third sketch after this list).
  • This model uses a training / testing / preprocessing pipeline from R-Net for improved efficiency.
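
First, a minimal sketch of the stochastic depth (layer) dropout described above; the function name and the way the drop probability is passed in are assumptions.

# Sketch of stochastic depth dropout: with probability drop_prob the whole
# residual branch is skipped and only the shortcut path survives.
# drop_prob is typically scaled up with the depth of the sub-layer.
import tensorflow as tf

def layer_dropout(shortcut, residual, drop_prob):
    drop = tf.random_uniform([], minval=0.0, maxval=1.0) < drop_prob
    return tf.cond(drop,
                   lambda: shortcut,             # drop the residual branch entirely
                   lambda: shortcut + residual)  # keep the usual residual sum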
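
Second, the warm-up schedule could be written with TF 1.x ops roughly as below; everything except the 0.001 cap and the 1,000 warm-up steps is an assumption.

# Sketch of the inverse-exponential (logarithmic) learning-rate warm-up:
# the rate ramps from ~0 to 0.001 over the first 1,000 steps, then stays flat.
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
step = tf.cast(global_step, tf.float32) + 1.0
learning_rate = tf.minimum(0.001, 0.001 / tf.log(1000.0) * tf.log(step))
# optimizer = tf.train.AdamOptimizer(learning_rate)  # then minimize the loss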
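
Third, a hedged sketch of the exponential-moving-average trick used at inference; the decay value and variable names are assumptions.

# Sketch of maintaining EMA "shadow" variables during training and restoring
# them for inference; the decay of 0.9999 is an assumption.
import tensorflow as tf

weights = tf.get_variable("weights", shape=[96], initializer=tf.zeros_initializer())
train_op = tf.assign_add(weights, tf.ones_like(weights))  # stand-in for an optimizer step

ema = tf.train.ExponentialMovingAverage(decay=0.9999)
with tf.control_dependencies([train_op]):
    train_op = ema.apply(tf.trainable_variables())  # update shadow copies after each step

# For evaluation, build a saver that maps each variable to its shadow value,
# so restoring a checkpoint loads the averaged weights.
saver = tf.train.Saver(ema.variables_to_restore())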

Results

Here are the collected results from this repository and the original paper.

Model                              | Training Steps | Size | Attention Heads | Data Size (aug) | EM   | F1
-----------------------------------|----------------|------|-----------------|-----------------|------|-----
My model                           | 35,000         | 96   | 1               | 87k (no aug)    | 69.0 | 78.6
My model                           | 60,000         | 96   | 1               | 87k (no aug)    | 70.4 | 79.6
My model (reported by @jasonbw)    | 60,000         | 128  | 1               | 87k (no aug)    | 70.7 | 79.8
My model (reported by @chesterkuo) | 60,000         | 128  | 8               | 87k (no aug)    | 70.8 | 80.1
Original Paper                     | 35,000         | 128  | 8               | 87k (no aug)    | NA   | 77.0
Original Paper                     | 150,000        | 128  | 8               | 87k (no aug)    | 73.6 | 82.7
Original Paper                     | 340,000        | 128  | 8               | 240k (aug)      | 75.1 | 83.8

TODO's

  • [x] Training and testing the model
  • [x] Add trilinear function to Context-to-Query attention
  • [x] Apply dropouts + stochastic depth dropout
  • [x] Query-to-context attention
  • [x] Realtime Demo
  • [ ] Data augmentation by paraphrasing
  • [ ] Train with full hyperparameters (Augmented data, 8 heads, hidden units = 128)

Tensorboard

Run TensorBoard for visualisation.

$ tensorboard --logdir=./