
batra-mlp-lab / Visdial

License: other
[CVPR 2017] Torch code for Visual Dialog

Programming Languages

lua
6591 projects

Projects that are alternatives of or similar to Visdial

Neural Vqa
❔ Visual Question Answering in Torch
Stars: ✭ 487 (+126.51%)
Mutual labels:  natural-language-processing, torch
Pytorch Beam Search Decoding
PyTorch implementation of beam search decoding for seq2seq models
Stars: ✭ 204 (-5.12%)
Mutual labels:  natural-language-processing, torch
Attention Mechanisms
Implementations for a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras.
Stars: ✭ 203 (-5.58%)
Mutual labels:  natural-language-processing
Opennmt
Open Source Neural Machine Translation in Torch (deprecated)
Stars: ✭ 2,339 (+987.91%)
Mutual labels:  torch
Hardware Aware Transformers
[ACL 2020] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Stars: ✭ 206 (-4.19%)
Mutual labels:  natural-language-processing
Aind Nlp
Coding exercises for the Natural Language Processing concentration, part of Udacity's AIND program.
Stars: ✭ 202 (-6.05%)
Mutual labels:  natural-language-processing
Conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested python dictionary.
Stars: ✭ 207 (-3.72%)
Mutual labels:  natural-language-processing
Gluon Nlp
NLP made easy
Stars: ✭ 2,344 (+990.23%)
Mutual labels:  natural-language-processing
Spacy Lookup
Named Entity Recognition based on dictionaries
Stars: ✭ 212 (-1.4%)
Mutual labels:  natural-language-processing
Netron
Visualizer for neural network, deep learning, and machine learning models
Stars: ✭ 17,193 (+7896.74%)
Mutual labels:  torch
Shifterator
Interpretable data visualizations for understanding how texts differ at the word level
Stars: ✭ 209 (-2.79%)
Mutual labels:  natural-language-processing
Minerva
Meandering In Networks of Entities to Reach Verisimilar Answers
Stars: ✭ 205 (-4.65%)
Mutual labels:  natural-language-processing
Stringi
THE String Processing Package for R (with ICU)
Stars: ✭ 204 (-5.12%)
Mutual labels:  natural-language-processing
Kagnet
Knowledge-Aware Graph Networks for Commonsense Reasoning (EMNLP-IJCNLP 19)
Stars: ✭ 205 (-4.65%)
Mutual labels:  natural-language-processing
Pytorch graph Rel
A PyTorch implementation of GraphRel
Stars: ✭ 204 (-5.12%)
Mutual labels:  natural-language-processing
Nlp Roadmap
Roadmap (mind map) and keywords for students interested in learning NLP
Stars: ✭ 2,653 (+1133.95%)
Mutual labels:  natural-language-processing
Torchinfo
View model summaries in PyTorch!
Stars: ✭ 203 (-5.58%)
Mutual labels:  torch
Orn
Oriented Response Networks, in CVPR 2017
Stars: ✭ 207 (-3.72%)
Mutual labels:  torch
Torch
R Interface to Torch
Stars: ✭ 214 (-0.47%)
Mutual labels:  torch
Neat Vision
Neat (Neural Attention) Vision is a framework-agnostic visualization tool for the attention mechanisms of deep-learning models on Natural Language Processing (NLP) tasks.
Stars: ✭ 213 (-0.93%)
Mutual labels:  natural-language-processing

VisDial

Code for the paper

Visual Dialog
Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra
arxiv.org/abs/1611.08669
CVPR 2017 (Spotlight)

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Given an image, dialog history, and a follow-up question about the image, the AI agent has to answer the question.

Demo: demo.visualdialog.org

This repository contains code for training, evaluating and visualizing results for all combinations of encoder-decoder architectures described in the paper. Specifically, we have 3 encoders: Late Fusion (LF), Hierarchical Recurrent Encoder (HRE), Memory Network (MN), and 2 kinds of decoding: Generative (G) and Discriminative (D).

[figure: encoder-decoder model combinations]
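
As a rough illustration of the Late Fusion (LF) family of encoders, here is a minimal NumPy sketch (not the repository's Lua modules; all names and dimensions are illustrative): the question, image, and history encodings are concatenated and projected into a single joint embedding that either decoder can consume.

import numpy as np

rng = np.random.default_rng(0)
# Illustrative dimensions: 512-d question/history encodings, 4096-d VGG-16 fc7 image feature.
W = rng.standard_normal((512, 512 + 4096 + 512))

def late_fusion(ques_enc, img_feat, hist_enc):
    # Concatenate the three encodings and fuse them with a single linear layer + tanh.
    fused = np.concatenate([ques_enc, img_feat, hist_enc])
    return np.tanh(W @ fused)

state = late_fusion(rng.standard_normal(512), rng.standard_normal(4096), rng.standard_normal(512))
print(state.shape)  # (512,) joint embedding passed to the gen or disc decoder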

If you find this code useful, consider citing our work:

@inproceedings{visdial,
  title={{V}isual {D}ialog},
  author={Abhishek Das and Satwik Kottur and Khushi Gupta and Avi Singh
    and Deshraj Yadav and Jos\'e M.F. Moura and Devi Parikh and Dhruv Batra},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2017}
}

Setup

All our code is implemented in Torch (Lua). Installation instructions are as follows:

git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
TORCH_LUA_VERSION=LUA51 ./install.sh

Additionally, our code uses the following packages: torch/torch7, torch/nn, torch/nngraph, Element-Research/rnn, torch/image, lua-cjson, loadcaffe, torch-hdf5. After Torch is installed, these can be installed/updated using:

luarocks install torch
luarocks install nn
luarocks install nngraph
luarocks install image
luarocks install lua-cjson
luarocks install loadcaffe
luarocks install luabitop
luarocks install totem

NOTE: luarocks install rnn installs torch/rnn by default. Follow the steps below to install Element-Research/rnn instead.

git clone https://github.com/Element-Research/rnn.git
cd rnn
luarocks make rocks/rnn-scm-1.rockspec

Installation instructions for torch-hdf5 are given here.

NOTE: torch-hdf5 does not work with some versions of gcc. It is recommended that you use gcc 4.8 / gcc 4.9 with Lua 5.1 for a proper installation of torch-hdf5.

Running on GPUs

Although our code should work on CPUs, it is highly recommended to use GPU acceleration with CUDA. You'll also need torch/cutorch, torch/cudnn and torch/cunn.

luarocks install cutorch
luarocks install cunn
luarocks install cudnn

Training your own network

Preprocessing VisDial

The preprocessing script is in Python; you'll need NLTK, NumPy, and h5py installed.

pip install nltk
pip install numpy
pip install h5py
python -c "import nltk; nltk.download('all')"

The VisDial v1.0 dataset can be downloaded and preprocessed as shown below. The path provided as -image_root must contain four subdirectories: train2014 and val2014 from the COCO dataset, and VisualDialog_val2018 and VisualDialog_test2018, which can be downloaded from here.

cd data
python prepro.py -download -image_root /path/to/images
cd ..

To download and preprocess the VisDial v0.9 dataset instead, pass an extra -version 0.9 argument.

This script will generate the files data/visdial_data.h5 (contains tokenized captions, questions, answers, image indices) and data/visdial_params.json (contains vocabulary mappings and COCO image ids).
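
To sanity-check the preprocessing output, here is a minimal inspection script (assuming h5py is installed and the files are at the paths above; the exact dataset keys inside the files may differ):

import json
import h5py

# Print every top-level entry in the tokenized-dialog file; the repr shows shape and dtype.
with h5py.File('data/visdial_data.h5', 'r') as f:
    for name in f:
        print(name, f[name])

# Vocabulary mappings and COCO image ids live in the JSON file.
with open('data/visdial_params.json') as f:
    params = json.load(f)
print(sorted(params.keys()))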

Extracting image features

Since we don't finetune the CNN, training is significantly faster if image features are pre-extracted. This repository currently supports feature extraction with VGG-16 and ResNet models; we use VGG-16 features. The VGG-16 model can be downloaded and features extracted using:

sh scripts/download_model.sh vgg 16  # works for 19 as well
cd data
# For all models except mn-att-ques-im-hist
th prepro_img_vgg16.lua -imageRoot /path/to/images -gpuid 0
# For mn-att-ques-im-hist
th prepro_img_vgg16.lua -imageRoot /path/to/images -imgSize 448 -layerName pool5 -gpuid 0

ResNet models released by Facebook can also be used for feature extraction, in the same way as VGG-16:

sh scripts/download_model.sh resnet 200  # works for 18, 34, 50, 101, 152 as well
cd data
th prepro_img_resnet.lua -imageRoot /path/to/images -cnnModel /path/to/t7/model -gpuid 0

Running either of these should generate data/data_img.h5 containing features for train, val and test splits corresponding to VisDial v1.0.
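
A quick way to confirm the extraction produced the expected shapes (a small sketch; the dataset names inside data_img.h5 are not guaranteed here): non-attention encoders expect one pooled feature vector per image (e.g. 4096-d VGG-16 fc7), while mn-att-ques-im-hist expects a spatial grid of features (e.g. pool5 from 448x448 inputs).

import h5py

with h5py.File('data/data_img.h5', 'r') as f:
    for name in f:
        # Pooled features are 2-D (images x feature_dim); spatial features keep extra grid dims.
        print(name, f[name])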

Training

Finally, we can get to training models! All supported encoders are in the encoders/ folder (lf-ques, lf-ques-im, lf-ques-hist, lf-ques-im-hist, hre-ques-hist, hre-ques-im-hist, hrea-ques-im-hist, mn-ques-hist, mn-ques-im-hist, mn-att-ques-im-hist), and decoders in the decoders/ folder (gen and disc).

Generative (gen) decoding tries to maximize the likelihood of the ground-truth response and only has access to single input-output pairs of dialog, while discriminative (disc) decoding makes use of the 100 candidate responses provided for every round of dialog and maximizes the likelihood of the correct option.
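
Schematically, the two objectives look like the following NumPy sketch (an illustration of the idea, not the repository's Lua implementation; shapes and names are assumptions):

import numpy as np

def disc_loss(enc_state, option_embs, gt_index):
    # Discriminative decoding: score all 100 candidate options against the encoder
    # state and maximize the softmax probability of the ground-truth option.
    scores = option_embs @ enc_state                      # (100,)
    m = scores.max()
    log_probs = scores - m - np.log(np.exp(scores - m).sum())
    return -log_probs[gt_index]

def gen_loss(token_log_probs, gt_tokens):
    # Generative decoding: maximize the token-level log-likelihood of the
    # ground-truth answer; token_log_probs is (T, vocab), gt_tokens is (T,).
    return -token_log_probs[np.arange(len(gt_tokens)), gt_tokens].sum()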

Encoders and decoders can be arbitrarily plugged together. For example, to train an HRE model with question and history information only (no images), and generative decoding:

th train.lua -encoder hre-ques-hist -decoder gen -gpuid 0

Similarly, to train a Memory Network model with question, image and history information, and discriminative decoding:

th train.lua -encoder mn-ques-im-hist -decoder disc -gpuid 0

Note: for attention-based encoders, set both the imgSpatialSize and imgFeatureSize command-line parameters; feature dimensions are interpreted as (batch x spatial x spatial x feature). For other encoders, imgSpatialSize is redundant.

The training script saves model snapshots at regular intervals in the checkpoints/ folder.

It takes about 15-20 epochs to train models with generative decoding to convergence, and 4-8 epochs for discriminative decoding.

Evaluation

We evaluate model performance by where it ranks the human response among the 100 response options for every round of dialog, using retrieval metrics: mean reciprocal rank (MRR), recall@1, recall@5, recall@10, and mean rank.
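
For reference, these metrics can be computed from the 1-based rank of the human response in each round, as in this generic sketch (not the repository's evaluation code):

import numpy as np

def retrieval_metrics(gt_ranks):
    # gt_ranks: rank of the human response among the 100 options, one entry per round.
    ranks = np.asarray(gt_ranks, dtype=float)
    return {
        'mrr': (1.0 / ranks).mean(),
        'r@1': (ranks <= 1).mean(),
        'r@5': (ranks <= 5).mean(),
        'r@10': (ranks <= 10).mean(),
        'mean_rank': ranks.mean(),
    }

print(retrieval_metrics([1, 3, 12, 7, 56]))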

Model evaluation can be run using:

th evaluate.lua -loadPath checkpoints/model.t7 -gpuid 0

Note that evaluation requires image features data/data_img.h5, tokenized dialogs data/visdial_data.h5 and vocabulary mappings data/visdial_params.json.

Running Beam Search & Visualizing Results

We also include code for running beam search on your model snapshots. This gives significantly nicer results than argmax decoding, and can be run as follows:

th generate.lua -loadPath checkpoints/model.t7 -maxThreads 50

This would compute predictions for 50 threads from the val split and save results in vis/results/results.json.
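
For intuition, here is a compact, generic beam-search sketch over per-step log-probabilities (an illustration of the idea behind generate.lua, not its actual Lua implementation; step_fn is a hypothetical callback):

def beam_search(step_fn, start_token, end_token, beam_size=5, max_len=20):
    # step_fn(prefix) -> list of (token, log_prob) continuations of a partial answer.
    beams = [([start_token], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, logp in step_fn(prefix):
                candidates.append((prefix + [token], score + logp))
        # Keep only the beam_size highest-scoring partial answers.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            (finished if prefix[-1] == end_token else beams).append((prefix, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])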

cd vis
# python 3.6
python -m http.server
# python 2.7
# python -m SimpleHTTPServer

Now visit localhost:8000 in your browser to see generated results.

Sample results from HRE-QIH-G are available here.

Download Extracted Features & Pretrained Models

v0.9

Extracted features for v0.9 train and val are available for download.

Pretrained models

Trained on v0.9 train, results on v0.9 val.

Encoder Decoder CNN MRR R@1 R@5 R@10 MR Download
lf-ques gen VGG-16 0.5048 0.3974 0.6067 0.6649 17.8003 lf-ques-gen-vgg16-18
lf-ques-hist gen VGG-16 0.5099 0.4012 0.6155 0.6740 17.3974 lf-ques-hist-gen-vgg16-18
lf-ques-im gen VGG-16 0.5206 0.4206 0.6165 0.6760 17.0578 lf-ques-im-gen-vgg16-22
lf-ques-im-hist gen VGG-16 0.5146 0.4086 0.6205 0.6828 16.7553 lf-ques-im-hist-gen-vgg16-26
lf-att-ques-im-hist gen VGG-16 0.5354 0.4354 0.6355 0.6941 16.7663 lf-att-ques-im-hist-gen-vgg16-80
hre-ques-hist gen VGG-16 0.5089 0.4000 0.6154 0.6739 17.3618 hre-ques-hist-gen-vgg16-18
hre-ques-im-hist gen VGG-16 0.5237 0.4223 0.6228 0.6811 16.9669 hre-ques-im-hist-gen-vgg16-14
hrea-ques-im-hist gen VGG-16 0.5238 0.4213 0.6244 0.6842 16.6044 hrea-ques-im-hist-gen-vgg16-24
mn-ques-hist gen VGG-16 0.5131 0.4057 0.6176 0.6770 17.6253 mn-ques-hist-gen-vgg16-102
mn-ques-im-hist gen VGG-16 0.5258 0.4229 0.6274 0.6874 16.9871 mn-ques-im-hist-gen-vgg16-78
mn-att-ques-im-hist gen VGG-16 0.5341 0.4354 0.6318 0.6903 17.0726 mn-att-ques-im-hist-gen-vgg16-100
lf-ques disc VGG-16 0.5491 0.4113 0.7020 0.7964 7.1519 lf-ques-disc-vgg16-10
lf-ques-hist disc VGG-16 0.5724 0.4319 0.7308 0.8251 6.2847 lf-ques-hist-disc-vgg16-8
lf-ques-im disc VGG-16 0.5745 0.4331 0.7398 0.8340 5.9801 lf-ques-im-disc-vgg16-12
lf-ques-im-hist disc VGG-16 0.5911 0.4490 0.7563 0.8493 5.5493 lf-ques-im-hist-disc-vgg16-8
lf-att-ques-im-hist disc VGG-16 0.6079 0.4692 0.7731 0.8635 5.1965 lf-att-ques-im-hist-disc-vgg16-20
hre-ques-hist disc VGG-16 0.5668 0.4265 0.7245 0.8207 6.3701 hre-ques-hist-disc-vgg16-4
hre-ques-im-hist disc VGG-16 0.5818 0.4461 0.7373 0.8342 5.9647 hre-ques-im-hist-disc-vgg16-4
hrea-ques-im-hist disc VGG-16 0.5821 0.4456 0.7378 0.8341 5.9646 hrea-ques-im-hist-disc-vgg16-4
mn-ques-hist disc VGG-16 0.5831 0.4388 0.7507 0.8434 5.8090 mn-ques-hist-disc-vgg16-20
mn-ques-im-hist disc VGG-16 0.5971 0.4562 0.7627 0.8539 5.4218 mn-ques-im-hist-disc-vgg16-12
mn-att-ques-im-hist disc VGG-16 0.6082 0.4700 0.7724 0.8623 5.2930 mn-att-ques-im-hist-disc-vgg16-28

v1.0

Extracted features for v1.0 train, val and test are available for download.

Pretrained models

Trained on v1.0 train + v1.0 val, results on v1.0 test-std. Leaderboard here.

Encoder Decoder CNN NDCG MRR R@1 R@5 R@10 MR Download
lf-ques-im-hist gen VGG-16 0.5121 0.4568 35.08 55.92 64.02 18.8140 lf-ques-im-hist-gen-vgg16-24
hre-ques-im-hist gen VGG-16 0.5245 0.4561 34.78 56.18 63.72 18.7778 hre-ques-im-hist-gen-vgg16-20
mn-ques-im-hist gen VGG-16 0.5280 0.4580 35.05 56.35 63.92 19.3128 mn-ques-im-hist-gen-vgg16-92
lf-att-ques-im-hist gen VGG-16 0.5362 0.4697 36.58 57.40 64.48 18.9550 lf-att-ques-im-hist-gen-vgg16-82
mn-att-ques-im-hist gen VGG-16 0.5367 0.4650 36.00 56.80 64.25 19.3470 mn-att-ques-im-hist-gen-vgg16-100
lf-ques-im-hist disc VGG-16 0.4531 0.5542 40.95 72.45 82.83 5.9532 lf-ques-im-hist-disc-vgg16-8
hre-ques-im-hist disc VGG-16 0.4546 0.5416 39.93 70.45 81.50 6.4082 hre-ques-im-hist-disc-vgg16-4
mn-ques-im-hist disc VGG-16 0.4750 0.5549 40.98 72.30 83.30 5.9245 mn-ques-im-hist-disc-vgg16-12
lf-att-ques-im-hist disc VGG-16 0.4976 0.5707 42.08 74.82 85.05 5.4092 lf-att-ques-im-hist-disc-vgg16-24
mn-att-ques-im-hist disc VGG-16 0.4958 0.5690 42.42 74.00 84.35 5.5852 mn-att-ques-im-hist-disc-vgg16-24

License

BSD
