
abhshkdz / neural-vqa-attention

Licence: other
❓ Attention-based Visual Question Answering in Torch

Programming Languages

Jupyter Notebook

Projects that are alternatives of or similar to neural-vqa-attention

lantern
[Android Library] Handling device flash as torch for Android.
Stars: ✭ 81 (-15.62%)
Mutual labels:  torch
hypnettorch
Package for working with hypernetworks in PyTorch.
Stars: ✭ 66 (-31.25%)
Mutual labels:  torch
gan-reverser
Reversing GAN image generation for similarity search and error/artifact fixing
Stars: ✭ 13 (-86.46%)
Mutual labels:  torch
deepgenres.torch
Predict the genre of a song using the Torch deep learning library
Stars: ✭ 18 (-81.25%)
Mutual labels:  torch
inpainting FRRN
Progressive Image Inpainting (Kolmogorov Team solution for Huawei Hackathon 2019 summer)
Stars: ✭ 30 (-68.75%)
Mutual labels:  torch
vrn-torch-to-keras
Transfer pre-trained VRN model from torch to Keras/Tensorflow
Stars: ✭ 63 (-34.37%)
Mutual labels:  torch
multiclass-semantic-segmentation
Experiments with UNET/FPN models and cityscapes/kitti datasets [Pytorch]
Stars: ✭ 96 (+0%)
Mutual labels:  torch
torch-pitch-shift
Pitch-shift audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.
Stars: ✭ 70 (-27.08%)
Mutual labels:  torch
Captcha-Cracking
Crack numeric and Chinese captchas with both traditional and deep learning methods, based on Torch and Python.
Stars: ✭ 35 (-63.54%)
Mutual labels:  torch
torch-dataframe
Utility class to manipulate dataset from CSV file
Stars: ✭ 67 (-30.21%)
Mutual labels:  torch
yann
Yet Another Neural Network Library 🤔
Stars: ✭ 26 (-72.92%)
Mutual labels:  torch
eccv16 attr2img
Torch implementation of ECCV'16 paper: Attribute2Image
Stars: ✭ 93 (-3.12%)
Mutual labels:  torch
flambeau
Nim bindings to libtorch
Stars: ✭ 60 (-37.5%)
Mutual labels:  torch
ThArrays.jl
A Julia interface for PyTorch's C++ backend, focusing on Tensor, AD, and JIT
Stars: ✭ 23 (-76.04%)
Mutual labels:  torch
Jetson-Nano-image
Jetson Nano image with deep learning frameworks
Stars: ✭ 46 (-52.08%)
Mutual labels:  torch
graftr
graftr: an interactive shell to view and edit PyTorch checkpoints.
Stars: ✭ 89 (-7.29%)
Mutual labels:  torch
ALIGNet
code to train a neural network to align pairs of shapes without needing ground truth warps for supervision
Stars: ✭ 58 (-39.58%)
Mutual labels:  torch
WassersteinGAN.torch
Torch implementation of Wasserstein GAN https://arxiv.org/abs/1701.07875
Stars: ✭ 48 (-50%)
Mutual labels:  torch
deep-learning-platforms
Deep learning platforms, frameworks, and resources
Stars: ✭ 17 (-82.29%)
Mutual labels:  torch
sentence2vec
Deep sentence embedding using Sequence to Sequence learning
Stars: ✭ 23 (-76.04%)
Mutual labels:  torch

neural-vqa-attention

Torch implementation of an attention-based visual question answering model (Stacked Attention Networks for Image Question Answering, Yang et al., CVPR16).


  1. Train your own network
    1. Preprocess VQA dataset
    2. Extract image features
    3. Training
  2. Use a pretrained model
    1. Pretrained models and data files
    2. Running evaluation
  3. Results

Intuitively, the model looks at an image, reads a question, and comes up with an answer to the question and a heatmap of where it looked in the image to answer it.

The model/code also supports referring back to the image multiple times (Stacked Attention) before producing the answer. This is controlled by the num_attention_layers parameter in the code (default = 1).
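
As a rough illustration of what one attention pass computes, here is a minimal numpy sketch. The variable names, feature sizes, and initialization are illustrative assumptions, not the repository's actual code:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Assumed setup: 196 image regions (a 14x14 grid) with 512-d features,
# and a question encoding of the same dimensionality.
d = 512
image_feats = np.random.randn(196, d)   # one row per image region
question = np.random.randn(d)           # question encoding (e.g. from an LSTM)

num_attention_layers = 2                # "stacked" attention, as in SAN-2
W_img = [np.random.randn(d, d) * 0.01 for _ in range(num_attention_layers)]
W_qst = [np.random.randn(d, d) * 0.01 for _ in range(num_attention_layers)]
w_att = [np.random.randn(d) * 0.01 for _ in range(num_attention_layers)]

u = question
for k in range(num_attention_layers):
    # Score each region against the current query vector.
    h = np.tanh(image_feats @ W_img[k] + u @ W_qst[k])   # (196, d)
    p = softmax(h @ w_att[k])                            # attention over regions
    attended = p @ image_feats                           # weighted sum of regions
    # The attended visual vector refines the query for the next glimpse.
    u = u + attended

# u would then go through a classifier over the candidate answers;
# p from the last pass is the heatmap visualized in the results below.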

NOTE: This is NOT a state-of-the-art model. Refer to MCB, MLB, or HieCoAtt for that. This is a simple, somewhat interpretable model that gets decent accuracies and produces nice-looking results. The code was written about a year ago as part of VQA-HAT, and I'd meant to release it earlier but couldn't get around to cleaning things up.

If you just want to run the model on your own images, download links to pretrained models are given below.

Train your own network

Preprocess VQA dataset

Pass split as 1 to train on train and evaluate on val, and 2 to train on train+val and evaluate on test.

cd data/
python vqa_preprocessing.py --download True --split 1
cd ..
python prepro.py --input_train_json data/vqa_raw_train.json --input_test_json data/vqa_raw_test.json --num_ans 1000

Extract image features

Since we don't finetune the CNN, training is significantly faster if image features are pre-extracted. We use image features from VGG-19. The model can be downloaded and features extracted using:

sh scripts/download_vgg19.sh
th prepro_img.lua -image_root /path/to/coco/images/ -gpuid 0
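
If you prefer to extract features in Python instead, a rough torchvision equivalent is sketched below. This is an assumption-laden sketch: the exact VGG-19 layer, input resolution, and normalization used by prepro_img.lua may differ, as may the layout the training code expects in the HDF5 files.

import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# VGG-19 conv features; SAN-style models typically use the 14x14x512 map
# produced by the last conv block (before the fully connected layers).
vgg = models.vgg19(pretrained=True).features.eval()

preprocess = T.Compose([
    T.Resize((448, 448)),            # 448 / 32 = 14 spatial positions per side
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract(path):
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return vgg(img).squeeze(0)       # (512, 14, 14)

feats = extract("/path/to/coco/images/some_image.jpg")  # placeholder path
print(feats.shape)                   # torch.Size([512, 14, 14])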

Training

th train.lua

Use a pretrained model

Pretrained models and data files

All files are available for download here; a quick way to sanity-check them is sketched after the list below.

  • san1_2.t7: model pretrained on train+val with 1 attention layer (SAN-1)
  • san2_2.t7: model pretrained on train+val with 2 attention layers (SAN-2)
  • params_1.json: vocabulary file for training on train, evaluating on val
  • params_2.json: vocabulary file for training on train+val, evaluating on test
  • qa_1.h5: QA features for training on train, evaluating on val
  • qa_2.h5: QA features for training on train+val, evaluating on test
  • img_train_1.h5 & img_test_1.h5: image features for training on train, evaluating on val
  • img_train_2.h5 & img_test_2.h5: image features for training on train+val, evaluating on test
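
A minimal inspection sketch, assuming h5py is installed; it only lists whatever keys and dataset shapes the preprocessing scripts wrote, without assuming specific names:

import json
import h5py

# Vocabulary / metadata for the train+val -> test setting.
with open("params_2.json") as f:
    params = json.load(f)
print("params_2.json keys:", list(params.keys()))

# The QA and image feature files are HDF5; list their datasets and shapes.
for name in ["qa_2.h5", "img_train_2.h5", "img_test_2.h5"]:
    with h5py.File(name, "r") as f:
        print(name)
        f.visititems(lambda key, obj: print(" ", key, getattr(obj, "shape", "")))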

Running evaluation

model_path=checkpoints/model.t7 qa_h5=data/qa.h5 params_json=data/params.json img_test_h5=data/img_test.h5 th eval.lua

This will generate a JSON file containing question ids and predicted answers. To compute accuracy on val, use the VQA Evaluation Tools; for test, submit to the VQA evaluation server on EvalAI.
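
For reference, the VQA evaluation tools and server typically expect a JSON list of {"question_id": int, "answer": str} entries; below is a small sketch for checking the generated file before submission (the filename is a placeholder; use whatever eval.lua actually writes):

import json

# Placeholder filename; substitute the JSON file produced by eval.lua.
with open("results.json") as f:
    results = json.load(f)

assert isinstance(results, list)
for entry in results[:3]:
    print(entry["question_id"], "->", entry["answer"])
print("total predictions:", len(results))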

Results

Format: sets of three columns. Column 1 shows the original image, column 2 shows the 'attention' heatmap of where the model looks, and column 3 shows the image overlaid with attention. The input question and the model's predicted answer are shown below each example.

More results are available here.

Quantitative Results

Models are trained on train for the val accuracies, and on train+val for the test accuracies.

VQA v2.0

Method         val     test
SAN-1          53.15   55.28
SAN-2          52.82   -
d-LSTM + n-I   51.62   54.22
HieCoAtt       54.57   -
MCB            59.14   -

VQA v1.0

Method         test-std
SAN-1          59.87
SAN-2          59.59
d-LSTM + n-I   58.16
HieCoAtt       62.10
MCB            65.40

References

Acknowledgements

License

MIT
