jnhwkim / ban-vqa

License: MIT
Bilinear attention networks for visual question answering

Programming Languages

python
139335 projects - #7 most used programming language

Labels

Projects that are alternatives to or similar to ban-vqa

Abcnn
Implementation of ABCNN (Attention-Based Convolutional Neural Network) in TensorFlow
Stars: ✭ 264 (-41.2%)
Mutual labels:  attention
Crnn attention ocr chinese
CRNN with attention for OCR, with added Chinese recognition
Stars: ✭ 315 (-29.84%)
Mutual labels:  attention
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (-9.13%)
Mutual labels:  attention
Encoder decoder
Four styles of encoder-decoder models in Python, Theano, Keras, and Seq2Seq
Stars: ✭ 269 (-40.09%)
Mutual labels:  attention
Seq2seq Summarizer
Pointer-generator reinforced seq2seq summarization in PyTorch
Stars: ✭ 306 (-31.85%)
Mutual labels:  attention
Ner Bert
BERT-NER (nert-bert) with Google BERT (https://github.com/google-research).
Stars: ✭ 339 (-24.5%)
Mutual labels:  attention
ResUNetPlusPlus
Official code for ResUNet++ for medical image segmentation (TensorFlow implementation, IEEE ISM)
Stars: ✭ 69 (-84.63%)
Mutual labels:  attention
Gansformer
Generative Adversarial Transformers
Stars: ✭ 421 (-6.24%)
Mutual labels:  attention
Deepxi
Deep Xi: A deep learning approach to a priori SNR estimation implemented in TensorFlow 2/Keras. For speech enhancement and robust ASR.
Stars: ✭ 304 (-32.29%)
Mutual labels:  attention
Deep learning nlp
Keras, PyTorch, and NumPy Implementations of Deep Learning Architectures for NLP
Stars: ✭ 407 (-9.35%)
Mutual labels:  attention
Abd Net
[ICCV 2019] "ABD-Net: Attentive but Diverse Person Re-Identification" https://arxiv.org/abs/1908.01114
Stars: ✭ 272 (-39.42%)
Mutual labels:  attention
Keras Transformer
Transformer implemented in Keras
Stars: ✭ 273 (-39.2%)
Mutual labels:  attention
Text Classification Models Pytorch
Implementation of State-of-the-art Text Classification Models in Pytorch
Stars: ✭ 379 (-15.59%)
Mutual labels:  attention
Attentionwalk
A PyTorch Implementation of "Watch Your Step: Learning Node Embeddings via Graph Attention" (NeurIPS 2018).
Stars: ✭ 266 (-40.76%)
Mutual labels:  attention
Pytorch Original Transformer
My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing otherwise seemingly hard concepts. IWSLT pretrained models are currently included.
Stars: ✭ 411 (-8.46%)
Mutual labels:  attention
ai challenger 2018 sentiment analysis
Fine-grained Sentiment Analysis of User Reviews --- AI CHALLENGER 2018
Stars: ✭ 16 (-96.44%)
Mutual labels:  attention
Transformer Tensorflow
TensorFlow implementation of 'Attention Is All You Need (2017. 6)'
Stars: ✭ 319 (-28.95%)
Mutual labels:  attention
Mac Network
Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
Stars: ✭ 444 (-1.11%)
Mutual labels:  attention
Recurrent Visual Attention
A PyTorch Implementation of "Recurrent Models of Visual Attention"
Stars: ✭ 414 (-7.8%)
Mutual labels:  attention
Nlp Tutorials
Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com
Stars: ✭ 394 (-12.25%)
Mutual labels:  attention

Bilinear Attention Networks

This repository is the implementation of Bilinear Attention Networks for the visual question answering and Flickr30k Entities tasks.

For the visual question answering task, our single model achieved 70.35 and an ensemble of 15 models achieved 71.84 (Test-standard, VQA 2.0). For the Flickr30k Entities task, our single model achieved 69.88 / 84.39 / 86.40 for Recall@1, 5, and 10, respectively (slightly better than the original paper). For details, please refer to our technical report.

This repository is based on and inspired by @hengyuan-hu's work. We sincerely thank them for sharing their code.

Overview of bilinear attention networks

Updates

  • Bilinear attention networks using torch.einsum, backward-compatible (see the sketch below). (12 Mar 2019)
  • Now compatible with PyTorch v1.0.1. (12 Mar 2019)
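
The torch.einsum update above compresses the bilinear interaction into a single tensor contraction. Below is a minimal, self-contained sketch of a low-rank bilinear attention map written with torch.einsum; the class, argument names, and dimensions are illustrative assumptions, not the repository's actual modules (see the attention code in this repository for the real implementation).

import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankBilinearAttention(nn.Module):
    # Illustrative low-rank bilinear attention over (object, token) pairs.
    def __init__(self, v_dim, q_dim, h_dim, glimpses):
        super().__init__()
        self.v_proj = nn.Linear(v_dim, h_dim)   # project visual features
        self.q_proj = nn.Linear(q_dim, h_dim)   # project question features
        self.h_weight = nn.Parameter(torch.randn(glimpses, h_dim) * 0.01)

    def forward(self, v, q):
        # v: (batch, num_objects, v_dim), q: (batch, num_tokens, q_dim)
        v_ = torch.relu(self.v_proj(v))         # (b, o, h)
        q_ = torch.relu(self.q_proj(q))         # (b, t, h)
        # One einsum gives the bilinear logit for every (glimpse, object, token) triple.
        logits = torch.einsum('bok,btk,gk->bgot', v_, q_, self.h_weight)
        b, g, o, t = logits.shape
        # Normalize jointly over the object-token grid, separately per glimpse.
        return F.softmax(logits.view(b, g, -1), dim=-1).view(b, g, o, t)

att = LowRankBilinearAttention(v_dim=2048, q_dim=1024, h_dim=512, glimpses=8)
v = torch.randn(2, 36, 2048)    # 36 object features per image
q = torch.randn(2, 14, 1024)    # 14 question token features
print(att(v, q).shape)          # torch.Size([2, 8, 36, 14])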

Prerequisites

You may need a machine with 4 GPUs, 64 GB of memory, and PyTorch v1.0.1 for Python 3.

  1. Install PyTorch with CUDA and Python 3.6.
  2. Install h5py.

WARNING: do not use PyTorch v1.0.0 due to a bug that degrades performance.
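
If you want to guard against the problematic release programmatically, a small check like the one below can be added at startup. This snippet is an optional convenience, not part of the repository.

import torch

# Refuse to run on the known-bad release; any 1.0.1+ build is fine.
if torch.__version__.startswith('1.0.0'):
    raise RuntimeError('PyTorch v1.0.0 has a performance-degrading bug; '
                       'please install v1.0.1 or later.')
print('PyTorch version OK:', torch.__version__)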

VQA

Preprocessing

Our implementation uses the pretrained features from bottom-up-attention (the adaptive features, 10-100 per image) together with GloVe vectors. For simplicity, the scripts below help you avoid the hassle of downloading everything manually.

All data should be downloaded to a data/ directory in the root directory of this repository.

The easiest way to download the data is to run the provided script tools/download.sh from the repository root. If the script does not work, it should be easy to examine it and adapt the steps it outlines to your needs. Then run tools/process.sh from the repository root to convert the data into the correct format.
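
Once preprocessing finishes, you can sanity-check the extracted bottom-up features with h5py. The file path and dataset key below are illustrative assumptions; check tools/process.sh for the names actually produced on your machine.

import h5py

# 'data/train.hdf5' and 'image_features' are assumed names for illustration.
with h5py.File('data/train.hdf5', 'r') as f:
    print('datasets:', list(f.keys()))
    if 'image_features' in f:
        feats = f['image_features']
        # Adaptive bottom-up features: 10-100 regions per image, 2048-d each.
        print('feature array shape:', feats.shape)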

For now, you should manually download the files for the options below (used in our best single model).

We use part of the Visual Genome dataset for data augmentation. The image metadata and the question answers of version 1.2 need to be placed in data/.

We use MS COCO captions to extract semantically connected words for the extended word embeddings, along with the questions of VQA 2.0 and Visual Genome. You can download them here. Since the contribution of these captions is minor, you can skip the processing of MS COCO captions by removing the cap elements from the target option in this line.

The counting module (Zhang et al., 2018) is integrated into this repository as counting.py for your convenience. The source repository can be found at @Cyanogenoid's vqa-counting.
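
For orientation, the sketch below shows roughly how such a counting component consumes per-object bounding boxes and attention weights. The constructor arguments, tensor shapes, and layout are assumptions based on the upstream vqa-counting project; consult counting.py for the exact interface.

import torch
from counting import Counter  # provided in this repository as counting.py

# Assumed interface: Counter(max_objects) maps boxes plus one attention
# weight per object to a (batch, max_objects + 1) count feature.
counter = Counter(10, already_sigmoided=False)
boxes = torch.rand(2, 4, 36)     # assumed layout: (batch, x1/y1/x2/y2, num_objects)
attention = torch.rand(2, 36)    # one attention weight per object
count_features = counter(boxes, attention)
print(count_features.shape)      # expected: torch.Size([2, 11])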

Training

Run

$ python3 main.py --use_both True --use_vg True

to start training (the two options respectively enable training on both the train and val splits and using the Visual Genome data). The training and validation scores will be printed every epoch, and the best model will be saved under the directory saved_models. The default hyperparameters should give you the best single-model result, which is around 70.04 for the test-dev split.

Validation

If you trained a model with the training split using

$ python3 main.py

then you can run evaluate.py with the appropriate options to evaluate its score on the validation split.

Pretrained model

We provide the pretrained model reported as the best single model in the paper (70.04 for test-dev, 70.35 for test-standard).

Please download the model from the link and move it to saved_models/ban/model_epoch12.pth (you may encounter a redirection page asking you to confirm). The training log can be found here.

$ python3 test.py --label mytest

The resulting JSON file will be written to the results/ directory.
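
If you want to spot-check the predictions, a short script like the one below works, assuming the file follows the standard VQA submission format (a list of entries with question_id and answer fields). The filename is an example; test.py derives the actual name from the --label you passed.

import json

# 'results/mytest.json' is an assumed name for the --label mytest run.
with open('results/mytest.json') as f:
    predictions = json.load(f)

print('number of predictions:', len(predictions))
for entry in predictions[:3]:
    print(entry.get('question_id'), '->', entry.get('answer'))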

Without Visual Genome augmentation

Without the Visual Genome augmentation, we get 69.50 (average of 8 models with a standard deviation of 0.096) for the test-dev split. We use the 8-glimpse model, a learning rate starting at 0.001 (please see this change for better results), 13 epochs, and a batch size of 256.

Flickr30k Entities

Preprocessing

You have to manually download the Annotations and Sentences files to data/flickr30k/Flickr30kEntities.tar.gz. Then run the provided scripts tools/download_flickr.sh and tools/process_flickr.sh from the root of this repository, similarly to the VQA case. Note that the image features of Flickr30k were generated using the bottom-up-attention pretrained model.

Training

Run

$ python3 main.py --task flickr --out saved_models/flickr

to start training. The --gamma option is not applied here. The default hyperparameters should give you approximately 69.6 Recall@1 on the test split.

Validation

Please download the model from the link and move it to saved_models/flickr/model_epoch5.pth (you may encounter a redirection page asking you to confirm). Then run

$ python3 evaluate.py --task flickr --input saved_models/flickr --epoch 5

to evaluate the scores for the test split.

Troubleshooting

Please check the troubleshooting wiki and the previous issue history.

Citation

If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:

@inproceedings{Kim2018,
  author    = {Kim, Jin-Hwa and Jun, Jaehyun and Zhang, Byoung-Tak},
  booktitle = {Advances in Neural Information Processing Systems 31},
  title     = {{Bilinear Attention Networks}},
  pages     = {1571--1581},
  year      = {2018}
}

License

MIT License
