
linjieli222 / Vqa_regat

License: MIT
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to Vqa_regat

Mac Network
Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
Stars: ✭ 444 (+244.19%)
Mutual labels:  vqa, attention
AoA-pytorch
A PyTorch implementation of the Attention on Attention module (both self and guided variants) for Visual Question Answering
Stars: ✭ 33 (-74.42%)
Mutual labels:  vqa, attention
Nlp Journey
Documents, papers and code related to Natural Language Processing, including Topic Model, Word Embedding, Named Entity Recognition, Text Classification, Text Generation, Text Similarity, Machine Translation, etc. All code is implemented in TensorFlow 2.0.
Stars: ✭ 1,290 (+900%)
Mutual labels:  attention
Ccnet Pure Pytorch
Criss-Cross Attention for Semantic Segmentation in pure PyTorch, with a faster and more precise implementation.
Stars: ✭ 124 (-3.88%)
Mutual labels:  attention
Numpy Ml
Machine learning, in numpy
Stars: ✭ 11,100 (+8504.65%)
Mutual labels:  attention
Njunmt Tf
An open-source neural machine translation system developed by Natural Language Processing Group, Nanjing University.
Stars: ✭ 97 (-24.81%)
Mutual labels:  attention
Leader Line
Draw a leader line in your web page.
Stars: ✭ 1,872 (+1351.16%)
Mutual labels:  attention
Self Attention Classification
Document classification using LSTM + self-attention
Stars: ✭ 84 (-34.88%)
Mutual labels:  attention
Chinese Chatbot
A Chinese chatbot trained on 100,000 dialogue pairs using an attention mechanism; it generates a meaningful reply to most general questions. The trained model has been uploaded and runs directly (the author jokes they will livestream eating a keyboard if it does not).
Stars: ✭ 124 (-3.88%)
Mutual labels:  attention
Lambda Networks
Implementation of LambdaNetworks, a new approach to image recognition that reaches SOTA with less compute
Stars: ✭ 1,497 (+1060.47%)
Mutual labels:  attention
Fastpunct
Punctuation restoration and spell correction experiments.
Stars: ✭ 121 (-6.2%)
Mutual labels:  attention
Multiturndialogzoo
Multi-turn dialogue baselines written in PyTorch
Stars: ✭ 106 (-17.83%)
Mutual labels:  attention
Captcharecognition
End-to-end variable-length captcha recognition using CNN+RNN+Attention/CTC (PyTorch implementation).
Stars: ✭ 97 (-24.81%)
Mutual labels:  attention
Nlp Models Tensorflow
Gathers machine learning and TensorFlow deep learning models for NLP problems, 1.13 < TensorFlow < 2.0
Stars: ✭ 1,603 (+1142.64%)
Mutual labels:  attention
Cnn lstm for text classify
CNN, LSTM, NBOW, and fastText for Chinese text classification
Stars: ✭ 90 (-30.23%)
Mutual labels:  attention
Absa keras
Keras Implementation of Aspect based Sentiment Analysis
Stars: ✭ 126 (-2.33%)
Mutual labels:  attention
Eval On Nn Of Rc
Empirical Evaluation on Current Neural Networks on Cloze-style Reading Comprehension
Stars: ✭ 84 (-34.88%)
Mutual labels:  attention
Papers
Notes on some computer vision papers I have read, covering image captioning, weakly supervised segmentation, and more.
Stars: ✭ 99 (-23.26%)
Mutual labels:  vqa
Bertqa Attention On Steroids
BertQA - Attention on Steroids
Stars: ✭ 112 (-13.18%)
Mutual labels:  attention
Image Caption Generator
A neural network to generate captions for an image using CNN and RNN with BEAM Search.
Stars: ✭ 126 (-2.33%)
Mutual labels:  attention

Relation-aware Graph Attention Network for Visual Question Answering

This repository is the implementation of Relation-aware Graph Attention Network for Visual Question Answering.

Overview of ReGAT

This repository is based on and inspired by @hengyuan-hu's and @Jin-Hwa Kim's work. We sincerely thank them for sharing their code.

Prerequisites

You may need a machine with 4 GPUs (16GB memory each) and PyTorch v1.0.1 for Python 3.

  1. Install PyTorch with CUDA 10.0 and Python 3.7.
  2. Install h5py.
  3. Install block.bootstrap.pytorch.

If you are using Miniconda, you can install all the prerequisites at once with tools/environment.yml, as sketched below.
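
A minimal setup sketch; the contents of tools/environment.yml are not reproduced here, so the manual route simply assumes the packages from steps 1-3 above:

# Option 1: create the whole environment from the provided file
conda env create -f tools/environment.yml

# Option 2: install the pieces by hand (assumes CUDA 10.0 drivers are present)
conda install pytorch=1.0.1 cudatoolkit=10.0 -c pytorch
pip install h5py block.bootstrap.pytorch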

Data

Our implementation uses the pretrained features from bottom-up-attention (the adaptive features, 10-100 per image), together with the GloVe vectors and the Visual Genome question-answer pairs. For your convenience, the script below downloads the preprocessed data.

source tools/download.sh

In addition to the data, this script also downloads several pretrained models. Once it finishes, the data and pretrained_models folders should be organized as shown below:

├── data
│   ├── Answers
│   │   ├── v2_mscoco_train2014_annotations.json
│   │   └── v2_mscoco_val2014_annotations.json
│   ├── Bottom-up-features-adaptive
│   │   ├── train.hdf5
│   │   ├── val.hdf5
│   │   └── test2015.hdf5
│   ├── Bottom-up-features-fixed
│   │   ├── train36.hdf5
│   │   ├── val36.hdf5
│   │   └── test2015_36.hdf5
│   ├── cache
│   │   ├── cp_v2_test_target.pkl
│   │   ├── cp_v2_train_target.pkl
│   │   ├── train_target.pkl
│   │   ├── val_target.pkl
│   │   ├── trainval_ans2label.pkl
│   │   └── trainval_label2ans.pkl
│   ├── cp_v2_annotations
│   │   ├── vqacp_v2_test_annotations.json
│   │   └── vqacp_v2_train_annotations.json
│   ├── cp_v2_questions
│   │   ├── vqacp_v2_test_questions.json
│   │   └── vqacp_v2_train_questions.json
│   ├── glove
│   │   ├── dictionary.pkl
│   │   ├── glove6b_init_300d.npy
│   │   └── glove6b.300d.txt
│   ├── imgids
│   │   ├── test2015_36_imgid2idx.pkl
│   │   ├── test2015_ids.pkl
│   │   ├── test2015_imgid2idx.pkl
│   │   ├── train36_imgid2idx.pkl
│   │   ├── train_ids.pkl
│   │   ├── train_imgid2idx.pkl
│   │   ├── val36_imgid2idx.pkl
│   │   ├── val_ids.pkl
│   │   └── val_imgid2idx.pkl
│   ├── Questions
│   │   ├── v2_OpenEnded_mscoco_test-dev2015_questions.json
│   │   ├── v2_OpenEnded_mscoco_test2015_questions.json
│   │   ├── v2_OpenEnded_mscoco_train2014_questions.json
│   │   └── v2_OpenEnded_mscoco_val2014_questions.json
│   └── visualGenome
│   │   ├── image_data.json
│   │   └── question_answers.json
├── pretrained_models (each model folder contains hps.json and model.pth)
│   ├── regat_implicit
│   │   ├── ban_1_implicit_vqa_196
│   │   ├── ban_4_implicit_vqa_cp_4422
│   │   ├── butd_implicit_vqa_6371
│   │   └── mutan_implicit_vqa_2632
│   ├── regat_semantic
│   │   ├── ban_1_semantic_vqa_7971
│   │   ├── ban_4_semantic_vqa_cp_9960
│   │   ├── butd_semantic_vqa_244
│   │   └── mutan_semantic_vqa_2711
│   └── regat_spatial
│   │   ├── ban_1_spatial_vqa_1687
│   │   ├── ban_4_spatial_vqa_cp_4488
│   │   ├── butd_spatial_vqa_5942
│   │   └── mutan_spatial_vqa_3842
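
To sanity-check the downloaded features, the short h5py sketch below lists whatever is inside one of the HDF5 files; it assumes nothing about the dataset names:

import h5py

# Open the adaptive training features read-only and print every entry;
# datasets report their shape, groups are just labeled as groups.
with h5py.File("data/Bottom-up-features-adaptive/train.hdf5", "r") as f:
    def show(name, obj):
        shape = getattr(obj, "shape", None)  # groups have no shape attribute
        print(name, shape if shape is not None else "(group)")
    f.visititems(show)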

Training

python3 main.py --config config/butd_vqa.json
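
The prerequisites above suggest a 4-GPU machine. On a box with more GPUs, the standard CUDA_VISIBLE_DEVICES variable restricts which devices the run can see (whether all four are actually used is up to main.py itself):

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 main.py --config config/butd_vqa.json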

Evaluating

# take ban_1_implicit_vqa_196 as an example
# to evaluate cp_v2 performance, add --dataset cp_v2 --split test
python3 eval.py --output_folder pretrained_models/regat_implicit/ban_1_implicit_vqa_196
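
For example, to run the cp_v2 evaluation against one of the VQA-CP checkpoints from the pretrained_models tree above:

# evaluate a VQA-CP model on the cp_v2 test split
python3 eval.py --output_folder pretrained_models/regat_implicit/ban_4_implicit_vqa_cp_4422 --dataset cp_v2 --split test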

Citation

If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:

@inproceedings{li2019relation,
  title={Relation-aware Graph Attention Network for Visual Question Answering},
  author={Li, Linjie and Gan, Zhe and Cheng, Yu and Liu, Jingjing},
  booktitle={ICCV},
  year={2019}
}

License

MIT License
