All Projects → rentainhe → TRAR-VQA

rentainhe / TRAR-VQA

Licence: MIT license
[ICCV 2021] TRAR: Routing the Attention Spans in Transformers for Visual Question Answering -- Official Implementation

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to TRAR-VQA

Multiturndialogzoo
Multi-turn dialogue baselines written in PyTorch
Stars: ✭ 106 (+116.33%)
Mutual labels:  transformer, attention
Transformers.jl
Julia Implementation of Transformer models
Stars: ✭ 173 (+253.06%)
Mutual labels:  transformer, attention
Bertqa Attention On Steroids
BertQA - Attention on Steroids
Stars: ✭ 112 (+128.57%)
Mutual labels:  transformer, attention
Deeplearning Nlp Models
A small, interpretable codebase containing the re-implementation of a few "deep" NLP models in PyTorch. Colab notebooks to run with GPUs. Models: word2vec, CNNs, transformer, gpt.
Stars: ✭ 64 (+30.61%)
Mutual labels:  transformer, attention
Pytorch Seq2seq
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
Stars: ✭ 3,418 (+6875.51%)
Mutual labels:  transformer, attention
Nlp Tutorial
Natural Language Processing Tutorial for Deep Learning Researchers
Stars: ✭ 9,895 (+20093.88%)
Mutual labels:  transformer, attention
Medical Transformer
Pytorch Code for "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation"
Stars: ✭ 153 (+212.24%)
Mutual labels:  transformer, attention
Pytorch Original Transformer
My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing otherwise seemingly hard concepts. Currently included IWSLT pretrained models.
Stars: ✭ 411 (+738.78%)
Mutual labels:  transformer, attention
Jddc solution 4th
2018-JDDC大赛第4名的解决方案
Stars: ✭ 235 (+379.59%)
Mutual labels:  transformer, attention
Self Attention Cv
Implementation of various self-attention mechanisms focused on computer vision. Ongoing repository.
Stars: ✭ 209 (+326.53%)
Mutual labels:  transformer, attention
Cell Detr
Official and maintained implementation of the paper Attention-Based Transformers for Instance Segmentation of Cells in Microstructures [BIBM 2020].
Stars: ✭ 26 (-46.94%)
Mutual labels:  transformer, attention
RelationNetworks-CLEVR
A pytorch implementation for "A simple neural network module for relational reasoning", working on the CLEVR dataset
Stars: ✭ 83 (+69.39%)
Mutual labels:  clevr, visual-question-answering
Awesome Fast Attention
list of efficient attention modules
Stars: ✭ 627 (+1179.59%)
Mutual labels:  transformer, attention
Njunmt Tf
An open-source neural machine translation system developed by Natural Language Processing Group, Nanjing University.
Stars: ✭ 97 (+97.96%)
Mutual labels:  transformer, attention
Speech Transformer
A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.
Stars: ✭ 565 (+1053.06%)
Mutual labels:  transformer, attention
Sightseq
Computer vision tools for fairseq, containing PyTorch implementation of text recognition and object detection
Stars: ✭ 116 (+136.73%)
Mutual labels:  transformer, attention
Nlp Tutorials
Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com
Stars: ✭ 394 (+704.08%)
Mutual labels:  transformer, attention
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+732.65%)
Mutual labels:  transformer, attention
Graphtransformer
Graph Transformer Architecture. Source code for "A Generalization of Transformer Networks to Graphs", DLG-AAAI'21.
Stars: ✭ 187 (+281.63%)
Mutual labels:  transformer, attention
seq2seq-pytorch
Sequence to Sequence Models in PyTorch
Stars: ✭ 41 (-16.33%)
Mutual labels:  transformer, attention

TRAnsformer Routing Networks (TRAR)

This is an official implementation for ICCV 2021 paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering". It currently includes the code for training TRAR on VQA2.0 and CLEVR dataset. Our TRAR model for REC task is coming soon.

Updates

  • (2021/10/10) Release our TRAR-VQA project.
  • (2021/08/31) Release our pretrained CLEVR TRAR model on train split: TRAR CLEVR Pretrained Models.
  • (2021/08/18) Release our pretrained TRAR model on train+val split and train+val+vg split: VQA-v2 TRAR Pretrained Models
  • (2021/08/16) Release our train2014, val2014 and test2015 data. Please check our dataset setup page DATA.md for more details.
  • (2021/08/15) Release our pretrained weight on train split. Please check our model page MODEL.md for more details.
  • (2021/08/13) The project page for TRAR is avaliable.

Introduction

TRAR vs Standard Transformer

TRAR Overall

Table of Contents

  1. Installation
  2. Dataset setup
  3. Config Introduction
  4. Training
  5. Validation and Testing
  6. Models

Installation

  • Clone this repo
git clone https://github.com/rentainhe/TRAR-VQA.git
cd TRAR-VQA
  • Create a conda virtual environment and activate it
conda create -n trar python=3.7 -y
conda activate trar
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch
  • Install Spacy and initialize the GloVe as follows:
pip install -r requirements.txt
wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
pip install en_vectors_web_lg-2.1.0.tar.gz

Dataset setup

see DATA.md

Config Introduction

In trar.yml config we have these specific settings for TRAR model

ORDERS: [0, 1, 2, 3]
IMG_SCALE: 8 
ROUTING: 'hard' # {'soft', 'hard'}
POOLING: 'attention' # {'attention', 'avg', 'fc'}
TAU_POLICY: 1 # {0: 'SLOW', 1: 'FAST', 2: 'FINETUNE'}
TAU_MAX: 10
TAU_MIN: 0.1
BINARIZE: False
  • ORDERS=list, to set the local attention window size for routing.0 for global attention.
  • IMG_SCALE=int, which should be equal to the image feature size used for training. You should set IMG_SCALE: 16 for 16 × 16 training features.
  • ROUTING={'hard', 'soft'}, to set the Routing Block Type in TRAR model.
  • POOLING={'attention', 'avg', 'fc}, to set the Downsample Strategy used in Routing Block.
  • TAU_POLICY={0, 1, 2}, to set the temperature schedule in training TRAR when using ROUTING: 'hard'.
  • TAU_MAX=float, to set the maximum temperature in training.
  • TAU_MIN=float, to set the minimum temperature in training.
  • BINARIZE=bool, binarize the predicted alphas (alphas: the prob of choosing one path), which means during test time, we only keep the maximum alpha and set others to zero. If BINARIZE=False, it will keep all of the alphas and get a weight sum of different routing predict result by alphas. It won't influence the training time, just a small difference during test time.

Note that please set BINARIZE=False when ROUTING='soft', it's no need to binarize the path prob in soft routing block.

TAU_POLICY visualization

For MAX_EPOCH=13 with WARMUP_EPOCH=3 we have the following policy strategy:

Training

Train model on VQA-v2 with default hyperparameters:

python3 run.py --RUN='train' --DATASET='vqa' --MODEL='trar'

and the training log will be seved to:

results/log/log_run_<VERSION>.txt

Args:

  • --DATASET={'vqa', 'clevr'} to choose the task for training
  • --GPU=str, e.g. --GPU='2' to train model on specific GPU device.
  • --SPLIT={'train', 'train+val', train+val+vg'}, which combines different training datasets. The default training split is train.
  • --MAX_EPOCH=int to set the total training epoch number.

Resume Training

Resume training from specific saved model weights

python3 run.py --RUN='train' --DATASET='vqa' --MODEL='trar' --RESUME=True --CKPT_V=str --CKPT_E=int
  • --CKPT_V=str: the specific checkpoint version
  • --CKPT_E=int: the resumed epoch number

Multi-GPU Training and Gradient Accumulation

  1. Multi-GPU Training: Add --GPU='0, 1, 2, 3...' after the training scripts.
python3 run.py --RUN='train' --DATASET='vqa' --MODEL='trar' --GPU='0,1,2,3'

The batch size on each GPU will be divided into BATCH_SIZE/GPUs automatically.

  1. Gradient Accumulation: Add --ACCU=n after the training scripts
python3 run.py --RUN='train' --DATASET='vqa' --MODEL='trar' --ACCU=2

This makes the optimizer accumulate gradients for n mini-batches and update the model weights once. BATCH_SIZE should be divided by n.

Validation and Testing

Warning: The args --MODEL and --DATASET should be set to the same values as those in the training stage.

Validate on Local Machine Offline evaluation only support the evaluations on the coco_2014_val dataset now.

  1. Use saved checkpoint
python3 run.py --RUN='val' --MODEL='trar' --DATASET='{vqa, clevr}' --CKPT_V=str --CKPT_E=int
  1. Use the absolute path
python3 run.py --RUN='val' --MODEL='trar' --DATASET='{vqa, clevr}' --CKPT_PATH=str

Online Testing All the evaluations on the test dataset of VQA-v2 and CLEVR benchmarks can be achieved as follows:

python3 run.py --RUN='test' --MODEL='trar' --DATASET='{vqa, clevr}' --CKPT_V=str --CKPT_E=int

Result file are saved at:

results/result_test/result_run_<CKPT_V>_<CKPT_E>.json

You can upload the obtained result json file to Eval AI to evaluate the scores.

Models

Here we provide our pretrained model and log, please see MODEL.md

Acknowledgements

Citation

if TRAR is helpful for your research or you wish to refer the baseline results published here, we'd really appreciate it if you could cite this paper:

@InProceedings{Zhou_2021_ICCV,
    author    = {Zhou, Yiyi and Ren, Tianhe and Zhu, Chaoyang and Sun, Xiaoshuai and Liu, Jianzhuang and Ding, Xinghao and Xu, Mingliang and Ji, Rongrong},
    title     = {TRAR: Routing the Attention Spans in Transformer for Visual Question Answering},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {2074-2084}
}
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].