paarthneekhara / convolutional-vqa

Licence: other
No description or website provided.

Programming Languages

python
shell

Projects that are alternatives to or similar to convolutional-vqa

bottom-up-features
Bottom-up features extractor implemented in PyTorch.
Stars: ✭ 62 (+58.97%)
Mutual labels:  visual-question-answering
KVQA
Korean Visual Question Answering
Stars: ✭ 44 (+12.82%)
Mutual labels:  visual-question-answering
FigureQA-baseline
TensorFlow implementation of the CNN-LSTM, Relation Network and text-only baselines for the paper "FigureQA: An Annotated Figure Dataset for Visual Reasoning"
Stars: ✭ 28 (-28.21%)
Mutual labels:  visual-question-answering
just-ask
[TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Stars: ✭ 57 (+46.15%)
Mutual labels:  visual-question-answering
AoA-pytorch
A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering
Stars: ✭ 33 (-15.38%)
Mutual labels:  visual-question-answering
detect-shortcuts
Repo for ICCV 2021 paper: Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
Stars: ✭ 17 (-56.41%)
Mutual labels:  visual-question-answering
VCML
PyTorch implementation of paper "Visual Concept-Metaconcept Learner", NeurIPS 2019
Stars: ✭ 45 (+15.38%)
Mutual labels:  visual-question-answering
TRAR-VQA
[ICCV 2021] TRAR: Routing the Attention Spans in Transformers for Visual Question Answering -- Official Implementation
Stars: ✭ 49 (+25.64%)
Mutual labels:  visual-question-answering
hexia
Mid-level PyTorch Based Framework for Visual Question Answering.
Stars: ✭ 24 (-38.46%)
Mutual labels:  visual-question-answering
RelationNetworks-CLEVR
A pytorch implementation for "A simple neural network module for relational reasoning", working on the CLEVR dataset
Stars: ✭ 83 (+112.82%)
Mutual labels:  visual-question-answering
self critical vqa
Code for NeurIPS 2019 paper "Self-Critical Reasoning for Robust Visual Question Answering"
Stars: ✭ 39 (+0%)
Mutual labels:  visual-question-answering

Fully Convolutional Visual Question Answering

This is an attention-based model for VQA that uses a dilated convolutional neural network to model the question and a ResNet to extract visual features. The text model is based on the convolutional ByteNet architecture. Stacked attention distributions over the image are used to compute weighted image features, which are concatenated with the text features to predict the answer. A rough diagram of the model is shown below, followed by a small sketch of the attention step.

Model architecture
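
The attention-and-fusion step can be illustrated with a minimal NumPy sketch. This is not the repository's code: the shapes (block4 features flattened to 196 spatial locations, a question encoding projected to the same 2048 dimensions) and the single attention hop are illustrative assumptions, whereas the actual model stacks the attention step and learns all projections.

    import numpy as np

    def attend(image_feats, question_feat):
        # image_feats: (196, 2048) spatial ResNet block4 features (14 x 14 locations)
        # question_feat: (2048,) question encoding projected to the image feature size
        scores = image_feats.dot(question_feat)   # (196,) relevance score per location
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                  # softmax attention distribution
        return weights.dot(image_feats)           # (2048,) attention-weighted image feature

    image_feats = np.random.rand(14 * 14, 2048).astype(np.float32)
    question_feat = np.random.rand(2048).astype(np.float32)
    attended = attend(image_feats, question_feat)
    fused = np.concatenate([attended, question_feat])  # joint feature fed to the answer classifier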

Requirements

  • Python 2.7.6
  • TensorFlow 1.3.0
  • nltk

Datasets and Paths

  • The model can be trained on either VQA 1.0 or VQA 2.0. Download the dataset by running sh download.sh in the Data directory. Unzip the downloaded files and create the directory Data/CNNModels. Download the pretrained ResNet-152 from here into Data/CNNModels.
  • Create two empty directories, Data/Models1 and Data/Models2, for saving checkpoints while training on VQA 1.0 and VQA 2.0 respectively (a small sketch for creating these directories follows).
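
The expected directory layout can be set up with a few lines of Python. This is only a convenience sketch, not a script shipped with the repository.

    import os

    # Directories expected by the feature-extraction and training scripts.
    for directory in ["Data/CNNModels", "Data/Models1", "Data/Models2"]:
        if not os.path.isdir(directory):
            os.makedirs(directory)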

Usage

Extract the Image features

  • Extract the image features using one of the following commands (the expected shapes are sketched after this list):
    • DEFAULT - ResNet (14, 14, 2048) block4 features (attention model): python extract_conv_features.py --feature_layer="block4"
    • VGG (14, 14, 512) pool5 features (attention model): python extract_conv_features.py --feature_layer="pool5"
    • VGG fc7 features (4096,): python extract_conv_features.py --feature_layer="fc7"
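
As a quick sanity check before training, the extracted features can be verified against the shapes listed above. The check itself is an illustrative sketch under the assumption that the features are available as NumPy arrays; it is not repository code.

    import numpy as np

    # Feature shapes corresponding to each --feature_layer choice listed above.
    EXPECTED_SHAPES = {
        "block4": (14, 14, 2048),  # ResNet-152 block4, default attention features
        "pool5": (14, 14, 512),    # VGG pool5, alternative attention features
        "fc7": (4096,),            # VGG fc7, non-spatial features
    }

    def check_features(features, feature_layer):
        # Fail early if an extracted feature array has an unexpected shape.
        expected = EXPECTED_SHAPES[feature_layer]
        assert features.shape == expected, \
            "got %s, expected %s for %s" % (features.shape, expected, feature_layer)

    check_features(np.zeros(EXPECTED_SHAPES["block4"], dtype=np.float32), "block4")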

Preprocess Questions/Answers

  • Tokenize the questions/answers using python data_loader.py --version=VQA_VERSION (1 or 2); the kind of tokenization involved is sketched below.
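
A minimal sketch of NLTK-based question tokenization and vocabulary building follows. The exact lower-casing, answer handling, and vocabulary construction in data_loader.py may differ, so treat the details as assumptions.

    import nltk

    # nltk.word_tokenize requires the 'punkt' tokenizer data: nltk.download('punkt')
    def tokenize_question(question):
        return [token.lower() for token in nltk.word_tokenize(question)]

    # Build a toy word-to-index vocabulary from a couple of questions.
    vocab = {}
    for question in ["What color is the traffic light?", "Is there a house?"]:
        for token in tokenize_question(question):
            vocab.setdefault(token, len(vocab))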

Training the attention model

  • Train using python train_evaluate.py --version=VQA_VERSION
  • The following model options can be customized (an example invocation is given after this list):
    • residual_channels : Number of channels in the ByteNet residual blocks / size of the LSTM state. Default 512.
    • batch_size : Training batch size. Default 64.
    • learning_rate : Learning rate. Default 0.001.
    • epochs : Number of training epochs. Default 25.
    • version : VQA dataset version, 1 or 2.
    • sample_every : Sample attention distributions/answers every x steps. Default 200.
    • evaluate_every : Evaluate on the validation set every x steps. Default 6000.
    • resume_model : Resume training from a checkpoint file.
    • training_log_file : File path for logging accuracy/steps. Default 'Data/training_log.json'.
    • feature_layer : Which convolutional features to use. Default block4 of ResNet.
    • text_model : Text model to use, LSTM or ByteNet. Default is ByteNet.
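
For example, assuming each option above is exposed as a command-line flag of the same name, training on VQA 2.0 with the LSTM text model would look like python train_evaluate.py --version=2 --text_model=lstm --batch_size=64.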

Evaluating a trained model

  • The accuracy on the validation set is logged to Data/training_log.json every evaluate_every steps while training the model (see the sketch after this list for inspecting the log).
  • Use python train_evaluate.py --evaluate_every=1 --max_steps=1 --resume_model="Trained Model Path (Data/Models<vqa_version>/model<epoch>.ckpt)" to evaluate a checkpoint.
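
The training log can be inspected with a few lines of Python. The exact JSON structure written by train_evaluate.py is not documented here, so this sketch simply loads and prints whatever was recorded.

    import json

    # Load the accuracy/step records written during training.
    with open("Data/training_log.json") as log_file:
        log = json.load(log_file)

    print(log)  # validation accuracy entries recorded every evaluate_every steps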

Generating Answers/Attention Distributions

Pretrained Model

You may download the pretrained model from here. Save the files in Data/Models1.

  • Use python generate.py --question="<QUESTION ABOUT THE IMAGE>" --image_file="<IMAGE FILE PATH>" --model_path="<PATH TO CHECKPOINT, e.g. Data/Models1/model10.ckpt>" to generate the answer and attention distributions in Data/gen_samples.
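
The generated attention distributions can be overlaid on the input image for inspection. The matplotlib sketch below is illustrative only: the file names are placeholders, and it does not reflect how generate.py itself renders the samples in Data/gen_samples.

    import numpy as np
    import matplotlib.pyplot as plt

    image = plt.imread("example.jpg")      # placeholder input image
    attention = np.random.rand(14, 14)     # placeholder 14x14 attention distribution
    attention /= attention.sum()

    plt.imshow(image)
    # Stretch the 14x14 attention map over the full image and blend it on top.
    plt.imshow(attention, cmap="jet", alpha=0.5,
               extent=(0, image.shape[1], image.shape[0], 0))
    plt.axis("off")
    plt.savefig("attention_overlay.png", bbox_inches="tight")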

Sample Results

The Image, Attention1, and Attention2 columns of the original table are images and are not reproduced here; only the questions and predicted answers are listed.

Question | Predicted Answer
is she going to eat both pizza | No
What color is the traffic light | green
is the persons hair short | Yes
what musical instrument is beside the laptop | keyboard
what color hat is the boy wearing | blue
what are the men doing | eating
what type of drink is in the glass | orange juice
is there a house | yes

References
