
Cyanogenoid / Pytorch Vqa

Strong baseline for visual question answering

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Pytorch Vqa

Baseliner
All your baseline are belong to us
Stars: ✭ 35 (-77.85%)
Mutual labels:  baseline
Mullowbivqa
Hadamard Product for Low-rank Bilinear Pooling
Stars: ✭ 57 (-63.92%)
Mutual labels:  vqa
Fast Reid
SOTA Re-identification Methods and Toolbox
Stars: ✭ 2,287 (+1347.47%)
Mutual labels:  baseline
3d Pose Baseline
A simple baseline for 3d human pose estimation in tensorflow. Presented at ICCV 17.
Stars: ✭ 1,047 (+562.66%)
Mutual labels:  baseline
Ssl Baseline
DevSec SSL/TLS Baseline - InSpec Profile
Stars: ✭ 56 (-64.56%)
Mutual labels:  baseline
Rampy
Python software for spectral data processing (IR, Raman, XAS...)
Stars: ✭ 92 (-41.77%)
Mutual labels:  baseline
Visual Question Answering
📷 ❓ Visual Question Answering Demo and Algorithmia API
Stars: ✭ 18 (-88.61%)
Mutual labels:  vqa
Vqa Mfb
Stars: ✭ 153 (-3.16%)
Mutual labels:  vqa
Vqa
CloudCV Visual Question Answering Demo
Stars: ✭ 57 (-63.92%)
Mutual labels:  vqa
Vqa regat
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
Stars: ✭ 129 (-18.35%)
Mutual labels:  vqa
Fashion Tag
Baseline of FashionAI Competition based on Keras.
Stars: ✭ 50 (-68.35%)
Mutual labels:  baseline
Jsdoc Baseline
An experimental, extensible template for JSDoc.
Stars: ✭ 51 (-67.72%)
Mutual labels:  baseline
Vqa Tensorflow
Tensorflow Implementation of Deeper LSTM+ normalized CNN for Visual Question Answering
Stars: ✭ 98 (-37.97%)
Mutual labels:  vqa
Bottom Up Attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Stars: ✭ 989 (+525.95%)
Mutual labels:  vqa
Competition baselines
Open-source baselines for various data science competitions
Stars: ✭ 150 (-5.06%)
Mutual labels:  baseline
Vizwiz Vqa Pytorch
PyTorch VQA implementation that achieved top performances in the (ECCV18) VizWiz Grand Challenge: Answering Visual Questions from Blind People
Stars: ✭ 33 (-79.11%)
Mutual labels:  vqa
Nginx Baseline
DevSec Nginx Baseline - InSpec Profile
Stars: ✭ 71 (-55.06%)
Mutual labels:  baseline
Siem
SIEM Tactics, Techniques, and Procedures
Stars: ✭ 157 (-0.63%)
Mutual labels:  baseline
Niui
Lightweight, feature-rich, accessible front-end library
Stars: ✭ 152 (-3.8%)
Mutual labels:  baseline
Papers
Notes on some CV papers I have read: image captioning, weakly supervised segmentation, etc.
Stars: ✭ 99 (-37.34%)
Mutual labels:  vqa

Strong baseline for visual question answering

This is a PyTorch re-implementation of Vahid Kazemi and Ali Elqursh's paper Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering.

The paper shows that with a relatively simple model, using only common building blocks in Deep Learning, you can get better accuracies than the majority of previously published work on the popular VQA v1 dataset.
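
For orientation, the sketch below shows the general shape of such a model: pre-extracted ResNet feature maps, an LSTM question encoder, soft attention over image locations, and a classifier over a fixed answer set. It is a minimal illustration written for this description, not the code in this repository; the layer sizes, glimpse count, and other details are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShowAskAttendAnswer(nn.Module):
    """Rough sketch: attention over pre-extracted ResNet feature maps,
    conditioned on an LSTM encoding of the question."""

    def __init__(self, vocab_size, num_answers, q_dim=1024, v_dim=2048, glimpses=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 300, padding_idx=0)
        self.lstm = nn.LSTM(300, q_dim, batch_first=True)
        # attention: fuse image and question features, predict `glimpses` attention maps
        self.att_hidden = nn.Conv2d(v_dim + q_dim, 512, 1)
        self.att_maps = nn.Conv2d(512, glimpses, 1)
        self.classifier = nn.Sequential(
            nn.Linear(glimpses * v_dim + q_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, num_answers),
        )

    def forward(self, v, q):
        # v: (N, 2048, 14, 14) image feature maps, q: (N, T) question token ids
        _, (h, _) = self.lstm(self.embed(q))
        q = h[-1]                                # (N, q_dim) question vector
        v = F.normalize(v, p=2, dim=1)           # l2-normalise the feature vector at each location
        q_tiled = q[:, :, None, None].expand(-1, -1, v.size(2), v.size(3))
        att = self.att_maps(F.relu(self.att_hidden(torch.cat([v, q_tiled], dim=1))))
        att = F.softmax(att.flatten(2), dim=-1)  # (N, glimpses, 14*14) attention weights
        attended = torch.einsum('ngs,ncs->ngc', att, v.flatten(2)).flatten(1)
        return self.classifier(torch.cat([attended, q], dim=1))  # answer scores
```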

This repository is intended to provide a straightforward implementation of the paper for other researchers to build on. The results closely match the reported results, as the majority of details should be exactly the same as the paper. (Thanks to the authors for answering my questions about some details!) This implementation seems to consistently converge to about 0.1% better results – there are two main implementation differences:

  • Instead of setting a limit on the maximum number of words per question and cutting off all words beyond this limit, this code uses per-example dynamic unrolling of the language model (see the sketch after this list).
  • An issue with the official evaluation code makes some questions unanswerable. This code does not normalize machine-given answers, which avoids this problem. As the vast majority of questions are not affected by this issue, it's very unlikely that this will have any significant impact on accuracy.
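
The first difference can be implemented with PyTorch's packed sequences, which let the LSTM process exactly as many tokens as each question contains instead of a fixed cut-off. A minimal sketch, assuming a padded batch of embedded questions and their true lengths (the function and argument names are illustrative, not the repository's):

```python
from torch.nn.utils.rnn import pack_padded_sequence

def encode_questions(lstm, embedded, lengths):
    # embedded: (N, T_max, emb_dim) zero-padded question embeddings
    # lengths:  (N,) number of real tokens in each question
    packed = pack_padded_sequence(embedded, lengths.cpu(), batch_first=True,
                                  enforce_sorted=False)
    _, (h, _) = lstm(packed)   # the LSTM only sees the real tokens of each example
    return h[-1]               # (N, hidden): final state of the last layer
```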

A fully trained model (convergence shown below) is available for download.

[Figure: convergence of this implementation compared to the results reported in the paper]

Note that the model in my other VQA repo performs better than the model implemented here.

Running the model

  • Clone this repository with:
git clone https://github.com/Cyanogenoid/pytorch-vqa --recursive
  • Set the paths to your downloaded questions, answers, and MS COCO images in config.py (a sketch of these settings follows these steps).
    • qa_path should contain the files OpenEnded_mscoco_train2014_questions.json, OpenEnded_mscoco_val2014_questions.json, mscoco_train2014_annotations.json, mscoco_val2014_annotations.json.
    • train_path, val_path, test_path should contain the train, validation, and test .jpg images respectively.
  • Pre-process the images using ResNet152 weights ported from Caffe (93 GiB of free disk space is required at float16 accuracy) and build the question and answer vocabularies with:
python preprocess-images.py
python preprocess-vocab.py
  • Train the model in model.py with:
python train.py
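
For reference, the path settings from the second step might look roughly like this in config.py. Only the variable names are taken from the description above; the directory values are placeholders, and the real file also contains other options not shown here.

```python
# config.py -- illustrative path settings only; the actual file holds more options
qa_path = 'path/to/vqa'                 # the four question/annotation .json files listed above
train_path = 'path/to/mscoco/train2014' # train .jpg images
val_path = 'path/to/mscoco/val2014'     # validation .jpg images
test_path = 'path/to/mscoco/test'       # test .jpg images
```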

This will alternate between one epoch of training on the train split and one epoch of validation on the validation split while printing the current training progress to stdout and saving logs in the logs directory. The logs contain the name of the model, training statistics, contents of config.py, model weights, evaluation information (per-question answer and accuracy), and question and answer vocabularies.
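
Because each run is saved as a single .pth file, a log can also be inspected directly with torch.load. The snippet below is a sketch: the path is a placeholder and the key names are assumptions, so print the keys to see the actual layout of a given log.

```python
import torch

log = torch.load('logs/my-run.pth', map_location='cpu')  # placeholder path to a saved log
print(sorted(log.keys()))        # e.g. config, weights, eval results, vocab (names may differ)
state_dict = log.get('weights')  # hypothetical key for the saved model parameters
```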

  • During training (which takes a while), plot the training progress with:
python view-log.py <path to .pth log>

Python 3 dependencies (tested on Python 3.6.2)

  • torch
  • torchvision
  • h5py
  • tqdm