
DenisDsh / Vizwiz Vqa Pytorch

PyTorch VQA implementation that achieved top performance in the VizWiz Grand Challenge (ECCV 2018): Answering Visual Questions from Blind People

Projects that are alternatives to or similar to Vizwiz Vqa Pytorch

Bottom Up Attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Stars: ✭ 989 (+2896.97%)
Mutual labels:  vqa, jupyter-notebook
Tbd Nets
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
Stars: ✭ 345 (+945.45%)
Mutual labels:  vqa, jupyter-notebook
Visual Question Answering
📷 ❓ Visual Question Answering Demo and Algorithmia API
Stars: ✭ 18 (-45.45%)
Mutual labels:  vqa, jupyter-notebook
Omx
Open Matrix (OMX)
Stars: ✭ 32 (-3.03%)
Mutual labels:  jupyter-notebook
Lectures2020
Stars: ✭ 33 (+0%)
Mutual labels:  jupyter-notebook
Pytorch Softplus Normalization Uncertainty Estimation Bayesian Cnn
PyTorch code for Paper "Uncertainty Estimations by Softplus normalization in Bayesian Convolutional Neural Networks with Variational Inference"
Stars: ✭ 33 (+0%)
Mutual labels:  jupyter-notebook
Sanet Keras
Implement SANet for crowd counting in Keras.
Stars: ✭ 33 (+0%)
Mutual labels:  jupyter-notebook
Pydata Amsterdam 2016
Machine Learning with Scikit-Learn (material for pydata Amsterdam 2016)
Stars: ✭ 32 (-3.03%)
Mutual labels:  jupyter-notebook
Aws Deepracer Workshops
DeepRacer workshop content
Stars: ✭ 968 (+2833.33%)
Mutual labels:  jupyter-notebook
Numerical methods youtube
Stars: ✭ 32 (-3.03%)
Mutual labels:  jupyter-notebook
Voice emotion
Detecting emotion in voices
Stars: ✭ 33 (+0%)
Mutual labels:  jupyter-notebook
Gaze Estimation
A deep learning based gaze estimation framework implemented with PyTorch
Stars: ✭ 33 (+0%)
Mutual labels:  jupyter-notebook
Object detection tools
Useful object detection tools for the TensorFlow Object Detection API
Stars: ✭ 33 (+0%)
Mutual labels:  jupyter-notebook
Machinelearningdeeplearning
Notes, slides, and assignments for Hung-yi Lee's 2021 Machine Learning and Deep Learning course
Stars: ✭ 32 (-3.03%)
Mutual labels:  jupyter-notebook
Natural Language Processing
Resources for "Natural Language Processing" Coursera course.
Stars: ✭ 969 (+2836.36%)
Mutual labels:  jupyter-notebook
Madmom tutorials
Tutorials for the madmom package.
Stars: ✭ 32 (-3.03%)
Mutual labels:  jupyter-notebook
Geemap
A Python package for interactive mapping with Google Earth Engine, ipyleaflet, and folium
Stars: ✭ 959 (+2806.06%)
Mutual labels:  jupyter-notebook
Attentive Neural Processes
implementing "recurrent attentive neural processes" to forecast power usage (w. LSTM baseline, MCDropout)
Stars: ✭ 33 (+0%)
Mutual labels:  jupyter-notebook
Pm Pyro
PyMC3-like Interface for Pyro
Stars: ✭ 33 (+0%)
Mutual labels:  jupyter-notebook
Simple Ssd For Beginners
An easy-to-read SSD (Single Shot MultiBox Detector) implementation in PyTorch, intended to be easy to learn from
Stars: ✭ 33 (+0%)
Mutual labels:  jupyter-notebook

VizWiz Challenge: Visual Question Answering Implementation in PyTorch

PyTorch VQA implementation that achieved top performance in the VizWiz Grand Challenge (ECCV 2018): Answering Visual Questions from Blind People. The code can easily be adapted for training on VQA 1.0/2.0 or any other dataset.

The implemented architecture is a variant of the VQA model described in Kazemi et al. (2017), Show, Ask, Attend, and Answer: A Strong Baseline for Visual Question Answering. Visual features are extracted with a ResNet-152 pre-trained on ImageNet. Input questions are tokenized, embedded, and encoded with an LSTM. The image features and the encoded question are combined and used to compute multiple attention maps over the image features. The attended image features and the encoded question are then concatenated and fed to a 2-layer classifier that outputs probabilities over the answers (classes).
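A minimal PyTorch sketch of this pipeline, assuming pre-extracted 2048×14×14 image features and padded question token indices; the module names, the number of glimpses, and the layer sizes are illustrative choices, not the repository's exact code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    # Produces `glimpses` attention maps over the 14x14 grid, conditioned on the question.
    def __init__(self, v_dim, q_dim, mid_dim=512, glimpses=2):
        super().__init__()
        self.v_conv = nn.Conv2d(v_dim, mid_dim, 1)
        self.q_lin = nn.Linear(q_dim, mid_dim)
        self.out_conv = nn.Conv2d(mid_dim, glimpses, 1)

    def forward(self, v, q):
        # v: (B, v_dim, H, W) image features, q: (B, q_dim) question encoding
        joint = F.relu(self.v_conv(v) + self.q_lin(q)[:, :, None, None])
        att = F.softmax(self.out_conv(joint).flatten(2), dim=-1)    # (B, glimpses, H*W)
        glimpse = torch.einsum('bgs,bvs->bgv', att, v.flatten(2))   # weighted sums of image features
        return glimpse.flatten(1)                                   # (B, glimpses * v_dim)

class VQAModel(nn.Module):
    def __init__(self, vocab_size, num_answers, emb_dim=300, q_dim=1024, v_dim=2048):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, q_dim, batch_first=True)
        self.attention = Attention(v_dim, q_dim)
        self.classifier = nn.Sequential(
            nn.Linear(2 * v_dim + q_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(1024, num_answers),
        )

    def forward(self, v, q):
        # v: pre-extracted (B, 2048, 14, 14) ResNet-152 features, q: (B, T) token indices
        _, (h, _) = self.lstm(self.embedding(q))
        q_enc = h[-1]                                   # final LSTM hidden state encodes the question
        v = F.normalize(v, p=2, dim=1)                  # l2-normalize image features
        fused = torch.cat([self.attention(v, q_enc), q_enc], dim=1)
        return self.classifier(fused)                   # logits over the answer classes

The attention submodule follows the stacked-attention idea referenced below: the question conditions a small convolutional network that produces softmax-normalized maps over the 14×14 grid, and each map yields one weighted sum of image features.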

More information about the attention module can be found in Yang et al. (2015), Stacked Attention Networks for Image Question Answering.

In order to take all 10 answers given by the annotators into account, we use a soft cross-entropy loss: a weighted average of the negative log-probabilities of each unique ground-truth answer. This loss function aligns better with the VQA evaluation metric used to score the challenge submissions.

(Figure: soft cross-entropy loss)
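A minimal PyTorch sketch of such a loss, assuming each row of target holds a non-negative weight per answer class (for example, the fraction of the 10 annotators who gave that answer); the exact weighting used in the repository may differ:

import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, target):
    # logits: (B, num_answers) raw classifier scores
    # target: (B, num_answers) per-answer weights derived from the 10 annotator answers
    log_probs = F.log_softmax(logits, dim=1)
    # weighted average of the negative log-probabilities of the ground-truth answers
    loss = -(target * log_probs).sum(dim=1) / target.sum(dim=1).clamp(min=1e-8)
    return loss.mean()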

Experimental Results

Method         Accuracy
VizWiz paper   0.475
Ours           0.516

Training and Evaluation

  • Install requirements:
conda create --name viz_env python=3.6
source activate viz_env
pip install -r requirements.txt
  • Download and unpack the VizWiz dataset:
wget https://ivc.ischool.utexas.edu/VizWiz/data/VizWiz_data_ver1.tar.gz
tar -xzf VizWiz_data_ver1.tar.gz

After unpacking the dataset, the Image folder will contain files with the prefix ._VizWiz. Remove these files (from inside that folder) before extracting the image features:

rm ._*
  • Set the paths to the downloaded data in the yaml configuration file config/default.yaml.
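The configuration can be inspected with PyYAML before editing; this is only a convenience sketch, and the available keys are those defined in config/default.yaml itself:

import yaml

# Load and print the training configuration to see which path entries need to be set.
with open('config/default.yaml') as f:
    config = yaml.safe_load(f)
print(config)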

  • Extract features from the input images (~26 GB). The script extracts two types of features from the images:

    • No attention: 2048-dimensional feature vectors consisting of the activations of the penultimate layer of a pre-trained ResNet-152.
    • Attention: 2048×14×14 feature tensors consisting of the activations of the last pooling layer of the ResNet-152.

    Our model uses only the "Attention" features; however, the implementation can be extended with new models that do not use attention mechanisms. A sketch of the extraction idea follows the command below.

python ./preprocessing/image_features_extraction.py
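As a rough sketch of what this step produces (the repository's script will differ in pre-processing, batching, input resolution, and storage format), both feature types can be obtained from torchvision's pre-trained ResNet-152 with a forward hook; 448×448 inputs yield the 14×14 grid:

import torch
import torchvision.models as models

# On newer torchvision versions, use models.resnet152(weights='IMAGENET1K_V1') instead.
resnet = models.resnet152(pretrained=True).eval()

feats = {}
resnet.layer4.register_forward_hook(lambda module, inputs, output: feats.update(attention=output))

with torch.no_grad():
    images = torch.randn(2, 3, 448, 448)       # placeholder batch of pre-processed images
    resnet(images)                             # full forward pass triggers the hook
    att = feats['attention']                   # (B, 2048, 14, 14) "Attention" features
    no_att = att.mean(dim=(2, 3))              # (B, 2048) globally pooled penultimate-layer vectors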
  • Construct dictionaries that will be used during training to encode words and answers:
python ./preprocessing/create_vocabs.py
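Conceptually, these dictionaries map question words and answers to integer indices; a hedged sketch of the idea (the actual tokenization, answer cut-off, and file format are defined by the script, and top_answers here is a hypothetical parameter):

from collections import Counter
import itertools
import json

def build_vocabs(questions, answers, top_answers=3000):
    # questions: list of token lists, answers: list of answer strings
    word_counts = Counter(itertools.chain.from_iterable(questions))
    word2idx = {w: i + 1 for i, (w, _) in enumerate(word_counts.most_common())}  # index 0 reserved for padding
    answer2idx = {a: i for i, (a, _) in enumerate(Counter(answers).most_common(top_answers))}
    return word2idx, answer2idx

word2idx, answer2idx = build_vocabs([['is', 'this', 'a', 'can']], ['yes'])
with open('vocabs.json', 'w') as f:
    json.dump({'question': word2idx, 'answer': answer2idx}, f)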
  • Start training:
python train.py

During training, the model with the highest validation accuracy and the model with the lowest validation loss are both saved. The path of the log directory is specified in the YAML configuration file config/default.yaml.
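A sketch of that checkpointing logic (the function, file, and directory names are illustrative; the actual paths come from the configuration):

import torch

best_acc, best_loss = 0.0, float('inf')

def maybe_save(model, val_acc, val_loss, log_dir='logs'):
    # Keep the weights with the best validation accuracy and the best validation loss seen so far.
    global best_acc, best_loss
    if val_acc > best_acc:
        best_acc = val_acc
        torch.save(model.state_dict(), f'{log_dir}/best_accuracy.pth')
    if val_loss < best_loss:
        best_loss = val_loss
        torch.save(model.state_dict(), f'{log_dir}/best_loss.pth')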

  • Construct prediction file for the test split:
python predict.py
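Conceptually, the prediction step takes the argmax over the answer classes for each test question and maps it back to an answer string; a sketch, where the loader fields and output field names are assumptions that depend on the dataset code and the challenge submission format:

import json
import torch

@torch.no_grad()
def write_predictions(model, test_loader, idx2answer, out_path='predictions.json'):
    model.eval()
    results = []
    for image_names, v, q in test_loader:      # assumed loader output, for illustration only
        answer_idx = model(v, q).argmax(dim=1)
        for name, idx in zip(image_names, answer_idx.tolist()):
            results.append({'image': name, 'answer': idx2answer[idx]})
    with open(out_path, 'w') as f:
        json.dump(results, f)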
