
ap229997 / Conditional Batch Norm

License: MIT
PyTorch implementation of the NIPS 2017 paper "Modulating early visual processing by language"

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Conditional Batch Norm

just-ask
[TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Stars: ✭ 57 (+11.76%)
Mutual labels:  vqa
Awesome Visual Question Answering
A curated list of Visual Question Answering (VQA, covering image and video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Stars: ✭ 295 (+478.43%)
Mutual labels:  vqa
Vqa.pytorch
Visual Question Answering in Pytorch
Stars: ✭ 602 (+1080.39%)
Mutual labels:  vqa
FigureQA-baseline
TensorFlow implementation of the CNN-LSTM, Relation Network and text-only baselines for the paper "FigureQA: An Annotated Figure Dataset for Visual Reasoning"
Stars: ✭ 28 (-45.1%)
Mutual labels:  vqa
MICCAI21 MMQ
Multiple Meta-model Quantifying for Medical Visual Question Answering
Stars: ✭ 16 (-68.63%)
Mutual labels:  vqa
Oscar
Oscar and VinVL
Stars: ✭ 396 (+676.47%)
Mutual labels:  vqa
probnmn-clevr
Code for ICML 2019 paper "Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering" [long-oral]
Stars: ✭ 63 (+23.53%)
Mutual labels:  vqa
Vizwiz Vqa Pytorch
PyTorch VQA implementation that achieved top performances in the (ECCV18) VizWiz Grand Challenge: Answering Visual Questions from Blind People
Stars: ✭ 33 (-35.29%)
Mutual labels:  vqa
Nscl Pytorch Release
PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).
Stars: ✭ 276 (+441.18%)
Mutual labels:  vqa
Mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Stars: ✭ 4,713 (+9141.18%)
Mutual labels:  vqa
vqa-soft
Accompanying code for "A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models" CVPR 2017 VQA workshop paper.
Stars: ✭ 14 (-72.55%)
Mutual labels:  vqa
bottom-up-features
Bottom-up feature extractor implemented in PyTorch.
Stars: ✭ 62 (+21.57%)
Mutual labels:  vqa
Awesome Vqa
Visual Q&A reading list
Stars: ✭ 403 (+690.2%)
Mutual labels:  vqa
DVQA dataset
DVQA Dataset: A Bar chart question answering dataset presented at CVPR 2018
Stars: ✭ 20 (-60.78%)
Mutual labels:  vqa
Bottom Up Attention Vqa
An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
Stars: ✭ 667 (+1207.84%)
Mutual labels:  vqa
AoA-pytorch
A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering
Stars: ✭ 33 (-35.29%)
Mutual labels:  vqa
Tbd Nets
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
Stars: ✭ 345 (+576.47%)
Mutual labels:  vqa
Bottom Up Attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Stars: ✭ 989 (+1839.22%)
Mutual labels:  vqa
Visual Question Answering
📷 ❓ Visual Question Answering Demo and Algorithmia API
Stars: ✭ 18 (-64.71%)
Mutual labels:  vqa
Mac Network
Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
Stars: ✭ 444 (+770.59%)
Mutual labels:  vqa

Conditional Batch Normalization

PyTorch implementation of the NIPS 2017 paper "Modulating early visual processing by language" (https://arxiv.org/abs/1707.00683)

Introduction

The authors present a novel approach that incorporates language information into visual feature extraction by conditioning the batch normalization parameters of the vision network on a language embedding. They apply Conditional Batch Normalization (CBN) to a pre-trained ResNet and show that this significantly improves performance on visual question answering tasks.
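Concretely, CBN keeps the batch statistics of a standard BatchNorm layer but offsets its affine parameters with deltas predicted from the question embedding e_q: the per-channel scale and shift become gamma_c + delta_gamma_c(e_q) and beta_c + delta_beta_c(e_q), with the deltas initialized to zero so that training starts from the unmodified pre-trained network. The sketch below illustrates the idea in PyTorch; it is a simplification under my own naming (the paper predicts the deltas with a small MLP rather than a single linear layer), not this repository's exact code.

import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    """Minimal CBN sketch: normalize as usual, then apply a scale and
    shift offset by deltas predicted from the language embedding."""

    def __init__(self, num_features, emb_dim):
        super(ConditionalBatchNorm2d, self).__init__()
        # Batch statistics only; the affine transform is handled below
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        # Base parameters (taken from the pre-trained network in the paper)
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))
        # Delta predictors, zero-initialized so training starts from the
        # unmodified pre-trained behaviour
        self.delta_gamma = nn.Linear(emb_dim, num_features)
        self.delta_beta = nn.Linear(emb_dim, num_features)
        for layer in (self.delta_gamma, self.delta_beta):
            nn.init.zeros_(layer.weight)
            nn.init.zeros_(layer.bias)

    def forward(self, x, emb):
        # x: (N, C, H, W) feature map, emb: (N, emb_dim) question embedding
        out = self.bn(x)
        gamma = (self.gamma + self.delta_gamma(emb)).unsqueeze(-1).unsqueeze(-1)
        beta = (self.beta + self.delta_beta(emb)).unsqueeze(-1).unsqueeze(-1)
        return gamma * out + beta  # broadcast (N, C, 1, 1) over H and W

In the full model, each BatchNorm layer of the pre-trained ResNet is replaced by such a module, receiving the question embedding alongside its feature map.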

Setup

This repository is compatible with Python 2.

  • Follow the instructions outlined on the PyTorch homepage to install PyTorch (Python 2 build).
  • The required Python packages are nltk and tqdm, which can be installed using pip, as shown below.
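
For example (the exact invocation may vary with your Python 2 setup):

pip install nltk tqdm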

Data

To download the VQA dataset, use the script scripts/vqa_download.sh:

scripts/vqa_download.sh `pwd`/data

Process Data

Detailed instructions for processing data are provided by GuessWhatGame/vqa.

Create dictionary

To create the VQA dictionary, use the script preprocess_data/create_dictionary.py.

python preprocess_data/create_dictionary.py --data_dir data --year 2014 --dict_file dict.json

Create GloVe dictionary

To create the GloVe dictionary, download the original GloVe vectors and run the script preprocess_data/create_gloves.py.

wget http://nlp.stanford.edu/data/glove.42B.300d.zip -P data/
unzip data/glove.42B.300d.zip -d data/
python preprocess_data/create_gloves.py --data_dir data --glove_in data/glove.42B.300d.txt --glove_out data/glove_dict.pkl --year 2014

Train Model

To train the network, set the required parameters in config.json and run the script main.py.

python main.py --gpu gpu_id --data_dir data --img_dir images --config config.json --exp_dir exp --year 2014

Citation

If you find this code useful, please consider citing the original work by its authors:

@inproceedings{de2017modulating,
  author = {Harm de Vries and Florian Strub and J\'er\'emie Mary and Hugo Larochelle and Olivier Pietquin and Aaron C. Courville},
  title = {Modulating early visual processing by language},
  booktitle = {Advances in Neural Information Processing Systems 30},
  year = {2017},
  url = {https://arxiv.org/abs/1707.00683}
}