mesnico / RelationNetworks-CLEVR

License: MIT
A PyTorch implementation of "A simple neural network module for relational reasoning", working on the CLEVR dataset

Projects that are alternatives of or similar to RelationNetworks-CLEVR

TRAR-VQA
[ICCV 2021] TRAR: Routing the Attention Spans in Transformers for Visual Question Answering -- Official Implementation
Stars: ✭ 49 (-40.96%)
Mutual labels:  clevr, visual-question-answering
FigureQA-baseline
TensorFlow implementation of the CNN-LSTM, Relation Network and text-only baselines for the paper "FigureQA: An Annotated Figure Dataset for Visual Reasoning"
Stars: ✭ 28 (-66.27%)
Mutual labels:  relation-network, visual-question-answering
KVQA
Korean Visual Question Answering
Stars: ✭ 44 (-46.99%)
Mutual labels:  visual-question-answering
thinkorm
A flexible, lightweight and powerful Object-Relational Mapper for Node.js. Support TypeScript!!
Stars: ✭ 33 (-60.24%)
Mutual labels:  relationships
probnmn-clevr
Code for ICML 2019 paper "Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering" [long-oral]
Stars: ✭ 63 (-24.1%)
Mutual labels:  clevr
convolutional-vqa
No description or website provided.
Stars: ✭ 39 (-53.01%)
Mutual labels:  visual-question-answering
heurist
Core development repository. gitHub: Vsn 6 (2020 - ), Vsn 5 (2018 - 2020), Vsn 4 (2014-2017). Sourceforge: Vsn 3 (2009-2013), Vsn 1 & 2 (2005-2009)
Stars: ✭ 39 (-53.01%)
Mutual labels:  relationships
AoA-pytorch
A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering
Stars: ✭ 33 (-60.24%)
Mutual labels:  visual-question-answering
neuro-symbolic-ai-soc
Neuro-Symbolic Visual Question Answering on Sort-of-CLEVR using PyTorch
Stars: ✭ 41 (-50.6%)
Mutual labels:  clevr
laravel-nova-nested-form
This package allows you to include your nested relationships' forms into a parent form.
Stars: ✭ 225 (+171.08%)
Mutual labels:  relationships
multiple-objects-gan
Implementation for "Generating Multiple Objects at Spatially Distinct Locations" (ICLR 2019)
Stars: ✭ 111 (+33.73%)
Mutual labels:  clevr
Relation-Network
Tensorflow implementation of Relation Network (bAbI dataset)
Stars: ✭ 32 (-61.45%)
Mutual labels:  relation-network
lc-spring-data-r2dbc
An extension of spring-data-r2dbc to provide features such as relationships, joins, cascading save/delete, lazy loading, sequence, schema generation, composite id
Stars: ✭ 30 (-63.86%)
Mutual labels:  relationships
bottom-up-features
Bottom-up features extractor implemented in PyTorch.
Stars: ✭ 62 (-25.3%)
Mutual labels:  visual-question-answering
eloquent-has-by-non-dependent-subquery
Convert has() and whereHas() constraints to non-dependent subqueries.
Stars: ✭ 70 (-15.66%)
Mutual labels:  relationships
just-ask
[TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Stars: ✭ 57 (-31.33%)
Mutual labels:  visual-question-answering
TVR
Transformation Driven Visual Reasoning - CVPR 2021
Stars: ✭ 24 (-71.08%)
Mutual labels:  clevr
RMN
Relation Memory Network
Stars: ✭ 17 (-79.52%)
Mutual labels:  relation-network
self critical vqa
Code for NeurIPS 2019 paper ``Self-Critical Reasoning for Robust Visual Question Answering''
Stars: ✭ 39 (-53.01%)
Mutual labels:  visual-question-answering
eloquent-has-by-join
Convert has() and whereHas() constraints to join() ones for single-result relations.
Stars: ✭ 21 (-74.7%)
Mutual labels:  relationships

Relation Networks CLEVR

A PyTorch implementation of "A simple neural network module for relational reasoning" (https://arxiv.org/abs/1706.01427), working on the CLEVR dataset.

This code tries to reproduce the results obtained by the DeepMind team, for both the From Pixels and State Descriptions setups they describe. Since the paper does not expose all the network details, results may deviate slightly from the original ones.

The model can also be trained with a slightly modified version of the RN, called IR, which enables relational feature extraction in order to perform Relational Content-Based Image Retrieval (R-CBIR).

We released pretrained models for both the Original and the Image Retrieval architectures (detailed instructions on how to use them are below).

Accuracy

Accuracy values measured on the test set:

Model                 Accuracy
From Pixels           93.6%
State Descriptions    97.9%

Get ready

  1. Download and extract the CLEVR_v1.0 dataset: http://cs.stanford.edu/people/jcjohns/clevr/
  2. Clone this repository and move into it:
git clone https://github.com/mesnico/RelationNetworks-CLEVR
cd RelationNetworks-CLEVR
  3. Set up a virtual environment (optional, but recommended):
mkdir env
virtualenv -p /usr/bin/python3 env
source env/bin/activate
  4. Install requirements:
pip3 install -r requirements.txt

Train

The training code can be run either with Docker or with a standard Python installation with PyTorch. If Docker is used, an image with all needed dependencies is built, and training can easily be run inside a Docker container.

State-descriptions version

Move to the cloned directory and issue the command:

python3 train.py --clevr-dir path/to/CLEVR_v1.0/ --model 'original-sd' | tee logfile.log

We reached an accuracy of around 98% on the test set. With these parameters, training uses an exponentially increasing learning rate at the beginning (a slow-start, or warm-up, policy). Without this policy, our training stopped at around 70% accuracy. Our training curve, measured on the test set:

[plot: test accuracy during training]
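The slow-start policy mentioned above can be sketched as an exponential ramp from a small initial learning rate up to the target value over the first epochs. The function below is purely illustrative: the constants and the warm-up length are assumptions, not the repository's actual defaults.

```python
def warmup_lr(epoch, base_lr=2.5e-4, start_lr=1e-6, warmup_epochs=20):
    """Exponentially increase the learning rate from start_lr to base_lr,
    then hold it constant. All constants here are illustrative."""
    if epoch >= warmup_epochs:
        return base_lr
    # Geometric interpolation: the LR grows by a constant factor each epoch.
    return start_lr * (base_lr / start_lr) ** (epoch / warmup_epochs)

schedule = [warmup_lr(e) for e in range(25)]
```

Starting very low avoids the early divergence that reportedly stalled training at ~70% accuracy without the policy.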

From-pixels version

Move to the cloned directory and issue the command:

python3 train.py --clevr-dir path/to/CLEVR_v1.0/ --model 'original-fp' | tee logfile.log

We used the same exponential-increase policy employed for the State Descriptions version and reached around 93% accuracy on the test set:

[plot: test accuracy during training]

Configuration file

We prepared a JSON configuration file from which model hyperparameters can be tuned. The --config option specifies the configuration file, while the --model option selects a specific hyperparameter configuration defined in that file. By default, the configuration file is config.json and the default model is original-fp.
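For illustration, such a file could be organized as a map from model names to hyperparameter sets. The keys and values below are hypothetical assumptions about its shape, not the actual contents of config.json:

```json
{
  "original-fp": {
    "state_description": false,
    "lr": 0.000025,
    "batch_size": 640
  },
  "original-sd": {
    "state_description": true,
    "lr": 0.0001,
    "batch_size": 640
  }
}
```

With a layout like this, passing --model 'original-sd' would select the second block of hyperparameters.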

Training plots

Once training ends, some plots (invalid answers, training loss, test loss, test accuracy) can be generated using the plot.py script:

python3 plot.py -i -trl -tsl -a logfile.log

These plots are also saved inside the img/ folder.

To explore a bunch of other possible arguments useful to customize training, issue the command:

$ python3 train.py --help

Test

It is possible to run a test session after training by loading a specific checkpoint of the trained network collected at a certain epoch. This is done by specifying the --test option:

python3 train.py --clevr-dir path/to/CLEVR_v1.0/ --model 'original-fp' --resume RN_epoch_xxx.pth --test

IMPORTANT: If you receive an out-of-memory error from CUDA because you do not have enough VRAM for testing, lower the test batch size to 64 or 32 with the --test-batch-size option, e.g.:

python3 train.py --clevr-dir path/to/CLEVR_v1.0/ --model 'original-fp' --resume RN_epoch_xxx.pth --test --test-batch-size 32

Using pre-trained models

We released pre-trained models for Original and Image-Retrieval architectures, for the challenging from-pixels version.

Test on original pre-trained

Epoch 493

python3 train.py --clevr-dir path/to/CLEVR_v1.0/ --model 'original-fp' --resume pretrained_models/original_fp_epoch_493.pth --test

Test on IR pre-trained

Epoch 312

python3 train.py --clevr-dir path/to/CLEVR_v1.0/ --model 'ir-fp' --resume pretrained_models/ir_fp_epoch_312.pth --test

Confusion plot

Once the test has been performed at least once (a test session can be run explicitly, but it is also run automatically after every training epoch), some insights are saved into test_results, and a confusion plot can be generated from them:

python3 confusionplot.py test_results/test.pickle

[confusion plot]

This is useful for discovering network weaknesses and possibly addressing them. This plot is also saved inside the img/ folder.
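The aggregation behind such a plot can be sketched as counting (ground-truth, prediction) pairs. This is an illustration only; the actual format of the test_results pickle and of the plotting script is not assumed here.

```python
from collections import Counter

def confusion_counts(true_answers, predicted_answers):
    """Count (ground-truth, prediction) pairs to spot systematic confusions."""
    return Counter(zip(true_answers, predicted_answers))

# Toy example with CLEVR-style answers.
truth = ["red", "cube", "red", "2"]
pred  = ["red", "sphere", "blue", "2"]
counts = confusion_counts(truth, pred)
```

A large count for an off-diagonal pair such as ("cube", "sphere") would reveal a systematic shape confusion worth investigating.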

Implementation details

  • Question and answer dictionaries are built from the training set, so the model will not work with words it has never seen.
  • All words in the dataset are treated case-insensitively, since we don't want the model to learn case biases.
  • For network settings, see Sections 4 and B of the original paper: https://arxiv.org/abs/1706.01427.
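As a sketch, the case-insensitive dictionary building described above might look like the following. The function name, the naive whitespace tokenization, and the reserved unknown-word index are illustrative assumptions, not the repository's actual code.

```python
def build_vocab(questions):
    """Map each lowercased word to an integer id; id 0 is reserved for
    unknown words, so unseen test-time words get a fallback index."""
    word2idx = {"<unk>": 0}
    for q in questions:
        for word in q.lower().split():  # naive whitespace tokenization
            if word not in word2idx:
                word2idx[word] = len(word2idx)
    return word2idx

vocab = build_vocab(["What color is the cube?", "what SIZE is the sphere?"])
# A word never seen in training falls back to the <unk> id.
unseen_idx = vocab.get("cylinder", vocab["<unk>"])
```

Note that "What" and "what" map to the same id, which is exactly the case-insensitive behavior described above.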

Acknowledgements

Special thanks to https://github.com/aelnouby and https://github.com/rosinality for their great support. Below, their Relation Network repositories working on CLEVR:
