
likenneth / mmgnn_textvqa

Licence: other
A PyTorch implementation of the CVPR 2020 paper: Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to mmgnn_textvqa

Papers
Notes on some computer vision papers I have read, covering topics such as image captioning and weakly supervised segmentation
Stars: ✭ 99 (+141.46%)
Mutual labels:  vqa
VideoNavQA
An alternative EQA paradigm and informative benchmark + models (BMVC 2019, ViGIL 2019 spotlight)
Stars: ✭ 22 (-46.34%)
Mutual labels:  vqa
awesome-efficient-gnn
Code and resources on scalable and efficient Graph Neural Networks
Stars: ✭ 498 (+1114.63%)
Mutual labels:  gnn
Vqa Mfb
Stars: ✭ 153 (+273.17%)
Mutual labels:  vqa
self critical vqa
Code for the NeurIPS 2019 paper "Self-Critical Reasoning for Robust Visual Question Answering"
Stars: ✭ 39 (-4.88%)
Mutual labels:  vqa
ZS-F-VQA
Code and Data for paper: Zero-shot Visual Question Answering using Knowledge Graph [ ISWC 2021 ]
Stars: ✭ 51 (+24.39%)
Mutual labels:  vqa
Mullowbivqa
Hadamard Product for Low-rank Bilinear Pooling
Stars: ✭ 57 (+39.02%)
Mutual labels:  vqa
Causing
Causing: CAUsal INterpretation using Graphs
Stars: ✭ 47 (+14.63%)
Mutual labels:  gnn
neuro-symbolic-ai-soc
Neuro-Symbolic Visual Question Answering on Sort-of-CLEVR using PyTorch
Stars: ✭ 41 (+0%)
Mutual labels:  vqa
hcrn-videoqa
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
Stars: ✭ 111 (+170.73%)
Mutual labels:  vqa
Pytorch Vqa
Strong baseline for visual question answering
Stars: ✭ 158 (+285.37%)
Mutual labels:  vqa
Openvqa
A lightweight, scalable, and general framework for visual question answering research
Stars: ✭ 198 (+382.93%)
Mutual labels:  vqa
3DInfomax
Making self-supervised learning work on molecules by using their 3D geometry to pre-train GNNs. Implemented in DGL and Pytorch Geometric.
Stars: ✭ 107 (+160.98%)
Mutual labels:  gnn
Vqa regat
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
Stars: ✭ 129 (+214.63%)
Mutual labels:  vqa
spatio-temporal-brain
A Deep Graph Neural Network Architecture for Modelling Spatio-temporal Dynamics in rs-fMRI Data
Stars: ✭ 22 (-46.34%)
Mutual labels:  gnn
Vqa Tensorflow
Tensorflow Implementation of Deeper LSTM+ normalized CNN for Visual Question Answering
Stars: ✭ 98 (+139.02%)
Mutual labels:  vqa
GraphMix
Code for reproducing results in GraphMix paper
Stars: ✭ 64 (+56.1%)
Mutual labels:  gnn
ncem
Learning cell communication from spatial graphs of cells
Stars: ✭ 77 (+87.8%)
Mutual labels:  gnn
Awesome-Federated-Learning-on-Graph-and-GNN-papers
Federated learning on graph, especially on graph neural networks (GNNs), knowledge graph, and private GNN.
Stars: ✭ 206 (+402.44%)
Mutual labels:  gnn
cfvqa
[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
Stars: ✭ 96 (+134.15%)
Mutual labels:  vqa

Multi-Modal GNN for TextVQA


  1. This project provides code to reproduce the results of Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text on the TextVQA dataset
  2. We are grateful to MMF (formerly Pythia), an excellent VQA codebase provided by Facebook, on which our code is built
  3. We achieved 32.46% accuracy (ensemble) on the test set of TextVQA

Requirements

  1. PyTorch 1.0.1 (post release)
  2. Experiments were performed on a Maxwell Titan X GPU, which has 12 GB of GPU memory
  3. See requirements.txt for the required Python packages and run the command below to install them

Let's begin by cloning this repository and installing the requirements:

$ git clone https://github.com/ricolike/mmgnn-textvqa.git
$ cd mmgnn-textvqa
$ pip install -r requirements.txt

Data Setup

  1. Cached data: to boost data-loading speed under a limited memory budget (64 GB) and to speed up computation, we cache intermediate dataloader results on disk. Download the data (around 54 GB; roughly 120 GB unzipped) and set line 11 (fast_dir) in the config to the absolute path where you saved it
  2. Other files: download the other required files (vocabulary, OCRs, some backbone parameters) here (less than 1 GB), and make a soft link named data under the repo root pointing to where you saved them, as sketched below
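
A minimal sketch of this setup, assuming the cached data was unpacked to /ssd/mmgnn_cache and the other files to /data/mmgnn_aux (both hypothetical placeholders for wherever you saved the downloads):

# link the auxiliary files (vocabulary, OCRs, backbone parameters) as ./data under the repo root
$ ln -s /data/mmgnn_aux data
# then edit the config and point fast_dir (line 11) at the cached data, e.g.
#   fast_dir: /ssd/mmgnn_cache
$ vim configs/vqa/textvqa/s_mmgnn.yml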

Training

  1. Create a new model folder under ensemble, say foo, and then copy our config into it
$ mkdir -p ensemble/foo
$ cp ./configs/vqa/textvqa/s_mmgnn.yml ./ensemble/foo
  2. Start training; parameters will be saved in ensemble/foo
$ python tools/run.py --tasks vqa --datasets textvqa --model s_mmgnn --config ensemble/foo/s_mmgnn.yml -dev cuda:0 --run_type train
  3. The first run of this repo will automatically download GloVe embeddings into pythia/.vector_cache, so please be patient. If training succeeds, you will find s_mmgnnbar_final.pth in the model folder ensemble/foo (a quick check is sketched below)
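
A quick way to confirm that the checkpoint was written (using the hypothetical folder name foo from step 1):

$ ls -lh ensemble/foo/s_mmgnnbar_final.pth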

Inference

  1. If you want to skip the training procedure, a trained model is provided, on which you can directly run inference
  2. Start inference by running the following command. If it succeeds, you will find three new files under the model folder; the two ending with _evalai.p are ready to be submitted to EvalAI to check the results
$ python tools/run.py --tasks vqa --datasets textvqa --model s_mmgnn --config ensemble/bar/s_mmgnn.yml --resume_file <path_to_pth> -dev cuda:0 --run_type all_in_one
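
For example, to run inference with the checkpoint produced in the Training section above (the hypothetical folder foo), the <path_to_pth> placeholder would be filled in as:

$ python tools/run.py --tasks vqa --datasets textvqa --model s_mmgnn --config ensemble/foo/s_mmgnn.yml --resume_file ensemble/foo/s_mmgnnbar_final.pth -dev cuda:0 --run_type all_in_one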

Bibtex

@inproceedings{gao2020multi,
  title={Multi-modal graph neural network for joint reasoning on vision and scene text},
  author={Gao, Difei and Li, Ke and Wang, Ruiping and Shan, Shiguang and Chen, Xilin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={12746--12756},
  year={2020}
}

An attention visualization


Question: "What is the name of the bread sold at the place?"
Answer: "Panera"
(The white box is the predicted answer, green boxes are the OCR tokens "Panera" attends to, and red boxes are the visual ROIs "Panera" attends to; box line weight indicates attention strength.)
