
likenneth / mmgnn_textvqa

Licence: other
A PyTorch implementation of the CVPR 2020 paper: Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to mmgnn_textvqa

Papers
Notes on some computer vision papers I have read, covering topics such as image captioning and weakly supervised segmentation
Stars: ✭ 99 (+141.46%)
Mutual labels:  vqa
VideoNavQA
An alternative EQA paradigm and informative benchmark + models (BMVC 2019, ViGIL 2019 spotlight)
Stars: ✭ 22 (-46.34%)
Mutual labels:  vqa
awesome-efficient-gnn
Code and resources on scalable and efficient Graph Neural Networks
Stars: ✭ 498 (+1114.63%)
Mutual labels:  gnn
Vqa Mfb
Stars: ✭ 153 (+273.17%)
Mutual labels:  vqa
self critical vqa
Code for the NeurIPS 2019 paper "Self-Critical Reasoning for Robust Visual Question Answering"
Stars: ✭ 39 (-4.88%)
Mutual labels:  vqa
ZS-F-VQA
Code and Data for paper: Zero-shot Visual Question Answering using Knowledge Graph [ ISWC 2021 ]
Stars: ✭ 51 (+24.39%)
Mutual labels:  vqa
Mullowbivqa
Hadamard Product for Low-rank Bilinear Pooling
Stars: ✭ 57 (+39.02%)
Mutual labels:  vqa
Causing
Causing: CAUsal INterpretation using Graphs
Stars: ✭ 47 (+14.63%)
Mutual labels:  gnn
neuro-symbolic-ai-soc
Neuro-Symbolic Visual Question Answering on Sort-of-CLEVR using PyTorch
Stars: ✭ 41 (+0%)
Mutual labels:  vqa
hcrn-videoqa
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
Stars: ✭ 111 (+170.73%)
Mutual labels:  vqa
Pytorch Vqa
Strong baseline for visual question answering
Stars: ✭ 158 (+285.37%)
Mutual labels:  vqa
Openvqa
A lightweight, scalable, and general framework for visual question answering research
Stars: ✭ 198 (+382.93%)
Mutual labels:  vqa
3DInfomax
Making self-supervised learning work on molecules by using their 3D geometry to pre-train GNNs. Implemented in DGL and Pytorch Geometric.
Stars: ✭ 107 (+160.98%)
Mutual labels:  gnn
Vqa regat
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
Stars: ✭ 129 (+214.63%)
Mutual labels:  vqa
spatio-temporal-brain
A Deep Graph Neural Network Architecture for Modelling Spatio-temporal Dynamics in rs-fMRI Data
Stars: ✭ 22 (-46.34%)
Mutual labels:  gnn
Vqa Tensorflow
Tensorflow Implementation of Deeper LSTM+ normalized CNN for Visual Question Answering
Stars: ✭ 98 (+139.02%)
Mutual labels:  vqa
GraphMix
Code for reproducing results in GraphMix paper
Stars: ✭ 64 (+56.1%)
Mutual labels:  gnn
ncem
Learning cell communication from spatial graphs of cells
Stars: ✭ 77 (+87.8%)
Mutual labels:  gnn
Awesome-Federated-Learning-on-Graph-and-GNN-papers
Federated learning on graph, especially on graph neural networks (GNNs), knowledge graph, and private GNN.
Stars: ✭ 206 (+402.44%)
Mutual labels:  gnn
cfvqa
[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
Stars: ✭ 96 (+134.15%)
Mutual labels:  vqa

Multi-Modal GNN for TextVQA


  1. This project provides code to reproduce the results of Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text on the TextVQA dataset
  2. We are grateful to MMF (formerly Pythia), an excellent VQA codebase provided by Facebook, on which our code is built
  3. We achieved 32.46% accuracy (ensemble) on the test set of TextVQA

Requirements

  1. PyTorch 1.0.1 (post release)
  2. Experiments were performed on a Maxwell Titan X GPU, which has 12 GB of GPU memory
  3. See requirements.txt for the required Python packages and run the command below to install them

Let's begin by cloning this repository and installing the requirements:

$ git clone https://github.com/ricolike/mmgnn-textvqa.git
$ cd mmgnn-textvqa
$ pip install -r requirements.txt

Data Setup

  1. Cached data: to boost data-loading speed under a limited memory budget (64 GB) and to speed up computation, we cache intermediate dataloader results on disk. Download the data (around 54 GB; roughly 120 GB unzipped) and set line 11 (fast_dir) in the config to the absolute path where you saved it
  2. Other files: download the other required files (vocabulary, OCRs, some backbone parameters) here (less than 1 GB), and make a soft link named data under the repo root pointing to where you saved them, as sketched below
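
A minimal sketch of this setup, assuming the cached data was unpacked to /ssd/mmgnn_cache and the other files to /data/mmgnn_aux (both hypothetical placeholders for wherever you saved the downloads):

# link the auxiliary files (vocabulary, OCRs, backbone parameters) as ./data under the repo root
$ ln -s /data/mmgnn_aux data
# then edit the config and point fast_dir (line 11) at the cached data, e.g.
#   fast_dir: /ssd/mmgnn_cache
$ vim configs/vqa/textvqa/s_mmgnn.yml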

Training

  1. Create a new model folder under ensemble, say foo, and then copy our config into it
$ mkdir -p ensemble/foo
$ cp ./configs/vqa/textvqa/s_mmgnn.yml ./ensemble/foo
  2. Start training; parameters will be saved in ensemble/foo
$ python tools/run.py --tasks vqa --datasets textvqa --model s_mmgnn --config ensemble/foo/s_mmgnn.yml -dev cuda:0 --run_type train
  3. The first run of this repo will automatically download GloVe embeddings into pythia/.vector_cache, so please be patient. If training succeeds, you will find s_mmgnnbar_final.pth in the model folder ensemble/foo (a quick check is sketched below)
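
A quick way to confirm that the checkpoint was written (using the hypothetical folder name foo from step 1):

$ ls -lh ensemble/foo/s_mmgnnbar_final.pth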

Inference

  1. If you want to skip the training procedure, a trained model is provided, on which you can directly run inference
  2. Start inference by running the following command. If it succeeds, you will find three new files under the model folder; the two ending with _evalai.p are ready to be submitted to EvalAI to check the results
$ python tools/run.py --tasks vqa --datasets textvqa --model s_mmgnn --config ensemble/bar/s_mmgnn.yml --resume_file <path_to_pth> -dev cuda:0 --run_type all_in_one
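
For example, to run inference with the checkpoint produced in the Training section above (the hypothetical folder foo), the <path_to_pth> placeholder would be filled in as:

$ python tools/run.py --tasks vqa --datasets textvqa --model s_mmgnn --config ensemble/foo/s_mmgnn.yml --resume_file ensemble/foo/s_mmgnnbar_final.pth -dev cuda:0 --run_type all_in_one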

Bibtex

@inproceedings{gao2020multi,
  title={Multi-modal graph neural network for joint reasoning on vision and scene text},
  author={Gao, Difei and Li, Ke and Wang, Ruiping and Shan, Shiguang and Chen, Xilin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={12746--12756},
  year={2020}
}

An attention visualization


Question: "What is the name of the bread sold at the place?"
Answer: "Panera"
(The white box is the predicted answer, green boxes are the OCR tokens "Panera" attends to, and red boxes are the visual ROIs "Panera" attends to; box line weight indicates attention strength.)
