Openvqa: A lightweight, scalable, and general framework for visual question answering research
Clipbert: [CVPR 2021 Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks
Pytorch Vqa: Strong baseline for visual question answering
Vqa regat: Research code for the ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
Papers: Some computer vision papers I have read, covering image captioning, weakly supervised segmentation, and related topics
Vqa Tensorflow: TensorFlow implementation of Deeper LSTM + normalized CNN for Visual Question Answering
Mullowbivqa: Hadamard Product for Low-rank Bilinear Pooling; a minimal sketch of this fusion appears after this list
Vqa: CloudCV Visual Question Answering Demo
Conditional Batch Norm: PyTorch implementation of the NIPS 2017 paper "Modulating early visual processing by language"; see the conditioning sketch after this list
Bottom Up Attention: Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Vizwiz Vqa Pytorch: PyTorch VQA implementation that achieved top performance in the ECCV 2018 VizWiz Grand Challenge on answering visual questions from blind people
Mmf: A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Mac Network: Implementation of the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
Tbd Nets: PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
Awesome Visual Question Answering: A curated list of resources for Visual Question Answering (VQA, over both images and video), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
MICCAI21 MMQ: Multiple Meta-model Quantifying for Medical Visual Question Answering
rosita: ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
vqa-soft: Accompanying code for "A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models" (CVPR 2017 VQA workshop); see the soft-loss sketch after this list.
FigureQA-baseline: TensorFlow implementation of the CNN-LSTM, Relation Network, and text-only baselines for the paper "FigureQA: An Annotated Figure Dataset for Visual Reasoning"
DVQA dataset: A bar-chart question answering dataset presented at CVPR 2018
just-ask: [TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
AoA-pytorch: A PyTorch implementation of the Attention on Attention module (both self and guided variants) for Visual Question Answering; see the gating sketch after this list
probnmn-clevr: Code for the ICML 2019 paper "Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering" [long oral]
iMIX: A framework for multimodal intelligence research from Inspur HSSLAB
Transformer-MM-Explainability: [ICCV 2021 Oral] Official PyTorch implementation of "Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers", a novel method to visualize any Transformer-based network, including examples for DETR and VQA.
mmgnn textvqa: A PyTorch implementation of the CVPR 2020 paper "Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text"
hcrn-videoqa: Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
cfvqa: [CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
ZS-F-VQA: Code and data for the paper "Zero-shot Visual Question Answering using Knowledge Graph" [ISWC 2021]
VideoNavQA: An alternative EQA paradigm and informative benchmark + models (BMVC 2019; ViGIL 2019 spotlight)
self critical vqa: Code for the NeurIPS 2019 paper "Self-Critical Reasoning for Robust Visual Question Answering"
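
A few of the entries above name techniques compact enough to sketch. For the Mullowbivqa entry: low-rank bilinear pooling replaces a full bilinear interaction between question and image features with two learned projections fused by a Hadamard (element-wise) product. A minimal PyTorch sketch of that idea follows; the class name, dimensions, and defaults are illustrative assumptions, not values from the repository.

```python
import torch
import torch.nn as nn

class LowRankBilinearPooling(nn.Module):
    """Hadamard-product low-rank bilinear fusion of question and image features."""

    def __init__(self, q_dim=1024, v_dim=2048, rank=1200, out_dim=1000):
        super().__init__()
        self.proj_q = nn.Linear(q_dim, rank)  # project question to a shared rank-d space
        self.proj_v = nn.Linear(v_dim, rank)  # project image features likewise
        self.out = nn.Linear(rank, out_dim)   # answer classifier head

    def forward(self, q, v):
        # The element-wise product of the two projections approximates a
        # full bilinear map with far fewer parameters.
        fused = torch.tanh(self.proj_q(q)) * torch.tanh(self.proj_v(v))
        return self.out(fused)

# usage: answer logits for a batch of 8 question/image feature pairs
logits = LowRankBilinearPooling()(torch.randn(8, 1024), torch.randn(8, 2048))
```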
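For the Conditional Batch Norm entry: the paper's central idea is to let a language embedding predict per-channel offsets to the batch-norm scale and shift, so the question modulates early visual features. A minimal sketch, assuming a generic question embedding `lang` of size `lang_dim` (both hypothetical names):

```python
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    """Batch norm whose affine scale/shift are modulated by a language vector."""

    def __init__(self, num_channels, lang_dim):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_channels, affine=False)  # normalization only
        self.delta_gamma = nn.Linear(lang_dim, num_channels)  # predicted scale offset
        self.delta_beta = nn.Linear(lang_dim, num_channels)   # predicted shift offset

    def forward(self, x, lang):
        # Start from the identity affine transform (gamma=1, beta=0) and
        # add language-predicted deltas, broadcast over spatial positions.
        gamma = 1.0 + self.delta_gamma(lang)
        beta = self.delta_beta(lang)
        out = self.bn(x)
        return gamma[:, :, None, None] * out + beta[:, :, None, None]

# usage: modulate a 64-channel feature map with a 256-d question embedding
cbn = ConditionalBatchNorm2d(64, 256)
y = cbn(torch.randn(8, 64, 14, 14), torch.randn(8, 256))
```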
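For the vqa-soft entry: one common reading of the "soft" loss is a cross-entropy whose targets are annotator-agreement weights over all candidate answers rather than a single one-hot label. A hedged sketch of that idea, not the repository's exact code:

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, target_scores):
    """Cross-entropy against soft answer targets.

    logits:        (batch, num_answers) raw model scores
    target_scores: (batch, num_answers) non-negative annotator-agreement weights
    """
    log_probs = F.log_softmax(logits, dim=1)
    # Weight each answer's log-likelihood by how often annotators gave it.
    return -(target_scores * log_probs).sum(dim=1).mean()

# usage: three candidate answers; annotators split 0.9/0.1 on the first example
loss = soft_cross_entropy(torch.randn(2, 3),
                          torch.tensor([[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]))
```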
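For the AoA-pytorch entry: Attention on Attention concatenates a standard attention result with its query, then forms an "information" vector and a sigmoid gate whose element-wise product is the output. A minimal sketch with illustrative shapes:

```python
import torch
import torch.nn as nn

class AttentionOnAttention(nn.Module):
    """Gates an attention result with a second, sigmoid 'attention' over it."""

    def __init__(self, dim):
        super().__init__()
        self.info = nn.Linear(2 * dim, dim)  # information vector
        self.gate = nn.Linear(2 * dim, dim)  # sigmoid attention gate

    def forward(self, query, attended):
        # Concatenate the attention result with its query, then gate.
        joint = torch.cat([attended, query], dim=-1)
        return torch.sigmoid(self.gate(joint)) * self.info(joint)

# usage: gate a 512-d attended vector with its 512-d query
out = AttentionOnAttention(512)(torch.randn(8, 512), torch.randn(8, 512))
```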