Bottom Up Attention: Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Stars: 989 (+1839.22%)
Vizwiz Vqa Pytorch: PyTorch VQA implementation that achieved top performance in the (ECCV18) VizWiz Grand Challenge: Answering Visual Questions from Blind People
Stars: 33 (-35.29%)
Bottom Up Attention Vqa: An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
Stars: 667 (+1207.84%)
Vqa.pytorch: Visual Question Answering in PyTorch
Stars: 602 (+1080.39%)
Mmf: A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Stars: 4,713 (+9141.18%)
Mac Network: Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
Stars: 444 (+770.59%)
Awesome Vqa: Visual Q&A reading list
Stars: 403 (+690.2%)
Oscar: Oscar and VinVL
Stars: 396 (+676.47%)
Tbd Nets: PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
Stars: 345 (+576.47%)
Awesome Visual Question Answering: A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Stars: 295 (+478.43%)
Nscl Pytorch Release: PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).
Stars: 276 (+441.18%)
MICCAI21 MMQ: Multiple Meta-model Quantifying for Medical Visual Question Answering
Stars: 16 (-68.63%)
bottom-up-features: Bottom-up features extractor implemented in PyTorch.
Stars: 62 (+21.57%)
rosita: ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Stars: 36 (-29.41%)
vqa-soft: Accompanying code for the CVPR 2017 VQA workshop paper "A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models".
Stars: 14 (-72.55%)
FigureQA-baseline: TensorFlow implementation of the CNN-LSTM, Relation Network and text-only baselines for the paper "FigureQA: An Annotated Figure Dataset for Visual Reasoning"
Stars: 28 (-45.1%)
DVQA dataset: DVQA Dataset: A bar chart question answering dataset presented at CVPR 2018
Stars: 20 (-60.78%)
just-ask: [TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Stars: 57 (+11.76%)
AoA-pytorch: A PyTorch implementation of the Attention on Attention module (both self and guided variants) for Visual Question Answering
Stars: 33 (-35.29%)
probnmn-clevr: Code for ICML 2019 paper "Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering" [long-oral]
Stars: 63 (+23.53%)
iMIX: A framework for Multimodal Intelligence research from Inspur HSSLAB.
Stars: 21 (-58.82%)
Transformer-MM-Explainability: [ICCV 2021 - Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Includes examples for DETR and VQA.
Stars: 484 (+849.02%)
mmgnn textvqa: A PyTorch implementation of the CVPR 2020 paper "Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text"
Stars: 41 (-19.61%)
hcrn-videoqa: Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
Stars: 111 (+117.65%)
cfvqa: [CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
Stars: 96 (+88.24%)
ZS-F-VQA: Code and data for the paper "Zero-shot Visual Question Answering using Knowledge Graph" [ISWC 2021]
Stars: 51 (+0%)
VideoNavQA: An alternative EQA paradigm and informative benchmark + models (BMVC 2019, ViGIL 2019 spotlight)
Stars: 22 (-56.86%)
neuro-symbolic-ai-soc: Neuro-Symbolic Visual Question Answering on Sort-of-CLEVR using PyTorch
Stars: 41 (-19.61%)
self critical vqa: Code for the NeurIPS 2019 paper "Self-Critical Reasoning for Robust Visual Question Answering"
Stars: 39 (-23.53%)
Openvqa: A lightweight, scalable, and general framework for visual question answering research
Stars: 198 (+288.24%)
Clipbert: [CVPR 2021 Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning for image-text and video-text tasks.
Stars: 168 (+229.41%)
Pytorch Vqa: Strong baseline for visual question answering
Stars: 158 (+209.8%)
Vqa regat: Research code for the ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
Stars: 129 (+152.94%)
Papers: Some computer vision papers I have read, covering image captioning, weakly supervised segmentation, and more
Stars: 99 (+94.12%)
Vqa Tensorflow: TensorFlow implementation of Deeper LSTM + normalized CNN for Visual Question Answering
Stars: 98 (+92.16%)
Mullowbivqa: Hadamard Product for Low-rank Bilinear Pooling
Stars: 57 (+11.76%)
Vqa: CloudCV Visual Question Answering Demo
Stars: 57 (+11.76%)