
57 open-source projects that are alternatives to or similar to rosita
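A note on the star figures below: each entry's percentage appears to be that project's star count relative to rosita's own star count (36 stars at the time of this listing — inferred from the figures, not stated on the page; e.g., just-ask's 57 stars works out to +58.33%). A minimal sketch of the computation, assuming that baseline; names here are illustrative:

    # Sketch of the "(+58.33%)" star deltas used throughout this list.
    # The 36-star baseline for rosita is inferred from the listed figures.
    ROSITA_STARS = 36

    def star_delta(project_stars: int, baseline: int = ROSITA_STARS) -> str:
        """Star count relative to the baseline, formatted like the listing."""
        pct = (project_stars - baseline) / baseline * 100
        return f"{pct:+.2f}%"

    assert star_delta(57) == "+58.33%"       # just-ask
    assert star_delta(21) == "-41.67%"       # iMIX
    assert star_delta(4713) == "+12991.67%"  # Mmf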

just-ask
[TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Stars: ✭ 57 (+58.33%)
Mutual labels:  vqa, vision-and-language, pre-training
iMIX
A framework for Multimodal Intelligence research from Inspur HSSLAB.
Stars: ✭ 21 (-41.67%)
Mutual labels:  vqa, vision-and-language
pytorch violet
A PyTorch implementation of VIOLET
Stars: ✭ 119 (+230.56%)
vqa-soft
Accompanying code for "A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models" CVPR 2017 VQA workshop paper.
Stars: ✭ 14 (-61.11%)
Mutual labels:  vqa
FigureQA-baseline
TensorFlow implementation of the CNN-LSTM, Relation Network and text-only baselines for the paper "FigureQA: An Annotated Figure Dataset for Visual Reasoning"
Stars: ✭ 28 (-22.22%)
Mutual labels:  vqa
DVQA dataset
DVQA Dataset: A Bar chart question answering dataset presented at CVPR 2018
Stars: ✭ 20 (-44.44%)
Mutual labels:  vqa
robo-vln
Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
Stars: ✭ 34 (-5.56%)
Mutual labels:  vision-and-language
AoA-pytorch
A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering
Stars: ✭ 33 (-8.33%)
Mutual labels:  vqa
MIA
Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" (NeurIPS 2019)
Stars: ✭ 57 (+58.33%)
Mutual labels:  vision-and-language
synse-zsl
Official PyTorch code for the ICIP 2021 paper 'Syntactically Guided Generative Embeddings For Zero Shot Skeleton Action Recognition'
Stars: ✭ 14 (-61.11%)
Mutual labels:  vision-and-language
VarCLR
VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning
Stars: ✭ 30 (-16.67%)
Mutual labels:  pre-training
clip playground
An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities
Stars: ✭ 80 (+122.22%)
Mutual labels:  vision-and-language
stanford-cs231n-assignments-2020
This repository contains my solutions to the assignments for Stanford's CS231n "Convolutional Neural Networks for Visual Recognition" (Spring 2020).
Stars: ✭ 84 (+133.33%)
Mutual labels:  vision-and-language
probnmn-clevr
Code for ICML 2019 paper "Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering" [long-oral]
Stars: ✭ 63 (+75%)
Mutual labels:  vqa
VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Stars: ✭ 41 (+13.89%)
Mutual labels:  vision-and-language
CBP
Official Tensorflow Implementation of the AAAI-2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction"
Stars: ✭ 52 (+44.44%)
Mutual labels:  vision-and-language
wikiHow paper list
A paper list of research conducted based on wikiHow
Stars: ✭ 25 (-30.56%)
Mutual labels:  vision-and-language
awesome-graph-self-supervised-learning
Awesome Graph Self-Supervised Learning
Stars: ✭ 805 (+2136.11%)
Mutual labels:  pre-training
TRAR-VQA
[ICCV 2021] TRAR: Routing the Attention Spans in Transformers for Visual Question Answering -- Official Implementation
Stars: ✭ 49 (+36.11%)
Mutual labels:  vision-and-language
calvin
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Stars: ✭ 105 (+191.67%)
Mutual labels:  vision-and-language
X-VLM
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
Stars: ✭ 283 (+686.11%)
Mutual labels:  vision-and-language
Transformer-MM-Explainability
[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.
Stars: ✭ 484 (+1244.44%)
Mutual labels:  vqa
mmgnn textvqa
A Pytorch implementation of CVPR 2020 paper: Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text
Stars: ✭ 41 (+13.89%)
Mutual labels:  vqa
VQ-APC
Vector Quantized Autoregressive Predictive Coding (VQ-APC)
Stars: ✭ 34 (-5.56%)
Mutual labels:  pre-training
hcrn-videoqa
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
Stars: ✭ 111 (+208.33%)
Mutual labels:  vqa
cfvqa
[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
Stars: ✭ 96 (+166.67%)
Mutual labels:  vqa
Kaleido-BERT
(CVPR 2021) Kaleido-BERT: Vision-Language Pre-training on Fashion Domain.
Stars: ✭ 252 (+600%)
Mutual labels:  pre-training
lang2seg
Referring Expression Object Segmentation with Caption-Aware Consistency, BMVC 2019
Stars: ✭ 30 (-16.67%)
Mutual labels:  vision-and-language
ZS-F-VQA
Code and data for the paper "Zero-shot Visual Question Answering using Knowledge Graph" [ISWC 2021]
Stars: ✭ 51 (+41.67%)
Mutual labels:  vqa
VideoNavQA
An alternative EQA paradigm and informative benchmark + models (BMVC 2019, ViGIL 2019 spotlight)
Stars: ✭ 22 (-38.89%)
Mutual labels:  vqa
neuro-symbolic-ai-soc
Neuro-Symbolic Visual Question Answering on Sort-of-CLEVR using PyTorch
Stars: ✭ 41 (+13.89%)
Mutual labels:  vqa
self critical vqa
Code for NeurIPS 2019 paper "Self-Critical Reasoning for Robust Visual Question Answering"
Stars: ✭ 39 (+8.33%)
Mutual labels:  vqa
Openvqa
A lightweight, scalable, and general framework for visual question answering research
Stars: ✭ 198 (+450%)
Mutual labels:  vqa
Clipbert
[CVPR 2021 Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning for image-text and video-text tasks.
Stars: ✭ 168 (+366.67%)
Mutual labels:  vqa
Pytorch Vqa
Strong baseline for visual question answering
Stars: ✭ 158 (+338.89%)
Mutual labels:  vqa
Vqa Mfb
Stars: ✭ 153 (+325%)
Mutual labels:  vqa
Vqa regat
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
Stars: ✭ 129 (+258.33%)
Mutual labels:  vqa
Papers
Some computer-vision papers I have read, covering topics such as image captioning and weakly supervised segmentation
Stars: ✭ 99 (+175%)
Mutual labels:  vqa
Vqa Tensorflow
Tensorflow Implementation of Deeper LSTM+ normalized CNN for Visual Question Answering
Stars: ✭ 98 (+172.22%)
Mutual labels:  vqa
Mullowbivqa
Hadamard Product for Low-rank Bilinear Pooling
Stars: ✭ 57 (+58.33%)
Mutual labels:  vqa
Vqa
CloudCV Visual Question Answering Demo
Stars: ✭ 57 (+58.33%)
Mutual labels:  vqa
Conditional Batch Norm
Pytorch implementation of NIPS 2017 paper "Modulating early visual processing by language"
Stars: ✭ 51 (+41.67%)
Mutual labels:  vqa
Bottom Up Attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Stars: ✭ 989 (+2647.22%)
Mutual labels:  vqa
Vizwiz Vqa Pytorch
PyTorch VQA implementation that achieved top performances in the (ECCV18) VizWiz Grand Challenge: Answering Visual Questions from Blind People
Stars: ✭ 33 (-8.33%)
Mutual labels:  vqa
Visual Question Answering
📷 ❓ Visual Question Answering Demo and Algorithmia API
Stars: ✭ 18 (-50%)
Mutual labels:  vqa
Bottom Up Attention Vqa
An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
Stars: ✭ 667 (+1752.78%)
Mutual labels:  vqa
Vqa.pytorch
Visual Question Answering in Pytorch
Stars: ✭ 602 (+1572.22%)
Mutual labels:  vqa
Mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Stars: ✭ 4,713 (+12991.67%)
Mutual labels:  vqa
Mac Network
Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
Stars: ✭ 444 (+1133.33%)
Mutual labels:  vqa
Awesome Vqa
Visual Q&A reading list
Stars: ✭ 403 (+1019.44%)
Mutual labels:  vqa
Oscar
Oscar and VinVL
Stars: ✭ 396 (+1000%)
Mutual labels:  vqa
Tbd Nets
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
Stars: ✭ 345 (+858.33%)
Mutual labels:  vqa
Awesome Visual Question Answering
A curated list of Visual Question Answering (VQA, including image/video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Stars: ✭ 295 (+719.44%)
Mutual labels:  vqa
Nscl Pytorch Release
PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).
Stars: ✭ 276 (+666.67%)
Mutual labels:  vqa
MICCAI21 MMQ
Multiple Meta-model Quantifying for Medical Visual Question Answering
Stars: ✭ 16 (-55.56%)
Mutual labels:  vqa
bottom-up-features
Bottom-up features extractor implemented in PyTorch.
Stars: ✭ 62 (+72.22%)
Mutual labels:  vqa
SIGIR2021 Conure
One Person, One Model, One World: Learning Continual User Representation without Forgetting
Stars: ✭ 23 (-36.11%)
Mutual labels:  pre-training