
89 open source projects that are alternatives to, or similar to, iMIX

just-ask
[TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Stars: ✭ 57 (+171.43%)
Mutual labels:  vqa, vision-and-language
slp
Utils and modules for Speech Language and Multimodal processing using pytorch and pytorch lightning
Stars: ✭ 17 (-19.05%)
rosita
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Stars: ✭ 36 (+71.43%)
Mutual labels:  vqa, vision-and-language
Mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Stars: ✭ 4,713 (+22342.86%)
Mutual labels:  vqa, multimodal
VideoNavQA
An alternative Embodied Question Answering (EQA) paradigm and informative benchmark + models (BMVC 2019, ViGIL 2019 spotlight)
Stars: ✭ 22 (+4.76%)
Mutual labels:  vqa, multimodal
Openvqa
A lightweight, scalable, and general framework for visual question answering research
Stars: ✭ 198 (+842.86%)
Mutual labels:  vqa
MultiGraphGAN
MultiGraphGAN for predicting multiple target graphs from a source graph using geometric deep learning.
Stars: ✭ 16 (-23.81%)
Mutual labels:  multimodal-deep-learning
Vqa regat
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
Stars: ✭ 129 (+514.29%)
Mutual labels:  vqa
Vqa
CloudCV Visual Question Answering Demo
Stars: ✭ 57 (+171.43%)
Mutual labels:  vqa
mix-stage
Official repository for the ECCV 2020 paper "Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach" (https://arxiv.org/abs/2007.12553)
Stars: ✭ 22 (+4.76%)
Mutual labels:  multimodal
hcrn-videoqa
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
Stars: ✭ 111 (+428.57%)
Mutual labels:  vqa
Visual Question Answering
📷 ❓ Visual Question Answering Demo and Algorithmia API
Stars: ✭ 18 (-14.29%)
Mutual labels:  vqa
tsflex
Flexible time series feature extraction & processing
Stars: ✭ 252 (+1100%)
Mutual labels:  multimodal
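
As a quick illustration of what tsflex does, here is a minimal sketch of its FeatureCollection/FeatureDescriptor API (the series, window, and stride values are made up for this example, and keyword names may differ between tsflex versions):

    import numpy as np
    import pandas as pd
    from tsflex.features import FeatureCollection, FeatureDescriptor

    # A toy time-indexed signal; tsflex computes features on (multi)series data.
    signal = pd.Series(
        np.random.randn(10_000),
        index=pd.date_range("2021-01-01", periods=10_000, freq="100ms"),
        name="signal",
    )

    # Apply np.std on "signal" over 30 s windows, sliding by 10 s.
    fc = FeatureCollection(
        FeatureDescriptor(function=np.std, series_name="signal", window="30s", stride="10s")
    )
    features = fc.calculate(signal, return_df=True)  # one column per (series, function, window)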
pytorch violet
A PyTorch implementation of VIOLET
Stars: ✭ 119 (+466.67%)
Mutual labels:  vision-and-language
Pytorch Vqa
Strong baseline for visual question answering
Stars: ✭ 158 (+652.38%)
Mutual labels:  vqa
gakg
GAKG is a multimodal Geoscience Academic Knowledge Graph framework built by fusing papers' illustrations, text, and bibliometric data.
Stars: ✭ 21 (+0%)
Mutual labels:  multimodal
Vqa Tensorflow
TensorFlow implementation of "Deeper LSTM + normalized CNN" for Visual Question Answering
Stars: ✭ 98 (+366.67%)
Mutual labels:  vqa
MinkLocMultimodal
MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition
Stars: ✭ 65 (+209.52%)
Mutual labels:  multimodal
Bottom Up Attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Stars: ✭ 989 (+4609.52%)
Mutual labels:  vqa
wikiHow paper list
A paper list of research conducted based on wikiHow
Stars: ✭ 25 (+19.05%)
Mutual labels:  vision-and-language
Vqa.pytorch
Visual Question Answering in PyTorch
Stars: ✭ 602 (+2766.67%)
Mutual labels:  vqa
Kaleido-BERT
(CVPR 2021) Kaleido-BERT: Vision-Language Pre-training on Fashion Domain.
Stars: ✭ 252 (+1100%)
Mutual labels:  multimodal
Mac Network
Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
Stars: ✭ 444 (+2014.29%)
Mutual labels:  vqa
Oscar
Oscar and VinVL
Stars: ✭ 396 (+1785.71%)
Mutual labels:  vqa
calvin
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Stars: ✭ 105 (+400%)
Mutual labels:  vision-and-language
lang2seg
Referring Expression Object Segmentation with Caption-Aware Consistency, BMVC 2019
Stars: ✭ 30 (+42.86%)
Mutual labels:  vision-and-language
Awesome Visual Question Answering
A curated list of Visual Question Answering (VQA, covering image/video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Stars: ✭ 295 (+1304.76%)
Mutual labels:  vqa
neuro-symbolic-ai-soc
Neuro-Symbolic Visual Question Answering on Sort-of-CLEVR using PyTorch
Stars: ✭ 41 (+95.24%)
Mutual labels:  vqa
Transformer-MM-Explainability
[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.
Stars: ✭ 484 (+2204.76%)
Mutual labels:  vqa
self critical vqa
Code for the NeurIPS 2019 paper "Self-Critical Reasoning for Robust Visual Question Answering"
Stars: ✭ 39 (+85.71%)
Mutual labels:  vqa
img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
Stars: ✭ 1,173 (+5485.71%)
Mutual labels:  multimodal
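
For context, img2dataset is driven by a single download entry point; below is a minimal sketch of its documented Python usage (the file names and sizes here are placeholders):

    from img2dataset import download

    # Download, resize, and shard every image listed in urls.txt (one URL per line).
    download(
        url_list="urls.txt",
        output_folder="images",
        image_size=256,
        processes_count=8,
        thread_count=32,
        output_format="webdataset",  # shards suitable for large-scale training
    )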
Clipbert
[CVPR 2021 Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning for image-text and video-text tasks.
Stars: ✭ 168 (+700%)
Mutual labels:  vqa
mmgnn textvqa
A PyTorch implementation of the CVPR 2020 paper "Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text"
Stars: ✭ 41 (+95.24%)
Mutual labels:  vqa
Vqa Mfb
Multi-modal Factorized Bilinear (MFB) pooling for Visual Question Answering
Stars: ✭ 153 (+628.57%)
Mutual labels:  vqa
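
MFB fuses an image feature and a question feature by projecting both into a shared high-dimensional space, multiplying element-wise, then sum-pooling over groups of k units. A minimal PyTorch sketch of that fusion step (dimensions are illustrative, not this repo's configuration):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MFBFusion(nn.Module):
        """Multi-modal Factorized Bilinear pooling of an image and a question feature."""
        def __init__(self, img_dim=2048, q_dim=1024, out_dim=1000, k=5):
            super().__init__()
            self.k = k
            self.proj_img = nn.Linear(img_dim, out_dim * k)
            self.proj_q = nn.Linear(q_dim, out_dim * k)

        def forward(self, img, q):
            joint = self.proj_img(img) * self.proj_q(q)                   # element-wise product
            joint = joint.view(joint.size(0), -1, self.k).sum(dim=2)      # sum-pool every k units
            joint = torch.sign(joint) * torch.sqrt(joint.abs() + 1e-12)   # signed square root
            return F.normalize(joint, dim=1)                              # L2 normalization

    z = MFBFusion()(torch.randn(4, 2048), torch.randn(4, 1024))  # -> shape (4, 1000)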
lipnet
LipNet with Gluon (MXNet)
Stars: ✭ 16 (-23.81%)
Mutual labels:  multimodal
Papers
Some computer vision papers I have read: image-to-text generation, weakly supervised segmentation, etc.
Stars: ✭ 99 (+371.43%)
Mutual labels:  vqa
MSAF
Official implementation of the paper "MSAF: Multimodal Split Attention Fusion"
Stars: ✭ 47 (+123.81%)
Mutual labels:  multimodal-deep-learning
Mullowbivqa
Hadamard Product for Low-rank Bilinear Pooling
Stars: ✭ 57 (+171.43%)
Mutual labels:  vqa
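
The low-rank bilinear (MLB) idea this repository implements replaces a full bilinear interaction between a question vector q and an image vector v with two low-rank projections joined by a Hadamard (element-wise) product; schematically, with learned matrices P, U, V and a nonlinearity σ:

    f = P^\top \left( \sigma(U^\top q) \circ \sigma(V^\top v) \right)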
TRAR-VQA
[ICCV 2021] TRAR: Routing the Attention Spans in Transformers for Visual Question Answering -- Official Implementation
Stars: ✭ 49 (+133.33%)
Mutual labels:  vision-and-language
Conditional Batch Norm
PyTorch implementation of the NIPS 2017 paper "Modulating early visual processing by language"
Stars: ✭ 51 (+142.86%)
Mutual labels:  vqa
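
The paper's conditional batch norm predicts per-channel deltas on batch norm's scale and shift from a language embedding. A minimal PyTorch sketch of that idea (initialization and dimensions are illustrative, not the repo's exact code):

    import torch
    import torch.nn as nn

    class ConditionalBatchNorm2d(nn.Module):
        """BatchNorm2d whose scale/shift are shifted by language-predicted deltas."""
        def __init__(self, num_features, lang_dim):
            super().__init__()
            self.bn = nn.BatchNorm2d(num_features, affine=False)  # normalization only
            self.gamma = nn.Parameter(torch.ones(num_features))   # base scale
            self.beta = nn.Parameter(torch.zeros(num_features))   # base shift
            self.d_gamma = nn.Linear(lang_dim, num_features)
            self.d_beta = nn.Linear(lang_dim, num_features)
            for lin in (self.d_gamma, self.d_beta):               # start unmodulated
                nn.init.zeros_(lin.weight)
                nn.init.zeros_(lin.bias)

        def forward(self, x, lang_emb):
            x_hat = self.bn(x)                                    # (B, C, H, W)
            g = (self.gamma + self.d_gamma(lang_emb))[:, :, None, None]
            b = (self.beta + self.d_beta(lang_emb))[:, :, None, None]
            return g * x_hat + b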
pytorch-multimodal sarcasm detection
Implementation of the paper "Multi-Modal Sarcasm Detection in Twitter with Hierarchical Fusion Model"
Stars: ✭ 3 (-85.71%)
Mutual labels:  multimodal
Vizwiz Vqa Pytorch
PyTorch VQA implementation that achieved top performances in the (ECCV18) VizWiz Grand Challenge: Answering Visual Questions from Blind People
Stars: ✭ 33 (+57.14%)
Mutual labels:  vqa
CBP
Official Tensorflow Implementation of the AAAI-2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction"
Stars: ✭ 52 (+147.62%)
Mutual labels:  vision-and-language
Bottom Up Attention Vqa
An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
Stars: ✭ 667 (+3076.19%)
Mutual labels:  vqa
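
The winning 2017 model pairs bottom-up region features with question-guided ("top-down") attention; here is a minimal sketch of that attention step (the repo's actual layers use gated nonlinearities, and the dimensions are illustrative):

    import torch
    import torch.nn as nn

    class TopDownAttention(nn.Module):
        """Question-guided soft attention over K bottom-up region features."""
        def __init__(self, v_dim=2048, q_dim=1024, hid=512):
            super().__init__()
            self.score = nn.Sequential(
                nn.Linear(v_dim + q_dim, hid), nn.ReLU(), nn.Linear(hid, 1)
            )

        def forward(self, v, q):
            # v: (B, K, v_dim) region features, q: (B, q_dim) question encoding
            q_exp = q.unsqueeze(1).expand(-1, v.size(1), -1)
            logits = self.score(torch.cat([v, q_exp], dim=2))  # (B, K, 1)
            attn = torch.softmax(logits, dim=1)
            return (attn * v).sum(dim=1)                       # attended feature (B, v_dim)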
cfvqa
[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
Stars: ✭ 96 (+357.14%)
Mutual labels:  vqa
Social-IQ
[CVPR 2019 Oral] Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence
Stars: ✭ 37 (+76.19%)
Mutual labels:  multimodal-deep-learning
Awesome Vqa
Visual Q&A reading list
Stars: ✭ 403 (+1819.05%)
Mutual labels:  vqa
BBFN
Implementation of the paper "Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis"
Stars: ✭ 42 (+100%)
Mutual labels:  multimodal-deep-learning
Tbd Nets
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
Stars: ✭ 345 (+1542.86%)
Mutual labels:  vqa
attentive-modality-hopping-for-SER
TensorFlow implementation of "Attentive Modality Hopping for Speech Emotion Recognition" (ICASSP 2020)
Stars: ✭ 25 (+19.05%)
Mutual labels:  multimodal-deep-learning
pykale
Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem
Stars: ✭ 381 (+1714.29%)
Mutual labels:  multimodal
Nscl Pytorch Release
PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).
Stars: ✭ 276 (+1214.29%)
Mutual labels:  vqa
MICCAI21 MMQ
Multiple Meta-model Quantifying for Medical Visual Question Answering
Stars: ✭ 16 (-23.81%)
Mutual labels:  vqa
X-VLM
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
Stars: ✭ 283 (+1247.62%)
Mutual labels:  vision-and-language
ZS-F-VQA
Code and data for the paper "Zero-shot Visual Question Answering using Knowledge Graph" (ISWC 2021)
Stars: ✭ 51 (+142.86%)
Mutual labels:  vqa
bottom-up-features
Bottom-up feature extractor implemented in PyTorch.
Stars: ✭ 62 (+195.24%)
Mutual labels:  vqa
VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Stars: ✭ 41 (+95.24%)
Mutual labels:  vision-and-language
multimodal-deep-learning-for-disaster-response
Damage Identification in Social Media Posts using Multimodal Deep Learning: code and dataset
Stars: ✭ 43 (+104.76%)
Mutual labels:  multimodal-deep-learning
mmd
This repository contains the PyTorch implementation for our SCAI (EMNLP 2018) submission "A Knowledge-Grounded Multimodal Search-Based Conversational Agent"
Stars: ✭ 28 (+33.33%)
Mutual labels:  multimodal-deep-learning
circDeep
End-to-end learning framework for classifying circular RNAs against other long non-coding RNAs using multimodal deep learning
Stars: ✭ 21 (+0%)
Mutual labels:  multimodal-deep-learning
1-60 of 89 similar projects