89 open source projects that are alternatives to or similar to iMIX

[TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

Stars: ✭ 57 (+171.43%)

Mutual labels: vqa, vision-and-language

slp

Utilities and modules for speech, language, and multimodal processing using PyTorch and PyTorch Lightning

Stars: ✭ 17 (-19.05%)

Mutual labels: multimodal, multimodal-deep-learning

rosita

ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

Stars: ✭ 36 (+71.43%)

Mutual labels: vqa, vision-and-language

Mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Stars: ✭ 4,713 (+22342.86%)

Mutual labels: vqa, multimodal

VideoNavQA

An alternative EQA paradigm and informative benchmark + models (BMVC 2019, ViGIL 2019 spotlight)

Stars: ✭ 22 (+4.76%)

Mutual labels: vqa, multimodal

Openvqa

A lightweight, scalable, and general framework for visual question answering research

Stars: ✭ 198 (+842.86%)

Mutual labels: vqa

MultiGraphGAN

MultiGraphGAN for predicting multiple target graphs from a source graph using geometric deep learning.

Stars: ✭ 16 (-23.81%)

Mutual labels: multimodal-deep-learning

Vqa regat

Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"

Stars: ✭ 129 (+514.29%)

Mutual labels: vqa

Vqa

CloudCV Visual Question Answering Demo

Stars: ✭ 57 (+171.43%)

Mutual labels: vqa

mix-stage

Official Repository for the paper Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach published in ECCV 2020 (https://arxiv.org/abs/2007.12553)

Stars: ✭ 22 (+4.76%)

Mutual labels: multimodal

hcrn-videoqa

Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)

Stars: ✭ 111 (+428.57%)

Mutual labels: vqa

Visual Question Answering

📷 ❓ Visual Question Answering Demo and Algorithmia API

Stars: ✭ 18 (-14.29%)

Mutual labels: vqa

tsflex

Flexible time series feature extraction & processing

Stars: ✭ 252 (+1100%)

Mutual labels: multimodal

pytorch violet

A PyTorch implementation of VIOLET

Stars: ✭ 119 (+466.67%)

Mutual labels: vision-and-language

Pytorch Vqa

Strong baseline for visual question answering

Stars: ✭ 158 (+652.38%)

Mutual labels: vqa

gakg

GAKG is a multimodal Geoscience Academic Knowledge Graph framework that fuses papers' illustrations, text, and bibliometric data.

Stars: ✭ 21 (+0%)

Mutual labels: multimodal

Vqa Tensorflow

TensorFlow implementation of Deeper LSTM + normalized CNN for Visual Question Answering

Stars: ✭ 98 (+366.67%)

Mutual labels: vqa

MinkLocMultimodal

MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition

Stars: ✭ 65 (+209.52%)

Mutual labels: multimodal

Bottom Up Attention

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

Stars: ✭ 989 (+4609.52%)

Mutual labels: vqa

wikiHow paper list

A paper list of research conducted based on wikiHow

Stars: ✭ 25 (+19.05%)

Mutual labels: vision-and-language

Vqa.pytorch

Visual Question Answering in Pytorch

Stars: ✭ 602 (+2766.67%)

Mutual labels: vqa

Kaleido-BERT

(CVPR2021) Kaleido-BERT: Vision-Language Pre-training on Fashion Domain.

Stars: ✭ 252 (+1100%)

Mutual labels: multimodal

Mac Network

Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)

Stars: ✭ 444 (+2014.29%)

Mutual labels: vqa

Oscar

Oscar and VinVL

Stars: ✭ 396 (+1785.71%)

Mutual labels: vqa

calvin

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

Stars: ✭ 105 (+400%)

Mutual labels: vision-and-language

lang2seg

Referring Expression Object Segmentation with Caption-Aware Consistency, BMVC 2019

Stars: ✭ 30 (+42.86%)

Mutual labels: vision-and-language

Awesome Visual Question Answering

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

Stars: ✭ 295 (+1304.76%)

Mutual labels: vqa

neuro-symbolic-ai-soc

Neuro-Symbolic Visual Question Answering on Sort-of-CLEVR using PyTorch

Stars: ✭ 41 (+95.24%)

Mutual labels: vqa

Transformer-MM-Explainability

[ICCV 2021 Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Includes examples for DETR and VQA.

Stars: ✭ 484 (+2204.76%)

Mutual labels: vqa

self critical vqa

Code for the NeurIPS 2019 paper "Self-Critical Reasoning for Robust Visual Question Answering"

Stars: ✭ 39 (+85.71%)

Mutual labels: vqa

img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Stars: ✭ 1,173 (+5485.71%)

Mutual labels: multimodal

Clipbert

[CVPR 2021 Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning for image-text and video-text tasks.

Stars: ✭ 168 (+700%)

Mutual labels: vqa

mmgnn textvqa

A PyTorch implementation of the CVPR 2020 paper "Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text"

Stars: ✭ 41 (+95.24%)

Mutual labels: vqa

Vqa Mfb

Stars: ✭ 153 (+628.57%)

Mutual labels: vqa

lipnet

LipNet with gluon

Stars: ✭ 16 (-23.81%)

Mutual labels: multimodal

Papers

Some computer vision papers I have read, covering image-to-text generation, weakly supervised segmentation, and more

Stars: ✭ 99 (+371.43%)

Mutual labels: vqa

MSAF

Official implementation of the paper "MSAF: Multimodal Split Attention Fusion"

Stars: ✭ 47 (+123.81%)

Mutual labels: multimodal-deep-learning

Mullowbivqa

Hadamard Product for Low-rank Bilinear Pooling

Stars: ✭ 57 (+171.43%)

Mutual labels: vqa

TRAR-VQA

[ICCV 2021] TRAR: Routing the Attention Spans in Transformers for Visual Question Answering -- Official Implementation

Stars: ✭ 49 (+133.33%)

Mutual labels: vision-and-language

Conditional Batch Norm

PyTorch implementation of the NIPS 2017 paper "Modulating early visual processing by language"

Stars: ✭ 51 (+142.86%)

Mutual labels: vqa

pytorch-multimodal sarcasm detection

Implementation of the paper "Multi-Modal Sarcasm Detection in Twitter with Hierarchical Fusion Model"

Stars: ✭ 3 (-85.71%)

Mutual labels: multimodal

Vizwiz Vqa Pytorch

PyTorch VQA implementation that achieved top performances in the (ECCV18) VizWiz Grand Challenge: Answering Visual Questions from Blind People

Stars: ✭ 33 (+57.14%)

Mutual labels: vqa

CBP

Official Tensorflow Implementation of the AAAI-2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction"

Stars: ✭ 52 (+147.62%)

Mutual labels: vision-and-language

Bottom Up Attention Vqa

An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.

Stars: ✭ 667 (+3076.19%)

Mutual labels: vqa

cfvqa

[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias

Stars: ✭ 96 (+357.14%)

Mutual labels: vqa

Social-IQ

[CVPR 2019 Oral] Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence

Stars: ✭ 37 (+76.19%)

Mutual labels: multimodal-deep-learning

Awesome Vqa

Visual Q&A reading list

Stars: ✭ 403 (+1819.05%)

Mutual labels: vqa

BBFN

This repository contains the implementation of the paper -- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis

Stars: ✭ 42 (+100%)

Mutual labels: multimodal-deep-learning

Tbd Nets

PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

Stars: ✭ 345 (+1542.86%)

Mutual labels: vqa

attentive-modality-hopping-for-SER

TensorFlow implementation of "Attentive Modality Hopping for Speech Emotion Recognition," ICASSP-20

Stars: ✭ 25 (+19.05%)

Mutual labels: multimodal-deep-learning

pykale

Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem

Stars: ✭ 381 (+1714.29%)

Mutual labels: multimodal

Nscl Pytorch Release

PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).

Stars: ✭ 276 (+1214.29%)

Mutual labels: vqa

MICCAI21 MMQ

Multiple Meta-model Quantifying for Medical Visual Question Answering

Stars: ✭ 16 (-23.81%)

Mutual labels: vqa

X-VLM

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

Stars: ✭ 283 (+1247.62%)

Mutual labels: vision-and-language

ZS-F-VQA

Code and data for the paper "Zero-shot Visual Question Answering using Knowledge Graph" [ISWC 2021]

Stars: ✭ 51 (+142.86%)

Mutual labels: vqa

bottom-up-features

Bottom-up features extractor implemented in PyTorch.

Stars: ✭ 62 (+195.24%)

Mutual labels: vqa

VidSitu

[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

Stars: ✭ 41 (+95.24%)

Mutual labels: vision-and-language

multimodal-deep-learning-for-disaster-response

Damage Identification in Social Media Posts using Multimodal Deep Learning: code and dataset

Stars: ✭ 43 (+104.76%)

Mutual labels: multimodal-deep-learning

mmd

This repository contains the PyTorch implementation for our SCAI (EMNLP-2018) submission "A Knowledge-Grounded Multimodal Search-Based Conversational Agent"

Stars: ✭ 28 (+33.33%)

Mutual labels: multimodal-deep-learning

circDeep

End-to-End learning framework for circular RNA classification from other long non-coding RNA using multimodal deep learning

Stars: ✭ 21 (+0%)

Mutual labels: multimodal-deep-learning

1-60 of 89 similar projects
