rosita - ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
just-ask - [TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
robo-vln - PyTorch code for the ICRA 2021 paper "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
MIA - Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" (NeurIPS 2019)
synse-zsl - Official PyTorch code for the ICIP 2021 paper "Syntactically Guided Generative Embeddings for Zero Shot Skeleton Action Recognition"
clip playground - An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities (see the sketch after this list)
stanford-cs231n-assignments-2020 - My solutions to the assignments for Stanford's CS231n "Convolutional Neural Networks for Visual Recognition" (Spring 2020)
iMIX - A framework for Multimodal Intelligence research from Inspur HSSLAB
VidSitu - [CVPR 2021] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
CBP - Official TensorFlow implementation of the AAAI 2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction"
TRAR-VQA - [ICCV 2021] TRAR: Routing the Attention Spans in Transformers for Visual Question Answering (official implementation)
calvin - CALVIN: A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
X-VLM - X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
lang2seg - Referring Expression Object Segmentation with Caption-Aware Consistency (BMVC 2019)
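
For the clip playground entry above, the following is a minimal sketch of the kind of CLIP zero-shot classification its notebooks demonstrate. It assumes OpenAI's `clip` package (installable from https://github.com/openai/CLIP) and a local image file `photo.jpg`; both the label list and the file path are illustrative, not taken from that repository.

```python
# Minimal CLIP zero-shot classification sketch (assumes OpenAI's `clip` package
# and a hypothetical local image `photo.jpg`).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Candidate classes are expressed as natural-language prompts; no fine-tuning.
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    # The model scores the image against every prompt; softmax over the
    # image-text logits gives per-label probabilities.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print(dict(zip(labels, probs[0].tolist())))
```

Because the "classifier" is just a list of text prompts, swapping in a different label set changes the task with no retraining, which is the zero-shot behaviour the playground notebooks explore.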