Bottom Up Attention: Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome (the underlying attention mechanism is sketched below)
Stars: ✭ 989 (+149.75%)
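The mechanism this repo popularized combines "bottom-up" region proposals from Faster R-CNN with "top-down" attention driven by the task. The following is not the repo's actual code, only a minimal PyTorch sketch of the top-down step under assumed names and sizes (TopDownAttention, 2048-d region features, a 512-d query):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownAttention(nn.Module):
    """Attend over K bottom-up region features given a task query
    (a question encoding for VQA, or the caption decoder's state)."""

    def __init__(self, feat_dim=2048, query_dim=512, hidden_dim=512):
        super().__init__()
        self.proj = nn.Linear(feat_dim + query_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, regions, query):
        # regions: (B, K, feat_dim) Faster R-CNN features; query: (B, query_dim)
        q = query.unsqueeze(1).expand(-1, regions.size(1), -1)
        logits = self.score(torch.tanh(self.proj(torch.cat([regions, q], -1))))
        alpha = F.softmax(logits, dim=1)        # one weight per region
        return (alpha * regions).sum(dim=1)     # attended image feature

# Toy usage: 36 regions per image, as in the common bottom-up setting.
att = TopDownAttention()
v = att(torch.randn(2, 36, 2048), torch.randn(2, 512))  # -> (2, 2048)
```

The attended vector then feeds the caption decoder or the VQA answer classifier.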
just-ask: [TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Stars: ✭ 57 (-85.61%)
BUTD model: A PyTorch implementation of "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" for image captioning.
Stars: ✭ 28 (-92.93%)
hcrn-videoqa: Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
Stars: ✭ 111 (-71.97%)
Show and Tell: A Neural Image Caption Generator
Stars: ✭ 74 (-81.31%)
image-captioning-DLCT: Official PyTorch implementation of the paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).
Stars: ✭ 134 (-66.16%)
udacity-cvnd-projects: My solutions to the projects assigned for the Udacity Computer Vision Nanodegree
Stars: ✭ 36 (-90.91%)
captioning chainer: A fast implementation of Neural Image Caption in Chainer
Stars: ✭ 17 (-95.71%)
AoA-pytorch: A PyTorch implementation of the Attention on Attention module (both self and guided variants) for Visual Question Answering
Stars: ✭ 33 (-91.67%)
neuro-symbolic-ai-soc: Neuro-Symbolic Visual Question Answering on Sort-of-CLEVR using PyTorch
Stars: ✭ 41 (-89.65%)
Pytorch Vqa: Strong baseline for visual question answering
Stars: ✭ 158 (-60.1%)
Image-Captioining: The objective is to generate a textual description of an image based on the objects and actions in it, using generative models so that the system creates novel sentences. Pipeline-style models (sketched below) use two separate learning processes, one for language modelling and the other for image recognition; they first identify objects in the image and prov…
Stars: ✭ 20 (-94.95%)
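As a rough illustration of the pipeline idea described above, here is a hedged sketch that uses a stock torchvision classifier for the recognition stage and a bare template where a trained language model would sit; the caption function and the three-label cutoff are illustrative assumptions, not this repo's code:

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

# Stage 1: image recognition, trained independently of the language side.
weights = ResNet50_Weights.DEFAULT
classifier = resnet50(weights=weights).eval()
preprocess = weights.transforms()

def caption(image):
    """Hypothetical two-stage pipeline: recognize objects, then verbalize
    them with a separately built language component."""
    with torch.no_grad():
        probs = classifier(preprocess(image).unsqueeze(0)).softmax(-1)[0]
    labels = [weights.meta["categories"][i] for i in probs.topk(3).indices]
    # Stage 2: language generation (a real pipeline would use a trained LM).
    return "A photo containing " + ", ".join(labels) + "."
```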
vqa-soft: Accompanying code for "A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models", a CVPR 2017 VQA workshop paper (the loss idea is sketched below).
Stars: ✭ 14 (-96.46%)
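The exact formulation is in the paper; the gist is to replace the single hard answer label with the annotators' empirical answer distribution. A generic sketch of such a soft-target cross-entropy, assuming a made-up 3000-way answer vocabulary:

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, target_probs):
    """Cross-entropy against a soft target distribution: every answer the
    annotators gave contributes, weighted by how often it was given."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(target_probs * log_probs).sum(dim=-1).mean()

# Toy example: annotators split 7/3 between two answers (indices made up).
logits = torch.randn(4, 3000)
targets = torch.zeros(4, 3000)
targets[:, 42] = 0.7
targets[:, 7] = 0.3
loss = soft_cross_entropy(logits, targets)
```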
LaBERT: A length-controllable and non-autoregressive image captioning model.
Stars: ✭ 50 (-87.37%)
Transformer-MM-Explainability: [ICCV 2021 Oral] Official PyTorch implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network, including examples for DETR and VQA.
Stars: ✭ 484 (+22.22%)
Machine-Learning: Projects I have done in machine learning with PyTorch, Keras, TensorFlow, scikit-learn, and Python.
Stars: ✭ 54 (-86.36%)
Adaptiveattention: Implementation of "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning" (the sentinel mechanism is sketched below)
Stars: ✭ 303 (-23.48%)
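The paper's key device is a "visual sentinel": the attention softmax gets one extra slot, so for non-visual words ("of", "the") the decoder can fall back on its language state instead of the image. A simplified PyTorch sketch (the actual model also conditions the scores on the decoder hidden state; names and sizes here are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentinelMixer(nn.Module):
    """Mix the visual context with a learned sentinel vector via a gate
    that falls out of an extended attention softmax."""

    def __init__(self, dim=512):
        super().__init__()
        self.w_v = nn.Linear(dim, 1)  # scores for the spatial features
        self.w_s = nn.Linear(dim, 1)  # score for the sentinel slot

    def forward(self, feats, sentinel):
        # feats: (B, L, dim) spatial features; sentinel: (B, dim)
        z = self.w_v(feats).squeeze(-1)                      # (B, L)
        zs = self.w_s(sentinel)                              # (B, 1)
        alpha = F.softmax(torch.cat([z, zs], dim=1), dim=1)  # over L+1 slots
        beta = alpha[:, -1:]                                 # weight on sentinel
        c = (alpha[:, :-1].unsqueeze(-1) * feats).sum(1)     # visual context
        return beta * sentinel + (1 - beta) * c              # adaptive context
```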
ZS-F-VQA: Code and data for the paper "Zero-shot Visual Question Answering using Knowledge Graph" (ISWC 2021)
Stars: ✭ 51 (-87.12%)
RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words (CVPR 2021)
Stars: ✭ 71 (-82.07%)
Openvqa: A lightweight, scalable, and general framework for visual question answering research
Stars: ✭ 198 (-50%)
MICCAI21 MMQ: Multiple Meta-model Quantifying for Medical Visual Question Answering
Stars: ✭ 16 (-95.96%)
Show-Attend-and-Tell: A PyTorch implementation of the paper "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
Stars: ✭ 58 (-85.35%)
Vqa regat: Research code for the ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
Stars: ✭ 129 (-67.42%)
Vqa Tensorflow: TensorFlow implementation of Deeper LSTM + Normalized CNN for Visual Question Answering
Stars: ✭ 98 (-75.25%)
gramtion: Twitter bot for generating photo descriptions (alt text)
Stars: ✭ 21 (-94.7%)
CS231n: My solutions to the assignments of CS231n: Convolutional Neural Networks for Visual Recognition
Stars: ✭ 30 (-92.42%)
Nscl Pytorch Release: PyTorch implementation of the Neuro-Symbolic Concept Learner (NS-CL).
Stars: ✭ 276 (-30.3%)
probnmn-clevr: Code for the ICML 2019 paper "Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering" (long oral)
Stars: ✭ 63 (-84.09%)
FigureQA-baseline: TensorFlow implementation of the CNN-LSTM, Relation Network, and text-only baselines for the paper "FigureQA: An Annotated Figure Dataset for Visual Reasoning"
Stars: ✭ 28 (-92.93%)
iMIX: A framework for multimodal intelligence research from Inspur HSSLAB.
Stars: ✭ 21 (-94.7%)
Scan: PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)
Stars: ✭ 306 (-22.73%)
Udacity: This repo includes all the projects I have finished in the Udacity Nanodegree programs
Stars: ✭ 57 (-85.61%)
DVQA dataset: A bar-chart question answering dataset presented at CVPR 2018
Stars: ✭ 20 (-94.95%)
mmgnn textvqa: A PyTorch implementation of the CVPR 2020 paper "Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text"
Stars: ✭ 41 (-89.65%)
im2p: TensorFlow implementation of the paper "A Hierarchical Approach for Generating Descriptive Image Paragraphs"
Stars: ✭ 43 (-89.14%)
catr: Image captioning using a Transformer
Stars: ✭ 206 (-47.98%)
Image-Caption: Image captioning in PyTorch using an LSTM or a Transformer
Stars: ✭ 36 (-90.91%)
CS231n: Assignment solutions for CS231n, Spring 2020
Stars: ✭ 48 (-87.88%)
Virtex: [CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
Stars: ✭ 323 (-18.43%)
cfvqa: [CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
Stars: ✭ 96 (-75.76%)
VideoNavQA: An alternative EQA paradigm with an informative benchmark and models (BMVC 2019, ViGIL 2019 spotlight)
Stars: ✭ 22 (-94.44%)
stylenet: A PyTorch implementation of "StyleNet: Generating Attractive Visual Captions with Styles"
Stars: ✭ 58 (-85.35%)
self critical vqa: Code for the NeurIPS 2019 paper "Self-Critical Reasoning for Robust Visual Question Answering"
Stars: ✭ 39 (-90.15%)
Awesome-Captioning: A curated list of multimodal captioning research (including image captioning, video captioning, and text captioning)
Stars: ✭ 56 (-85.86%)
Clipbert: [CVPR 2021 Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning for image-text and video-text tasks.
Stars: ✭ 168 (-57.58%)
Awesome Visual Question Answering: A curated list of Visual Question Answering (VQA, covering image and video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
Stars: ✭ 295 (-25.51%)
Papers: Computer vision papers I have read, covering image-to-text generation, weakly supervised segmentation, and more
Stars: ✭ 99 (-75%)
bottom-up-features: Bottom-up feature extractor implemented in PyTorch.
Stars: ✭ 62 (-84.34%)
Mullowbivqa: Hadamard Product for Low-rank Bilinear Pooling (the fusion scheme is sketched below)
Stars: ✭ 57 (-85.61%)
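The title names a concrete fusion scheme: project the question and image features into a shared low-rank space, combine them with an elementwise (Hadamard) product, and project to the output, approximating full bilinear pooling. A minimal sketch with arbitrarily chosen dimensions, not this repo's code:

```python
import torch
import torch.nn as nn

class LowRankBilinearPooling(nn.Module):
    """Fuse an image feature v and a question feature q via a Hadamard
    product in a shared rank-d space, a cheap stand-in for the full
    (and huge) bilinear interaction."""

    def __init__(self, v_dim=2048, q_dim=1024, rank=1024, out_dim=3000):
        super().__init__()
        self.U = nn.Linear(q_dim, rank)
        self.V = nn.Linear(v_dim, rank)
        self.P = nn.Linear(rank, out_dim)

    def forward(self, v, q):
        joint = torch.tanh(self.U(q)) * torch.tanh(self.V(v))  # Hadamard
        return self.P(joint)  # e.g. answer logits

fuse = LowRankBilinearPooling()
logits = fuse(torch.randn(2, 2048), torch.randn(2, 1024))  # -> (2, 3000)
```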
Adaptive: PyTorch implementation of "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning"
Stars: ✭ 97 (-75.51%)
Tbd Nets: PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
Stars: ✭ 345 (-12.88%)
Cs231: Complete assignments for CS231n: Convolutional Neural Networks for Visual Recognition
Stars: ✭ 317 (-19.95%)
Image Captioning: Image captioning using InceptionV3 and beam search (beam search is sketched below)
Stars: ✭ 290 (-26.77%)
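Beam search keeps the k most probable partial captions at each decoding step instead of greedily committing to the single best token. A model-agnostic sketch; step_fn, bos, and eos are placeholders for whatever the captioner provides:

```python
def beam_search(step_fn, bos, eos, beam_size=3, max_len=20):
    """step_fn(prefix) must return (token, log_prob) continuations for a
    prefix; returns the highest-scoring complete sequence found."""
    beams = [([bos], 0.0)]          # (token sequence, total log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, logp in step_fn(seq):
                candidates.append((seq + [tok], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates:
            (finished if seq[-1] == eos else beams).append((seq, score))
            if len(beams) == beam_size:
                break
        if not beams:               # every surviving candidate has ended
            break
    return max(finished + beams, key=lambda c: c[1])[0]
```

Scores are summed log-probabilities, so longer captions are implicitly penalized; real implementations often add a length-normalization term.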
rosita: ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Stars: ✭ 36 (-90.91%)