
terry-r123 / Awesome-Captioning

Licence: other
A curated list of Multimodal Captioning related research (including image captioning, video captioning, and text captioning).


Awesome Captioning

A curated list of Visual Captioning and related areas.

Survey Papers

2021

  • From Show to Tell: A Survey on Image Captioning. [paper]

Research Papers

2022

arXiv 2022

Image Captioning
  • Compact Bidirectional Transformer for Image Captioning. [paper] [code]
  • ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer. [paper]
  • I-Tuning: Tuning Language Models with Image for Caption Generation. [paper]
  • CaMEL: Mean Teacher Learning for Image Captioning. [paper] [code]
  • Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition. [paper]
Video Captioning
  • Discourse Analysis for Evaluating Coherence in Video Paragraph Captions. [paper]
  • Cross-modal Contrastive Distillation for Instructional Activity Anticipation. [paper]
  • End-to-end Generative Pretraining for Multimodal Video Captioning. [paper]
  • Deep soccer captioning with transformer: dataset, semantics-related losses, and multi-level evaluation. [paper] [code]
  • Dual-Level Decoupled Transformer for Video Captioning. [paper]
  • Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information. [paper] [code]

IJCAI 2022

  • Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds. [paper]

CVPR 2022

Image Captioning
  • X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning. [paper]
  • Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning. [paper]
Video Captioning
  • What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics. [paper] (Workshop)

AAAI 2022

Image Captioning
  • Image Difference Captioning with Pre-training and Contrastive Learning. [paper]

2021

NeurIPS 2021

Image Captioning
  • Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation. [paper]
  • FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark. [paper] [code]
Video Captioning
  • Multi-modal Dependency Tree for Video Captioning. [paper]

EMNLP 2021

Image Captioning
  • Visual News: Benchmark and Challenges in News Image Captioning. [paper] [code]
  • R3Net: Relation-embedded Representation Reconstruction Network for Change Captioning. [paper] [code]
  • CLIPScore: A Reference-free Evaluation Metric for Image Captioning. [paper]
  • Journalistic Guidelines Aware News Image Captioning. [paper]
  • Understanding Guided Image Captioning Performance across Domains. [paper] [code] (CoNLL)
  • Language Resource Efficient Learning for Captioning. [paper] (Findings)
  • Retrieval, Analogy, and Composition: A framework for Compositional Generalization in Image Captioning. [paper] (Findings)
  • QACE: Asking Questions to Evaluate an Image Caption. [paper] (Findings)
  • COSMic: A Coherence-Aware Generation Metric for Image Descriptions. [paper] (Findings)

ICCV 2021

Image Captioning
  • Auto-Parsing Network for Image Captioning and Visual Question Answering. [paper]
  • Similar Scenes arouse Similar Emotions: Parallel Data Augmentation for Stylized Image Captioning. [paper]
  • Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation. [paper]
  • Partial Off-Policy Learning: Balance Accuracy and Diversity for Human-Oriented Image Captioning. [paper]
  • Topic Scene Graph Generation by Attention Distillation from Caption. [paper]
  • Understanding and Evaluating Racial Biases in Image Captioning. [paper] [code]
  • In Defense of Scene Graphs for Image Captioning. [paper] [code]
  • Viewpoint-Agnostic Change Captioning with Cycle Consistency. [paper]
  • Visual-Textual Attentive Semantic Consistency for Medical Report Generation. [paper]
  • Semi-Autoregressive Transformer for Image Captioning. [paper] (Workshop)
Video Captioning
  • End-to-End Dense Video Captioning with Parallel Decoding. [paper] [code]
  • Motion Guided Region Message Passing for Video Captioning. [paper]

ACMMM 2021

Image Captioning
  • Distributed Attention for Grounded Image Captioning. [paper]
  • Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning. [paper] [code]
  • Group-based Distinctive Image Captioning with Memory Attention. [paper]
  • Direction Relation Transformer for Image Captioning. [paper]
Text Captioning
  • Question-controlled Text-aware Image Captioning. [paper]
Video Captioning
  • Hybrid Reasoning Network for Video-based Commonsense Captioning. [paper]
  • Discriminative Latent Semantic Graph for Video Captioning. [paper] [code]
  • Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention. [paper]
  • CLIP4Caption: CLIP for Video Caption. [paper]

Interspeech 2021

Video Captioning
  • Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers. [paper]

ACL 2021

Image Captioning
  • Writing by Memorizing: Hierarchical Retrieval-based Medical Report Generation. [paper]
  • Competence-based Multimodal Curriculum Learning for Medical Report Generation.
  • Control Image Captioning Spatially and Temporally. [paper]
  • SMURF: SeMantic and linguistic UndeRstanding Fusion for Caption Evaluation via Typicality Analysis. [paper]
  • Enhancing Descriptive Image Captioning with Natural Language Inference.
  • UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning. [paper] [code]
  • Cross-modal Memory Networks for Radiology Report Generation.
Video Captioning
  • Hierarchical Context-aware Network for Dense Video Event Captioning.
  • Video Paragraph Captioning as a Text Summarization Task.

IJCAI 2021

Image Captioning
  • TCIC: Theme Concepts Learning Cross Language and Vision for Image Captioning. [paper]

NAACL 2021

Image Captioning
  • Quality Estimation for Image Captions Based on Large-scale Human Evaluations. [paper]
  • Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation. [paper]
Video Captioning
  • DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization. [paper]

CVPR 2021

Image Captioning
  • Connecting What to Say With Where to Look by Modeling Human Attention Traces. [paper] [code]
  • Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles. [paper]
  • Image Change Captioning by Learning From an Auxiliary Task. [paper]
  • Scan2Cap: Context-aware Dense Captioning in RGB-D Scans. [paper] [code]
  • FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation. [paper]
  • RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words. [paper]
  • Human-Like Controllable Image Captioning With Verb-Specific Semantic Roles. [paper]
Text Captioning
  • Improving OCR-Based Image Captioning by Incorporating Geometrical Relationship. [paper]
  • TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption. [paper]
  • Towards Accurate Text-Based Image Captioning With Content Diversity Exploration. [paper]
Video Captioning
  • Open-Book Video Captioning With Retrieve-Copy-Generate Network. [paper]
  • Towards Diverse Paragraph Captioning for Untrimmed Videos. [paper]
  • Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning. [paper]

ICASSP 2021

Image Captioning
  • Cascade Attention Fusion for Fine-grained Image Captioning based on Multi-layer LSTM. [paper]
  • Triple Sequence Generative Adversarial Nets for Unsupervised Image Captioning. [paper]

AAAI 2021

Image Captioning
  • Partially Non-Autoregressive Image Captioning. [paper] [code]
  • Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network. [paper]
  • Object Relation Attention for Image Paragraph Captioning. [paper]
  • Dual-Level Collaborative Transformer for Image Captioning. [paper] [code]
  • Memory-Augmented Image Captioning. [paper]
  • Image Captioning with Context-Aware Auxiliary Guidance. [paper]
  • Consensus Graph Representation Learning for Better Grounded Image Captioning. [paper]
  • FixMyPose: Pose Correctional Captioning and Retrieval. [paper] [code]
  • VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning. [paper]
Video Captioning
  • Non-Autoregressive Coarse-to-Fine Video Captioning. [paper] [code]
  • Semantic Grouping Network for Video Captioning. [paper] [code]
  • Augmented Partial Mutual Learning with Frame Masking for Video Captioning. [paper]

TPAMI 2021

Video Captioning
  • Saying the Unseen: Video Descriptions via Dialog Agents. [paper]

2020

EMNLP 2020

Image Captioning
  • CapWAP: Captioning with a Purpose. [paper] [code]
  • Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements. [paper] [code]
  • Visually Grounded Continual Learning of Compositional Phrases. [paper]
  • Pragmatic Issue-Sensitive Image Captioning. [paper]
  • Structural and Functional Decomposition for Personality Image Captioning in a Communication Game. [paper]
  • Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze. [paper]
  • ZEST: Zero-shot Learning from Text Descriptions using Textual Similarity and Visual Summarization. [paper]
Video Captioning
  • Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning. [paper]

NeurIPS 2020

Image Captioning
  • RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning. [paper]
  • Diverse Image Captioning with Context-Object Split Latent Spaces. [paper]
  • Prophet Attention: Predicting Attention with Future Attention for Improved Image Captioning. [paper]

ACMMM 2020

Image Captioning
  • Structural Semantic Adversarial Active Learning for Image Captioning. [paper]
  • Iterative Back Modification for Faster Image Captioning. [paper]
  • Bridging the Gap between Vision and Language Domains for Improved Image Captioning. [paper]
  • Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning. [paper]
  • Improving Intra- and Inter-Modality Visual Relation for Image Captioning. [paper]
  • ICECAP: Information Concentrated Entity-aware Image Captioning. [paper]
  • Attacking Image Captioning Towards Accuracy-Preserving Target Words Removal. [paper]
Text Captioning
  • Multimodal Attention with Image Text Spatial Relationship for OCR-Based Image Captioning. [paper]
Video Captioning
  • Controllable Video Captioning with an Exemplar Sentence. [paper]
  • Poet: Product-oriented Video Captioner for E-commerce. [paper] [code]
  • Learning Semantic Concepts and Temporal Alignment for Narrated Video Procedural Captioning. [paper]
  • Relational Graph Learning for Grounded Video Description Generation. [paper]

ECCV 2020

Image Captioning
  • Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets. [paper]
  • Towards Unique and Informative Captioning of Images. [paper]
  • Learning Visual Representations with Caption Annotations. [paper]
  • Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards. [paper] [code]
  • Length Controllable Image Captioning. [paper] [code]
  • Comprehensive Image Captioning via Scene Graph Decomposition. [paper]
  • Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning. [paper]
  • Captioning Images Taken by People Who Are Blind. [paper]
  • Learning to Generate Grounded Visual Captions without Localization Supervision. [paper] [code]
  • Describing Textures using Natural Language. [paper]
  • Connecting Vision and Language with Localized Narratives. [paper] [code]
Text Captioning
  • TextCaps: a Dataset for Image Captioning with Reading Comprehension. [paper] [code]
Video Captioning
  • Character Grounding and Re-Identification in Story of Videos and Text Descriptions. [paper] [code]
  • SODA: Story Oriented Dense Video Captioning Evaluation Framework. [paper] [code]
  • In-Home Daily-Life Captioning Using Radio Signals. [paper]
  • TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval. [paper] [code]
  • Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos. [paper]
  • Identity-Aware Multi-Sentence Video Description. [paper]

IJCAI 2020

Image Captioning
  • Human Consensus-Oriented Image Captioning. [paper]
  • Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning. [paper]
  • Recurrent Relational Memory Network for Unsupervised Image Captioning. [paper]
Video Captioning
  • Learning to Discretely Compose Reasoning Module Networks for Video Captioning. [paper] [code]
  • SBAT: Video Captioning with Sparse Boundary-Aware Transformer. [paper]
  • Hierarchical Attention Based Spatial-Temporal Graph-to-Sequence Learning for Grounded Video Description. [paper]

ACL 2020

Image Captioning
  • Clue: Cross-modal Coherence Modeling for Caption Generation. [paper]
  • Improving Image Captioning Evaluation by Considering Inter References Variance. [paper] [code]
  • Improving Image Captioning with Better Use of Caption. [paper] [code]
Video Captioning
  • MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning. [paper] [code]

CVPR 2020

Image Captioning
  • Context-Aware Group Captioning via Self-Attention and Contrastive Features. [paper] [code]
  • Show, Edit and Tell: A Framework for Editing Image Captions. [paper] [code]
  • Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs. [paper] [code]
  • Normalized and Geometry-Aware Self-Attention Network for Image Captioning. [paper]
  • Meshed-Memory Transformer for Image Captioning. [paper] [code]
  • X-Linear Attention Networks for Image Captioning. [paper] [code]
  • Transform and Tell: Entity-Aware News Image Captioning. [paper] [code]
  • More Grounded Image Captioning by Distilling Image-Text Matching Model. [paper] [code]
  • Better Captioning With Sequence-Level Exploration. [paper]
Video Captioning
  • Object Relational Graph With Teacher-Recommended Learning for Video Captioning. [paper]
  • Spatio-Temporal Graph for Video Captioning With Knowledge Distillation. [paper] [code]
  • Better Captioning With Sequence-Level Exploration. [paper]
  • Syntax-Aware Action Targeting for Video Captioning. [paper] [code]
  • Screencast Tutorial Video Understanding. [paper]

AAAI 2020

Image Captioning
  • Unified Vision-Language Pre-Training for Image Captioning and VQA. [paper] [code]
  • Reinforcing an Image Caption Generator using Off-line Human Feedback. [paper]
  • Memorizing Style Knowledge for Image Captioning. [paper]
  • Joint Commonsense and Relation Reasoning for Image and Video Captioning. [paper]
  • Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption. [paper]
  • Show, Recall, and Tell: Image Captioning with Recall Mechanism. [paper]
  • Interactive Dual Generative Adversarial Networks for Image Captioning. [paper]
  • Feature Deformation Meta-Networks in Image Captioning of Novel Objects. [paper]
Video Captioning
  • An Efficient Framework for Dense Video Captioning. [paper]

2019

NeurIPS 2019

Image Captioning
  • Adaptively Aligned Image Captioning via Adaptive Attention Time. [paper] [code]
  • Image Captioning: Transforming Objects into Words. [paper] [code]
  • Variational Structured Semantic Inference for Diverse Image Captioning. [paper]

ICCV 2019

Image Captioning
  • Robust Change Captioning. [paper]
  • Attention on Attention for Image Captioning. [paper]
  • Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style. [paper]
  • Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment. [paper]
  • Hierarchy Parsing for Image Captioning. [paper]
  • Generating Diverse and Descriptive Image Captions Using Visual Paraphrases. [paper]
  • Learning to Collocate Neural Modules for Image Captioning. [paper]
  • Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning. [paper]
  • Towards Unsupervised Image Captioning With Shared Multimodal Embeddings. [paper]
  • Human Attention in Image Captioning: Dataset and Analysis. [paper]
  • Reflective Decoding Network for Image Captioning. [paper]
  • Joint Optimization for Cooperative Image Captioning. [paper]
  • Entangled Transformer for Image Captioning. [paper]
  • nocaps: novel object captioning at scale. [paper]
  • Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection. [paper]
  • Unpaired Image Captioning via Scene Graph Alignments. [paper]
  • Learning to Caption Images Through a Lifetime by Asking Questions. [paper]
Video Captioning
  • VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research. [paper]
  • Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network. [paper]
  • Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning. [paper]
  • Watch, Listen and Tell: Multi-Modal Weakly Supervised Dense Event Captioning. [paper]

ACL 2019

Image Captioning
  • Informative Image Captioning with External Sources of Information [paper]

  • Bridging by Word: Image Grounded Vocabulary Construction for Visual Captioning [paper]

  • Generating Question Relevant Captions to Aid Visual Question Answering [paper]

Video Captioning
  • Dense Procedure Captioning in Narrated Instructional Videos [paper]

CVPR 2019

Image Captioning
  • Auto-Encoding Scene Graphs for Image Captioning [paper] [code]

  • Fast, Diverse and Accurate Image Captioning Guided by Part-Of-Speech [paper]

  • Unsupervised Image Captioning [paper] [code]

  • Exact Adversarial Attack to Image Captioning via Structured Output Learning With Latent Variables [paper]

  • Describing like Humans: On Diversity in Image Captioning [paper]

  • MSCap: Multi-Style Image Captioning With Unpaired Stylized Text [paper]

  • Leveraging Captioning to Boost Semantics for Salient Object Detection [paper] [code]

  • Context and Attribute Grounded Dense Captioning [paper]

  • Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning [paper]

  • Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions [paper]

  • Self-Critical N-step Training for Image Captioning [paper]

  • Look Back and Predict Forward in Image Captioning [paper]

  • Intention Oriented Image Captions with Guiding Objects [paper]

  • Adversarial Semantic Alignment for Improved Image Captions [paper]

  • Good News, Everyone! Context driven entity-aware captioning for news images. [paper] [code]

  • Pointing Novel Objects in Image Captioning [paper]

  • Engaging Image Captioning via Personality [paper]

Video Captioning
  • Streamlined Dense Video Captioning. [paper]
  • Grounded Video Description. [paper]
  • Adversarial Inference for Multi-Sentence Video Description. [paper]
  • Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning. [paper]
  • Memory-Attended Recurrent Network for Video Captioning. [paper]
  • Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning. [paper]

AAAI 2019

Image Captioning
  • Improving Image Captioning with Conditional Generative Adversarial Nets [paper]

  • Connecting Language to Images: A Progressive Attention-Guided Network for Simultaneous Image Captioning and Language Grounding [paper]

  • Meta Learning for Image Captioning [paper]

  • Deliberate Residual based Attention Network for Image Captioning [paper]

  • Hierarchical Attention Network for Image Captioning [paper]

  • Learning Object Context for Dense Captioning [paper]

Video Captioning
  • Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning [paper] [code]

  • Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning [paper]

  • Fully Convolutional Video Captioning with Coarse-to-Fine and Inherited Attention [paper]

  • Motion Guided Spatial Attention for Video Captioning [paper]

2018

NeurIPS 2018

Image Captioning
  • A Neural Compositional Paradigm for Image Captioning. [paper] [code]
Video Captioning
  • Weakly Supervised Dense Event Captioning in Videos. [paper] [code]

ECCV 2018

Image Captioning
  • Unpaired Image Captioning by Language Pivoting. [paper] [code]
  • Exploring Visual Relationship for Image Captioning. [paper]
  • Recurrent Fusion Network for Image Captioning. [paper] [code]
  • Boosted Attention: Leveraging Human Attention for Image Captioning. [paper]
  • Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data. [paper]
  • "Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention. [paper]

ACL 2018

Image Captioning
  • Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning. [paper]
  • Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning. [paper] [code]

IJCAI 2018

Image Captioning
  • Show and Tell More: Topic-Oriented Multi-Sentence Image Captioning. [paper]

CVPR 2018

Image Captioning
  • Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. [paper] [code]
  • Neural Baby Talk. [paper]
  • GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints. [paper]

2017

ICCV 2017

Image Captioning
  • Boosting Image Captioning with Attributes. [paper]
  • Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner. [paper] [code]

CVPR 2017

Image Captioning
  • SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning. [paper] [code]
  • When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning. [paper] [code]
  • Self-critical Sequence Training for Image Captioning. [paper]
  • Semantic Compositional Networks for Visual Captioning. [paper] [code]
  • StyleNet: Generating Attractive Visual Captions with Styles. [paper] [code]

TPAMI 2017

Image Captioning
  • BreakingNews: Article Annotation by Image and Text Processing. [paper]

2016

CVPR 2016

Image Captioning
  • Image Captioning with Semantic Attention. [paper] [code]
  • Learning Deep Representations of Fine-grained Visual Descriptions. [paper] [code]
  • Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data. [paper] [code]

TPAMI 2016

Image Captioning
  • Aligning Where to See and What to Tell: Image Captioning with Region-Based Attention and Scene-Specific Contexts. [paper] [code]

2015

ICML 2015

Image Captioning
  • Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. [paper]

ICCV 2015

Image Captioning
  • Guiding Long-Short Term Memory for Image Caption Generation. [paper]

CVPR 2015

Image Captioning
  • Show and Tell: A Neural Image Caption Generator. [paper]
  • Deep Visual-Semantic Alignments for Generating Image Descriptions. [paper] [code]
  • CIDEr: Consensus-based Image Description Evaluation. [paper] [code]

ICLR 2015

Image Captioning
  • Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN). [paper]

Dataset

  • MSCOCO
  • Flickr30K
  • Flickr8K
  • VizWiz
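Most of the image-captioning work above trains and evaluates on MSCOCO-style annotation files, where an "annotations" list pairs each image_id with one human-written caption. The sketch below (the file name, captions, and helper are illustrative, not taken from the dataset) shows how reference captions are typically grouped per image before metric computation:

```python
from collections import defaultdict

# Toy data in the MSCOCO captions schema; real files such as
# captions_train2014.json have the same top-level structure.
coco_style = {
    "images": [{"id": 1, "file_name": "dog.jpg"},
               {"id": 2, "file_name": "beach.jpg"}],
    "annotations": [
        {"id": 10, "image_id": 1, "caption": "A dog catching a frisbee."},
        {"id": 11, "image_id": 1, "caption": "A brown dog leaps in a park."},
        {"id": 12, "image_id": 2, "caption": "Waves rolling onto a sandy beach."},
    ],
}

def captions_by_image(dataset):
    """Group reference captions by image_id -- the shape most
    captioning metrics (BLEU, CIDEr, ...) expect as input."""
    grouped = defaultdict(list)
    for ann in dataset["annotations"]:
        grouped[ann["image_id"]].append(ann["caption"])
    return dict(grouped)

refs = captions_by_image(coco_style)
print(len(refs[1]))  # each image carries multiple references -> 2
```

Grouping multiple references per image matters because consensus-based metrics such as CIDEr score a candidate caption against all references for that image, not against a single one.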

Popular Codebase

Reference and Acknowledgement

Many thanks to everyone for their contributions to this area.
