
terry-r123 / Awesome-Captioning

Licence: other
A curated list of Multimodal Captioning related research (including image captioning, video captioning, and text captioning).


Awesome Captioning

A curated list of Visual Captioning and related areas.

Survey Papers

2021

  • From Show to Tell: A Survey on Image Captioning. [paper]

Research Papers

2022

arXiv 2022

Image Captioning
  • Compact Bidirectional Transformer for Image Captioning. [paper] [code]
  • ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer. [paper]
  • I-Tuning: Tuning Language Models with Image for Caption Generation. [paper]
  • CaMEL: Mean Teacher Learning for Image Captioning. [paper] [code]
  • Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition. [paper]
Video Captioning
  • Discourse Analysis for Evaluating Coherence in Video Paragraph Captions. [paper]
  • Cross-modal Contrastive Distillation for Instructional Activity Anticipation. [paper]
  • End-to-end Generative Pretraining for Multimodal Video Captioning. [paper]
  • Deep soccer captioning with transformer: dataset, semantics-related losses, and multi-level evaluation. [paper] [code]
  • Dual-Level Decoupled Transformer for Video Captioning. [paper]
  • Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information. [paper] [code]

IJCAI 2022

  • Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds. [paper]

CVPR 2022

Image Captioning
  • X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning. [paper]
  • Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning. [paper]
Video Captioning
  • What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics. [paper] (Workshop)

AAAI 2022

Image Captioning
  • Image Difference Captioning with Pre-training and Contrastive Learning. [paper]

2021

NeurIPS 2021

Image Captioning
  • Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation. [paper]
  • FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark. [paper] [code]
Video Captioning
  • Multi-modal Dependency Tree for Video Captioning. [paper]

EMNLP 2021

Image Captioning
  • Visual News: Benchmark and Challenges in News Image Captioning. [paper] [code]
  • R3Net: Relation-embedded Representation Reconstruction Network for Change Captioning. [paper] [code]
  • CLIPScore: A Reference-free Evaluation Metric for Image Captioning. [paper]
  • Journalistic Guidelines Aware News Image Captioning. [paper]
  • Understanding Guided Image Captioning Performance across Domains. [paper] [code] (CoNLL)
  • Language Resource Efficient Learning for Captioning. [paper] (Findings)
  • Retrieval, Analogy, and Composition: A framework for Compositional Generalization in Image Captioning. [paper] (Findings)
  • QACE: Asking Questions to Evaluate an Image Caption. [paper] (Findings)
  • COSMic: A Coherence-Aware Generation Metric for Image Descriptions. [paper] (Findings)

ICCV 2021

Image Captioning
  • Auto-Parsing Network for Image Captioning and Visual Question Answering. [paper]
  • Similar Scenes arouse Similar Emotions: Parallel Data Augmentation for Stylized Image Captioning. [paper]
  • Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation. [paper]
  • Partial Off-Policy Learning: Balance Accuracy and Diversity for Human-Oriented Image Captioning. [paper]
  • Topic Scene Graph Generation by Attention Distillation from Caption. [paper]
  • Understanding and Evaluating Racial Biases in Image Captioning. [paper] [code]
  • In Defense of Scene Graphs for Image Captioning. [paper] [code]
  • Viewpoint-Agnostic Change Captioning with Cycle Consistency. [paper]
  • Visual-Textual Attentive Semantic Consistency for Medical Report Generation. [paper]
  • Semi-Autoregressive Transformer for Image Captioning. [paper] (Workshop)
Video Captioning
  • End-to-End Dense Video Captioning with Parallel Decoding. [paper] [code]
  • Motion Guided Region Message Passing for Video Captioning. [paper]

ACMMM 2021

Image Captioning
  • Distributed Attention for Grounded Image Captioning. [paper]
  • Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning. [paper] [code]
  • Group-based Distinctive Image Captioning with Memory Attention. [paper]
  • Direction Relation Transformer for Image Captioning. [paper]
Text Captioning
  • Question-controlled Text-aware Image Captioning. [paper]
Video Captioning
  • Hybrid Reasoning Network for Video-based Commonsense Captioning. [paper]
  • Discriminative Latent Semantic Graph for Video Captioning. [paper] [code]
  • Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention. [paper]
  • CLIP4Caption: CLIP for Video Caption. [paper]

Interspeech 2021

Video Captioning
  • Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers. [paper]

ACL 2021

Image Captioning
  • Writing by Memorizing: Hierarchical Retrieval-based Medical Report Generation. [paper]
  • Competence-based Multimodal Curriculum Learning for Medical Report Generation.
  • Control Image Captioning Spatially and Temporally. [paper]
  • SMURF: SeMantic and linguistic UndeRstanding Fusion for Caption Evaluation via Typicality Analysis. [paper]
  • Enhancing Descriptive Image Captioning with Natural Language Inference.
  • UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning. [paper] [code]
  • Cross-modal Memory Networks for Radiology Report Generation.
Video Captioning
  • Hierarchical Context-aware Network for Dense Video Event Captioning.
  • Video Paragraph Captioning as a Text Summarization Task.

IJCAI 2021

Image Captioning
  • TCIC: Theme Concepts Learning Cross Language and Vision for Image Captioning. [paper]

NAACL 2021

Image Captioning
  • Quality Estimation for Image Captions Based on Large-scale Human Evaluations. [paper]
  • Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation. [paper]
Video Captioning
  • DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization. [paper]

CVPR 2021

Image Captioning
  • Connecting What to Say With Where to Look by Modeling Human Attention Traces. [paper] [code]
  • Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles. [paper]
  • Image Change Captioning by Learning From an Auxiliary Task. [paper]
  • Scan2Cap: Context-aware Dense Captioning in RGB-D Scans. [paper] [code]
  • FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation. [paper]
  • RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words. [paper]
  • Human-Like Controllable Image Captioning With Verb-Specific Semantic Roles. [paper]
Text Captioning
  • Improving OCR-Based Image Captioning by Incorporating Geometrical Relationship. [paper]
  • TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption. [paper]
  • Towards Accurate Text-Based Image Captioning With Content Diversity Exploration. [paper]
Video Captioning
  • Open-Book Video Captioning With Retrieve-Copy-Generate Network. [paper]
  • Towards Diverse Paragraph Captioning for Untrimmed Videos. [paper]
  • Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning. [paper]

ICASSP 2021

Image Captioning
  • Cascade Attention Fusion for Fine-grained Image Captioning based on Multi-layer LSTM. [paper]
  • Triple Sequence Generative Adversarial Nets for Unsupervised Image Captioning. [paper]

AAAI 2021

Image Captioning
  • Partially Non-Autoregressive Image Captioning. [paper] [code]
  • Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network. [paper]
  • Object Relation Attention for Image Paragraph Captioning. [paper]
  • Dual-Level Collaborative Transformer for Image Captioning. [paper] [code]
  • Memory-Augmented Image Captioning. [paper]
  • Image Captioning with Context-Aware Auxiliary Guidance. [paper]
  • Consensus Graph Representation Learning for Better Grounded Image Captioning. [paper]
  • FixMyPose: Pose Correctional Captioning and Retrieval. [paper] [code]
  • VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning. [paper]
Video Captioning
  • Non-Autoregressive Coarse-to-Fine Video Captioning. [paper] [code]
  • Semantic Grouping Network for Video Captioning. [paper] [code]
  • Augmented Partial Mutual Learning with Frame Masking for Video Captioning. [paper]

TPAMI 2021

Video Captioning
  • Saying the Unseen: Video Descriptions via Dialog Agents. [paper]

2020

EMNLP 2020

Image Captioning
  • CapWAP: Captioning with a Purpose. [paper] [code]
  • Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements. [paper] [code]
  • Visually Grounded Continual Learning of Compositional Phrases. [paper]
  • Pragmatic Issue-Sensitive Image Captioning. [paper]
  • Structural and Functional Decomposition for Personality Image Captioning in a Communication Game. [paper]
  • Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze. [paper]
  • ZEST: Zero-shot Learning from Text Descriptions using Textual Similarity and Visual Summarization. [paper]
Video Captioning
  • Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning. [paper]

NeurIPS 2020

Image Captioning
  • RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning. [paper]
  • Diverse Image Captioning with Context-Object Split Latent Spaces. [paper]
  • Prophet Attention: Predicting Attention with Future Attention for Improved Image Captioning. [paper]

ACMMM 2020

Image Captioning
  • Structural Semantic Adversarial Active Learning for Image Captioning. [paper]
  • Iterative Back Modification for Faster Image Captioning. [paper]
  • Bridging the Gap between Vision and Language Domains for Improved Image Captioning. [paper]
  • Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning. [paper]
  • Improving Intra- and Inter-Modality Visual Relation for Image Captioning. [paper]
  • ICECAP: Information Concentrated Entity-aware Image Captioning. [paper]
  • Attacking Image Captioning Towards Accuracy-Preserving Target Words Removal. [paper]
Text Captioning
  • Multimodal Attention with Image Text Spatial Relationship for OCR-Based Image Captioning. [paper]
Video Captioning
  • Controllable Video Captioning with an Exemplar Sentence. [paper]
  • Poet: Product-oriented Video Captioner for E-commerce. [paper] [code]
  • Learning Semantic Concepts and Temporal Alignment for Narrated Video Procedural Captioning. [paper]
  • Relational Graph Learning for Grounded Video Description Generation. [paper]

ECCV 2020

Image Captioning
  • Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets. [paper]
  • Towards Unique and Informative Captioning of Images. [paper]
  • Learning Visual Representations with Caption Annotations. [paper]
  • Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards. [paper] [code]
  • Length Controllable Image Captioning. [paper] [code]
  • Comprehensive Image Captioning via Scene Graph Decomposition. [paper]
  • Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning. [paper]
  • Captioning Images Taken by People Who Are Blind. [paper]
  • Learning to Generate Grounded Visual Captions without Localization Supervision. [paper] [code]
  • Describing Textures using Natural Language. [paper]
  • Connecting Vision and Language with Localized Narratives. [paper] [code]
Text Captioning
  • TextCaps: a Dataset for Image Captioning with Reading Comprehension. [paper] [code]
Video Captioning
  • Character Grounding and Re-Identification in Story of Videos and Text Descriptions. [paper] [code]
  • SODA: Story Oriented Dense Video Captioning Evaluation Framework. [paper] [code]
  • In-Home Daily-Life Captioning Using Radio Signals. [paper]
  • TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval. [paper] [code]
  • Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos. [paper]
  • Identity-Aware Multi-Sentence Video Description. [paper]

IJCAI 2020

Image Captioning
  • Human Consensus-Oriented Image Captioning. [paper]
  • Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning. [paper]
  • Recurrent Relational Memory Network for Unsupervised Image Captioning. [paper]
Video Captioning
  • Learning to Discretely Compose Reasoning Module Networks for Video Captioning. [paper] [code]
  • SBAT: Video Captioning with Sparse Boundary-Aware Transformer. [paper]
  • Hierarchical Attention Based Spatial-Temporal Graph-to-Sequence Learning for Grounded Video Description. [paper]

ACL 2020

Image Captioning
  • Clue: Cross-modal Coherence Modeling for Caption Generation. [paper]
  • Improving Image Captioning Evaluation by Considering Inter References Variance. [paper] [code]
  • Improving Image Captioning with Better Use of Caption. [paper] [code]
Video Captioning
  • MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning. [paper] [code]

CVPR 2020

Image Captioning
  • Context-Aware Group Captioning via Self-Attention and Contrastive Features. [paper] [code]
  • Show, Edit and Tell: A Framework for Editing Image Captions. [paper] [code]
  • Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs. [paper] [code]
  • Normalized and Geometry-Aware Self-Attention Network for Image Captioning. [paper]
  • Meshed-Memory Transformer for Image Captioning. [paper] [code]
  • X-Linear Attention Networks for Image Captioning. [paper] [code]
  • Transform and Tell: Entity-Aware News Image Captioning. [paper] [code]
  • More Grounded Image Captioning by Distilling Image-Text Matching Model. [paper] [code]
  • Better Captioning With Sequence-Level Exploration. [paper]
Video Captioning
  • Object Relational Graph With Teacher-Recommended Learning for Video Captioning. [paper]
  • Spatio-Temporal Graph for Video Captioning With Knowledge Distillation. [paper] [code]
  • Better Captioning With Sequence-Level Exploration. [paper]
  • Syntax-Aware Action Targeting for Video Captioning. [paper] [code]
  • Screencast Tutorial Video Understanding. [paper]

AAAI 2020

Image Captioning
  • Unified Vision-Language Pre-Training for Image Captioning and VQA. [paper] [code]
  • Reinforcing an Image Caption Generator using Off-line Human Feedback. [paper]
  • Memorizing Style Knowledge for Image Captioning. [paper]
  • Joint Commonsense and Relation Reasoning for Image and Video Captioning. [paper]
  • Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption. [paper]
  • Show, Recall, and Tell: Image Captioning with Recall Mechanism. [paper]
  • Interactive Dual Generative Adversarial Networks for Image Captioning. [paper]
  • Feature Deformation Meta-Networks in Image Captioning of Novel Objects. [paper]
Video Captioning
  • An Efficient Framework for Dense Video Captioning. [paper]

2019

NeurIPS 2019

Image Captioning
  • Adaptively Aligned Image Captioning via Adaptive Attention Time. [paper] [code]
  • Image Captioning: Transforming Objects into Words. [paper] [code]
  • Variational Structured Semantic Inference for Diverse Image Captioning. [paper]

ICCV 2019

Image Captioning
  • Robust Change Captioning. [paper]
  • Attention on Attention for Image Captioning. [paper]
  • Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style. [paper]
  • Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment. [paper]
  • Hierarchy Parsing for Image Captioning. [paper]
  • Generating Diverse and Descriptive Image Captions Using Visual Paraphrases. [paper]
  • Learning to Collocate Neural Modules for Image Captioning. [paper]
  • Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning. [paper]
  • Towards Unsupervised Image Captioning With Shared Multimodal Embeddings. [paper]
  • Human Attention in Image Captioning: Dataset and Analysis. [paper]
  • Reflective Decoding Network for Image Captioning. [paper]
  • Joint Optimization for Cooperative Image Captioning. [paper]
  • Entangled Transformer for Image Captioning. [paper]
  • nocaps: novel object captioning at scale. [paper]
  • Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection. [paper]
  • Unpaired Image Captioning via Scene Graph Alignments. [paper]
  • Learning to Caption Images Through a Lifetime by Asking Questions. [paper]
Video Captioning
  • VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research. [paper]
  • Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network. [paper]
  • Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning. [paper]
  • Watch, Listen and Tell: Multi-Modal Weakly Supervised Dense Event Captioning. [paper]

ACL 2019

Image Captioning
  • Informative Image Captioning with External Sources of Information [paper]

  • Bridging by Word: Image Grounded Vocabulary Construction for Visual Captioning [paper]

  • Generating Question Relevant Captions to Aid Visual Question Answering [paper]

Video Captioning
  • Dense Procedure Captioning in Narrated Instructional Videos [paper]

CVPR 2019

Image Captioning
  • Auto-Encoding Scene Graphs for Image Captioning [paper] [code]

  • Fast, Diverse and Accurate Image Captioning Guided by Part-Of-Speech [paper]

  • Unsupervised Image Captioning [paper] [code]

  • Exact Adversarial Attack to Image Captioning via Structured Output Learning With Latent Variables [paper]

  • Describing like Humans: On Diversity in Image Captioning [paper]

  • MSCap: Multi-Style Image Captioning With Unpaired Stylized Text [paper]

  • Leveraging Captioning to Boost Semantics for Salient Object Detection [paper] [code]

  • Context and Attribute Grounded Dense Captioning [paper]

  • Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning [paper]

  • Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions [paper]

  • Self-Critical N-step Training for Image Captioning [paper]

  • Look Back and Predict Forward in Image Captioning [paper]

  • Intention Oriented Image Captions with Guiding Objects [paper]

  • Adversarial Semantic Alignment for Improved Image Captions [paper]

  • Good News, Everyone! Context driven entity-aware captioning for news images. [paper] [code]

  • Pointing Novel Objects in Image Captioning [paper]

  • Engaging Image Captioning via Personality [paper]

Video Captioning
  • Streamlined Dense Video Captioning. [paper]
  • Grounded Video Description. [paper]
  • Adversarial Inference for Multi-Sentence Video Description. [paper]
  • Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning. [paper]
  • Memory-Attended Recurrent Network for Video Captioning. [paper]
  • Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning. [paper]

AAAI 2019

Image Captioning
  • Improving Image Captioning with Conditional Generative Adversarial Nets [paper]

  • Connecting Language to Images: A Progressive Attention-Guided Network for Simultaneous Image Captioning and Language Grounding [paper]

  • Meta Learning for Image Captioning [paper]

  • Deliberate Residual based Attention Network for Image Captioning [paper]

  • Hierarchical Attention Network for Image Captioning [paper]

  • Learning Object Context for Dense Captioning [paper]

Video Captioning
  • Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning [paper] [code]

  • Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning [paper]

  • Fully Convolutional Video Captioning with Coarse-to-Fine and Inherited Attention [paper]

  • Motion Guided Spatial Attention for Video Captioning [paper]

2018

NeurIPS 2018

Image Captioning
  • A Neural Compositional Paradigm for Image Captioning. [paper] [code]
Video Captioning
  • Weakly Supervised Dense Event Captioning in Videos. [paper] [code]

ECCV 2018

Image Captioning
  • Unpaired Image Captioning by Language Pivoting. [paper] [code]
  • Exploring Visual Relationship for Image Captioning. [paper]
  • Recurrent Fusion Network for Image Captioning. [paper] [code]
  • Boosted Attention: Leveraging Human Attention for Image Captioning. [paper]
  • Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data. [paper]
  • "Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention. [paper]

ACL 2018

Image Captioning
  • Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning. [paper]
  • Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning. [paper] [code]

IJCAI 2018

Image Captioning
  • Show and Tell More: Topic-Oriented Multi-Sentence Image Captioning. [paper]

CVPR 2018

Image Captioning
  • Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. [paper] [code]
  • Neural Baby Talk. [paper]
  • GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints. [paper]

2017

ICCV 2017

Image Captioning
  • Boosting Image Captioning with Attributes. [paper]
  • Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner. [paper] [code]

CVPR 2017

Image Captioning
  • SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning. [paper] [code]
  • When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning. [paper] [code]
  • Self-critical Sequence Training for Image Captioning. [paper]
  • Semantic Compositional Networks for Visual Captioning. [paper] [code]
  • StyleNet: Generating Attractive Visual Captions with Styles. [paper] [code]

TPAMI 2017

Image Captioning
  • BreakingNews: Article Annotation by Image and Text Processing. [paper]

2016

CVPR 2016

Image Captioning
  • Image Captioning with Semantic Attention. [paper] [code]
  • Learning Deep Representations of Fine-grained Visual Descriptions. [paper] [code]
  • Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data. [paper] [code]

TPAMI 2016

Image Captioning
  • Aligning Where to See and What to Tell: Image Captioning with Region-Based Attention and Scene-Specific Contexts. [paper] [code]

2015

ICML 2015

Image Captioning
  • Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. [paper]

ICCV 2015

Image Captioning
  • Guiding Long-Short Term Memory for Image Caption Generation. [paper]

CVPR 2015

Image Captioning
  • Show and Tell: A Neural Image Caption Generator. [paper]
  • Deep Visual-Semantic Alignments for Generating Image Descriptions. [paper] [code]
  • CIDEr: Consensus-based Image Description Evaluation. [paper] [code]

ICLR 2015

Image Captioning
  • Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN). [paper]

Dataset

  • MSCOCO
  • Flickr30K
  • Flickr8K
  • VizWiz
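Most of the image-captioning work above trains and evaluates on MSCOCO-style annotation files, where an "annotations" list pairs each image_id with one human-written caption. The sketch below (the file name, captions, and helper are illustrative, not taken from the dataset) shows how reference captions are typically grouped per image before metric computation:

```python
from collections import defaultdict

# Toy data in the MSCOCO captions schema; real files such as
# captions_train2014.json have the same top-level structure.
coco_style = {
    "images": [{"id": 1, "file_name": "dog.jpg"},
               {"id": 2, "file_name": "beach.jpg"}],
    "annotations": [
        {"id": 10, "image_id": 1, "caption": "A dog catching a frisbee."},
        {"id": 11, "image_id": 1, "caption": "A brown dog leaps in a park."},
        {"id": 12, "image_id": 2, "caption": "Waves rolling onto a sandy beach."},
    ],
}

def captions_by_image(dataset):
    """Group reference captions by image_id -- the shape most
    captioning metrics (BLEU, CIDEr, ...) expect as input."""
    grouped = defaultdict(list)
    for ann in dataset["annotations"]:
        grouped[ann["image_id"]].append(ann["caption"])
    return dict(grouped)

refs = captions_by_image(coco_style)
print(len(refs[1]))  # each image carries multiple references -> 2
```

Grouping multiple references per image matters because consensus-based metrics such as CIDEr score a candidate caption against all references for that image, not against a single one.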

Popular Codebase

Reference and Acknowledgement

Many thanks to everyone for their contributions to this area.
