AaronCCWong / Show-Attend-and-Tell

Licence: other
A PyTorch implementation of the paper Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Show-Attend-and-Tell

A Pytorch Tutorial To Image Captioning
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
Stars: ✭ 1,867 (+3118.97%)
Mutual labels:  image-captioning, show-attend-and-tell
pix2code-pytorch
PyTorch implementation of pix2code. 🔥
Stars: ✭ 24 (-58.62%)
Mutual labels:  image-captioning
Dataturks
ML data annotations made super easy for teams. Just upload data, add your team and build training/evaluation dataset in hours.
Stars: ✭ 200 (+244.83%)
Mutual labels:  image-captioning
catr
Image Captioning Using Transformer
Stars: ✭ 206 (+255.17%)
Mutual labels:  image-captioning
Caption generator
A modular library built on top of Keras and TensorFlow to generate a caption in natural language for any input image.
Stars: ✭ 243 (+318.97%)
Mutual labels:  image-captioning
Udacity
This repo includes all the projects I have finished in the Udacity Nanodegree programs
Stars: ✭ 57 (-1.72%)
Mutual labels:  image-captioning
Sca Cnn.cvpr17
Image Captions Generation with Spatial and Channel-wise Attention
Stars: ✭ 198 (+241.38%)
Mutual labels:  image-captioning
MIA
Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" (NeurIPS 2019)
Stars: ✭ 57 (-1.72%)
Mutual labels:  image-captioning
Show and Tell
Show and Tell : A Neural Image Caption Generator
Stars: ✭ 74 (+27.59%)
Mutual labels:  image-captioning
Image-Captioning-with-Beam-Search
Generating image captions using Xception Network and Beam Search in Keras
Stars: ✭ 18 (-68.97%)
Mutual labels:  image-captioning
CS231n
CS231n Assignments Solutions - Spring 2020
Stars: ✭ 48 (-17.24%)
Mutual labels:  image-captioning
Aoanet
Code for paper "Attention on Attention for Image Captioning". ICCV 2019
Stars: ✭ 242 (+317.24%)
Mutual labels:  image-captioning
BUTD model
A pytorch implementation of "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" for image captioning.
Stars: ✭ 28 (-51.72%)
Mutual labels:  image-captioning
Meshed Memory Transformer
Meshed-Memory Transformer for Image Captioning. CVPR 2020
Stars: ✭ 230 (+296.55%)
Mutual labels:  image-captioning
Image-Captioining
The objective is to generate a textual description of an image based on the objects and actions it contains, using generative models so that novel sentences are created. Pipeline-type models use two separate learning processes, one for language modelling and the other for image recognition. It first identifies objects in the image and prov…
Stars: ✭ 20 (-65.52%)
Mutual labels:  image-captioning
Image To Image Search
A reverse image search engine powered by elastic search and tensorflow
Stars: ✭ 200 (+244.83%)
Mutual labels:  image-captioning
Im2LaTeX
An implementation of the Show, Attend and Tell paper in Tensorflow, for the OpenAI Im2LaTeX suggested problem
Stars: ✭ 16 (-72.41%)
Mutual labels:  show-attend-and-tell
udacity-cvnd-projects
My solutions to the projects assigned for the Udacity Computer Vision Nanodegree
Stars: ✭ 36 (-37.93%)
Mutual labels:  image-captioning
Adaptive
Pytorch Implementation of Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
Stars: ✭ 97 (+67.24%)
Mutual labels:  image-captioning
gramtion
Twitter bot for generating photo descriptions (alt text)
Stars: ✭ 21 (-63.79%)
Mutual labels:  image-captioning

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

A PyTorch implementation

For a trained model to load into the decoder, use

Some training statistics

BLEU scores for VGG19 (orange) and ResNet152 (red), trained with teacher forcing.

[Graphs: BLEU score curves (BLEU-1 through BLEU-4) and top-k accuracy curves (training/validation top-1 and top-5)]

To Train

This project was written in Python 3 and may not work with Python 2. Download the COCO training and validation images and put them in data/coco/imgs/train2014 and data/coco/imgs/val2014 respectively. Put the COCO dataset split JSON file from Deep Visual-Semantic Alignments in data/coco/; it should be named dataset.json.

Run the preprocessing to create the needed JSON files:

python generate_json_data.py
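The preprocessing step parses Karpathy's dataset.json, which stores each image's split and tokenized captions. A minimal sketch of what such a script does (the function name, minimum-frequency threshold, and special tokens here are assumptions, not the repo's exact code):

```python
from collections import Counter

def build_split_data(data, min_freq=5):
    """Group images by split and build a word-to-index vocabulary
    from the training captions. `data` is the parsed dataset.json."""
    counter = Counter()
    splits = {"train": [], "val": [], "test": []}
    for img in data["images"]:
        # Karpathy's "restval" images are conventionally folded into train.
        split = "train" if img["split"] in ("train", "restval") else img["split"]
        captions = [s["tokens"] for s in img["sentences"]]
        splits[split].append({"filename": img["filename"], "captions": captions})
        if split == "train":
            for tokens in captions:
                counter.update(tokens)

    # Words below the frequency threshold are dropped and map to <unk>.
    vocab = ["<pad>", "<start>", "<end>", "<unk>"] + \
            sorted(w for w, c in counter.items() if c >= min_freq)
    word2idx = {w: i for i, w in enumerate(vocab)}
    return splits, word2idx
```

In practice the script would load data/coco/dataset.json with `json.load` and write the resulting splits and vocabulary out as the JSON files the training code expects.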

Start the training by running:

python train.py
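The training statistics above mention teacher forcing: at each decoding step the decoder is fed the ground-truth previous word rather than its own prediction. A sketch of one such training step, assuming an encoder that returns per-pixel features and a decoder that takes those features plus the shifted caption (the exact interfaces in the repo may differ):

```python
import torch
import torch.nn as nn

def train_step(encoder, decoder, optimizer, images, captions, pad_idx=0):
    """One teacher-forcing step: the decoder sees ground-truth words
    captions[:, :-1] and is trained to predict captions[:, 1:]."""
    features = encoder(images)                    # (B, num_pixels, feat_dim)
    logits = decoder(features, captions[:, :-1])  # (B, T-1, vocab_size)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        captions[:, 1:].reshape(-1),
        ignore_index=pad_idx,  # don't penalize padding positions
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```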

The models will be saved in model/ and the training statistics will be saved in runs/. To see the training statistics, use:

tensorboard --logdir runs

To Generate Captions

python generate_caption.py --img-path <PATH_TO_IMG> --model <PATH_TO_MODEL_PARAMETERS>
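At inference time there is no teacher forcing: the decoder consumes its own previous prediction at each step. A greedy-decoding sketch, where the `decoder.step` method, special-token names, and `word2idx`/`idx2word` mappings are assumptions for illustration:

```python
import torch

@torch.no_grad()
def greedy_caption(encoder, decoder, image, word2idx, idx2word, max_len=20):
    """Decode greedily from <start>, feeding each predicted word back in,
    until <end> is produced or max_len is reached."""
    features = encoder(image.unsqueeze(0))        # batch of one image
    word = torch.tensor([[word2idx["<start>"]]])
    caption, state = [], None
    for _ in range(max_len):
        logits, state = decoder.step(features, word, state)  # assumed API
        word = logits.argmax(dim=-1)              # most likely next word
        token = idx2word[word.item()]
        if token == "<end>":
            break
        caption.append(token)
    return " ".join(caption)
```

Beam search, which keeps the k best partial captions at each step instead of only the single best, generally yields better captions at the cost of extra compute.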

Todo

  • Create image encoder class
  • Create decoder class
  • Create dataset loader
  • Write main function for training and validation
  • Implement attention model
  • Implement decoder feed forward function
  • Write training function
  • Write validation function
  • Add BLEU evaluation
  • Update code to use GPU only when available, otherwise use CPU
  • Add performance statistics
  • Allow encoder to use resnet-152 and densenet-161
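The attention model in the todo list is the core of the paper: at each decoding step, soft attention scores every spatial location of the encoder's feature map against the decoder's hidden state and returns a weighted context vector. A sketch of additive (Bahdanau-style) soft attention under assumed dimensions, not the repo's exact implementation:

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """Additive soft attention over encoder pixel features."""
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, features, hidden):
        # features: (B, num_pixels, feat_dim); hidden: (B, hidden_dim)
        attn = torch.tanh(self.feat_proj(features) +
                          self.hidden_proj(hidden).unsqueeze(1))
        # One scalar score per pixel, normalized to a distribution.
        alpha = torch.softmax(self.score(attn).squeeze(-1), dim=1)
        # Context vector: attention-weighted sum of pixel features.
        context = (features * alpha.unsqueeze(-1)).sum(dim=1)
        return context, alpha
```

The returned weights `alpha` are what the paper visualizes as heatmaps showing where the model "looks" while emitting each word.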

Captioned Examples

Correctly Captioned Images

Correctly Captioned Image 1

Correctly Captioned Image 2

Incorrectly Captioned Images

Incorrectly Captioned Image 1

Incorrectly Captioned Image 2

References

Show, Attend and Tell

Original Theano Implementation

Neural Machine Translation By Jointly Learning to Align And Translate

Karpathy's Data splits
