dabasajay / Image Caption Generator

License: MIT
A neural network to generate captions for an image using CNN and RNN with BEAM Search.

Image Caption Generator

Author: Ajay Dabas

Examples

[Image: example of image captioning]

Image credit: Towards Data Science

Table of Contents

  1. Requirements
  2. Training parameters and results
  3. Generated Captions on Test Images
  4. Procedure to Train Model
  5. Procedure to Test on new images
  6. Configurations (config.py)
  7. Frequently encountered problems
  8. TODO
  9. References

1. Requirements

Recommended system requirements for training the model:

  • A capable CPU and a GPU with at least 8GB of memory
  • At least 8GB of RAM
  • An active internet connection, so that Keras can download the InceptionV3/VGG16 model weights

Required Python libraries, with the versions used while building and testing this project:

  • Python - 3.6.7
  • Numpy - 1.16.4
  • Tensorflow - 1.13.1
  • Keras - 2.2.4
  • nltk - 3.2.5
  • Pillow (PIL) - 4.3.0
  • Matplotlib - 3.0.3
  • tqdm - 4.28.1

Flickr8k Dataset: Dataset Request Form

UPDATE (April/2019): The official site seems to have been taken down (although the form still works). Here are some direct download links:

Important: After downloading the dataset, put the required files in the train_val_data folder.

2. Training parameters and results

NOTE

  • batch_size=64 took ~14GB of GPU memory for InceptionV3 + AlternativeRNN and VGG16 + AlternativeRNN
  • batch_size=64 took ~8GB of GPU memory for InceptionV3 + RNN and VGG16 + RNN
  • If you're low on memory, use Google Colab or reduce the batch size
  • With BEAM Search, loss and val_loss are the same as with argmax, since the model is the same and only the decoding step differs

Model & Config, with BLEU scores for Argmax and BEAM Search decoding:

InceptionV3 + AlternativeRNN
  • Epochs = 20
  • Batch Size = 64
  • Optimizer = Adam
  • Crossentropy loss (lower is better)
    • loss (train_loss): 2.4050
    • val_loss: 3.0527
  • BLEU scores on validation data (higher is better)

    Metric    Argmax      BEAM Search (k = 3)
    BLEU-1    0.596818    0.606086
    BLEU-2    0.356009    0.359171
    BLEU-3    0.252489    0.249124
    BLEU-4    0.129536    0.126599

InceptionV3 + RNN
  • Epochs = 11
  • Batch Size = 64
  • Optimizer = Adam
  • Crossentropy loss (lower is better)
    • loss (train_loss): 2.5254
    • val_loss: 3.1769
  • BLEU scores on validation data (higher is better)

    Metric    Argmax      BEAM Search (k = 3)
    BLEU-1    0.601791    0.605097
    BLEU-2    0.344289    0.356094
    BLEU-3    0.230025    0.251132
    BLEU-4    0.108898    0.129900

VGG16 + AlternativeRNN
  • Epochs = 18
  • Batch Size = 64
  • Optimizer = Adam
  • Crossentropy loss (lower is better)
    • loss (train_loss): 2.2880
    • val_loss: 3.1889
  • BLEU scores on validation data (higher is better)

    Metric    Argmax      BEAM Search (k = 3)
    BLEU-1    0.596655    0.593876
    BLEU-2    0.342127    0.348569
    BLEU-3    0.229676    0.242063
    BLEU-4    0.108707    0.123221

VGG16 + RNN
  • Epochs = 7
  • Batch Size = 64
  • Optimizer = Adam
  • Crossentropy loss (lower is better)
    • loss (train_loss): 2.6297
    • val_loss: 3.3486
  • BLEU scores on validation data (higher is better)

    Metric    Argmax      BEAM Search (k = 3)
    BLEU-1    0.557626    0.568993
    BLEU-2    0.317652    0.326569
    BLEU-3    0.216636    0.226629
    BLEU-4    0.105288    0.113102
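
To give a feel for what the BLEU-1 to BLEU-4 numbers above measure, here is a from-scratch sketch of sentence-level BLEU in plain Python. It is not the project's evaluation code: the README does not say how its scores were computed, and this version assumes uniform n-gram weights, no smoothing, and a single sentence rather than a whole validation set. The function names are illustrative.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(references, hypothesis, n):
    """Clipped n-gram precision: each hypothesis n-gram is credited at most
    as many times as it appears in the best-matching reference."""
    hyp_counts = ngram_counts(hypothesis, n)
    if not hyp_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return clipped / sum(hyp_counts.values())


def sentence_bleu(references, hypothesis, max_n):
    """BLEU-max_n: geometric mean of the 1..max_n precisions times a brevity penalty."""
    precisions = [modified_precision(references, hypothesis, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    hyp_len = len(hypothesis)
    # Reference length closest to the hypothesis length (shorter one on ties).
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - hyp_len), length))
    penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return penalty * math.exp(log_mean)


reference = "a man is riding a bike".split()
candidate = "a man rides a bike".split()
print(sentence_bleu([reference], candidate, 1))  # unigram overlap with brevity penalty
print(sentence_bleu([reference], candidate, 4))  # 0.0: no 4-gram overlaps
```

Because higher-order n-grams match less often, BLEU-4 is typically much lower than BLEU-1 for the same captions, which is the pattern visible in the results above.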

3. Generated Captions on Test Images

Model used - InceptionV3 + AlternativeRNN

Image Caption
Image 1
  • Argmax: A man in a blue shirt is riding a bike on a dirt path.
  • BEAM Search, k=3: A man is riding a bicycle on a dirt path.
Image 2
  • Argmax: A man in a red kayak is riding down a waterfall.
  • BEAM Search, k=3: A man on a surfboard is riding a wave.

4. Procedure to Train Model

  1. Clone the repository to preserve directory structure.
    git clone https://github.com/dabasajay/Image-Caption-Generator.git
  2. Put the required dataset files in train_val_data folder (files mentioned in readme there).
  3. Review config.py for paths and other configurations (explained below).
  4. Run train_val.py.

5. Procedure to Test on new images

  1. Clone the repository to preserve directory structure.
    git clone https://github.com/dabasajay/Image-Caption-Generator.git
  2. Train the model to generate required files in model_data folder (steps given above).
  3. Put the test images in test_data folder.
  4. Review config.py for paths and other configurations (explained below).
  5. Run test.py.

6. Configurations (config.py)

config

  1. images_path :- Folder path containing flickr dataset images
  2. train_data_path :- .txt file path containing image ids for training
  3. val_data_path :- .txt file path containing image ids for validation
  4. captions_path :- .txt file path containing captions
  5. tokenizer_path :- path for saving tokenizer
  6. model_data_path :- path for saving files related to model
  7. model_load_path :- path for loading trained model
  8. num_of_epochs :- Number of epochs
  9. max_length :- Maximum caption length. Set this manually after training the model; test.py requires it
  10. batch_size :- Batch size for training (larger will consume more GPU & CPU memory)
  11. beam_search_k :- BEAM search parameter: the number of candidate captions (beams) the algorithm keeps at each decoding step.
  12. test_data_path :- Folder path containing images for testing/inference
  13. model_type :- CNN Model type to use -> inceptionv3 or vgg16
  14. random_seed :- Random seed for reproducibility of results
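
For the max_length entry above, one plausible way to obtain the value is to take the longest training caption measured in tokens. The sketch below is an assumption about how such a number could be computed, not the repository's code; in particular, the `startseq`/`endseq` marker tokens and whitespace tokenization are hypothetical.

```python
def max_caption_length(captions, start="startseq", end="endseq"):
    """Length in tokens of the longest caption, counting the hypothetical
    start and end marker tokens wrapped around each caption."""
    return max(len([start] + caption.split() + [end]) for caption in captions)


captions = [
    "a man is riding a bike",
    "a dog runs",
]
print(max_caption_length(captions))  # 8: six words plus the two markers
```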

rnnConfig

  1. embedding_size :- Embedding size used in Decoder(RNN) Model
  2. LSTM_units :- Number of LSTM units in Decoder(RNN) Model
  3. dense_units :- Number of Dense units in Decoder(RNN) Model
  4. dropout :- Dropout probability used in Dropout layer in Decoder(RNN) Model
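
Putting the two lists together, a config.py along these lines would hold the settings. This is a sketch only: the key names come from the lists above, but the layout as two dictionaries is an assumption, every path is a placeholder, and the values for max_length, random_seed, and all rnnConfig entries are hypothetical. The epochs, batch size, and k are taken from the InceptionV3 + AlternativeRNN run reported earlier. The `validate` helper is illustrative and not part of the repository.

```python
config = {
    "images_path": "path/to/flickr/images/",        # placeholder path
    "train_data_path": "path/to/train_ids.txt",     # placeholder path
    "val_data_path": "path/to/val_ids.txt",         # placeholder path
    "captions_path": "path/to/captions.txt",        # placeholder path
    "tokenizer_path": "path/to/tokenizer.pkl",      # placeholder path
    "model_data_path": "path/to/model_data/",       # placeholder path
    "model_load_path": "path/to/trained_model.hdf5",  # placeholder path
    "num_of_epochs": 20,
    "max_length": 40,          # hypothetical; set after training
    "batch_size": 64,
    "beam_search_k": 3,
    "test_data_path": "path/to/test_data/",         # placeholder path
    "model_type": "inceptionv3",                    # or "vgg16"
    "random_seed": 1,          # hypothetical value
}

rnnConfig = {
    "embedding_size": 256,     # hypothetical value
    "LSTM_units": 256,         # hypothetical value
    "dense_units": 256,        # hypothetical value
    "dropout": 0.3,            # hypothetical value
}


def validate(cfg):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    if cfg["model_type"] not in ("inceptionv3", "vgg16"):
        problems.append("model_type must be 'inceptionv3' or 'vgg16'")
    for key in ("num_of_epochs", "max_length", "batch_size", "beam_search_k"):
        if not (isinstance(cfg[key], int) and cfg[key] > 0):
            problems.append(key + " must be a positive integer")
    return problems


print(validate(config))  # an empty list means no problems were found
```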

7. Frequently encountered problems

  • Out of memory issue:
    • Try reducing batch_size
  • Results differ every time I run the script:
    • Due to the stochastic nature of these algorithms, results may differ slightly between runs, even though a random seed is set to make results reproducible.
  • Results aren't very great using beam search compared to argmax:
    • Try a higher k in BEAM search using the beam_search_k parameter in config. Note that a higher k may improve results, but it also increases inference time significantly.
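
On the reproducibility point above, seeding usually means seeding every random number generator in play. The sketch below shows one common pattern and is an assumption, not the repository's code: `set_global_seed` is a hypothetical helper, and NumPy and TensorFlow are seeded only if they are installed. Even with all seeds set, some GPU operations can be non-deterministic, which is consistent with the small run-to-run differences described above.

```python
import random


def set_global_seed(seed):
    """Seed Python's RNG and, when available, NumPy's and TensorFlow's."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import tensorflow as tf
    except ImportError:
        return  # TensorFlow not installed; nothing to seed
    if hasattr(tf, "random") and hasattr(tf.random, "set_seed"):
        tf.random.set_seed(seed)   # TensorFlow 2.x API
    else:
        tf.set_random_seed(seed)   # TensorFlow 1.x API


set_global_seed(7)
first = random.random()
set_global_seed(7)
second = random.random()
print(first == second)  # True: the same seed replays the same sequence
```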

8. TODO

  • [X] Support for VGG16 Model. Uses InceptionV3 Model by default.

  • [X] Implement 2 architectures of RNN Model.

  • [X] Support for batch processing in data generator with shuffling.

  • [X] Implement BEAM Search.

  • [X] Calculate BLEU Scores using BEAM Search.

  • [ ] Implement Attention and change model architecture.

  • [ ] Support for pre-trained word vectors like word2vec, GloVe etc.

9. References
