dabasajay / Image Caption Generator

License: MIT

A neural network to generate captions for an image using CNN and RNN with BEAM Search.

Programming Languages

python

Labels

deep-learning convolutional-neural-networks lstm recurrent-neural-networks attention-mechanism attention image-captioning attention-model cnn-keras vgg16 beam-search

Alternatives to and projects similar to Image Caption Generator

automatic-personality-prediction

[AAAI 2020] Modeling Personality with Attentive Networks and Contextual Embeddings

Stars: ✭ 43 (-65.87%)

Mutual labels: recurrent-neural-networks, lstm, attention, attention-mechanism, cnn-keras

datastories-semeval2017-task6

Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".

Stars: ✭ 20 (-84.13%)

Mutual labels: recurrent-neural-networks, lstm, attention, attention-mechanism

Image Captioning

Image Captioning: Implementing the Neural Image Caption Generator with python

Stars: ✭ 52 (-58.73%)

Mutual labels: convolutional-neural-networks, lstm, recurrent-neural-networks, image-captioning

Linear Attention Recurrent Neural Network

A recurrent attention module consisting of an LSTM cell which can query its own past cell states by the means of windowed multi-head attention. The formulas are derived from the BN-LSTM and the Transformer Network. The LARNN cell with attention can be easily used inside a loop on the cell state, just like any other RNN. (LARNN)

Stars: ✭ 119 (-5.56%)

Mutual labels: lstm, recurrent-neural-networks, attention-mechanism, attention-model

Image Caption Generator

[DEPRECATED] A Neural Network based generative model for captioning images using Tensorflow

Stars: ✭ 141 (+11.9%)

Mutual labels: convolutional-neural-networks, lstm, recurrent-neural-networks, image-captioning

Image-Caption

Using LSTM or Transformer to solve Image Captioning in Pytorch

Stars: ✭ 36 (-71.43%)

Mutual labels: image-captioning, beam-search, attention-mechanism

Image Captioning

Image Captioning using InceptionV3 and beam search

Stars: ✭ 290 (+130.16%)

Mutual labels: beam-search, lstm, image-captioning

Personality Detection

Implementation of a hierarchical CNN based model to detect Big Five personality traits

Stars: ✭ 338 (+168.25%)

Mutual labels: convolutional-neural-networks, lstm, cnn-keras

Deep learning nlp

Keras, PyTorch, and NumPy Implementations of Deep Learning Architectures for NLP

Stars: ✭ 407 (+223.02%)

Mutual labels: recurrent-neural-networks, attention, cnn-keras

Thesemicolon

This repository contains IPython notebooks and datasets for the data analytics YouTube tutorials on The Semicolon.

Stars: ✭ 345 (+173.81%)

Mutual labels: convolutional-neural-networks, lstm, cnn-keras

Structured Self Attention

A Structured Self-attentive Sentence Embedding

Stars: ✭ 459 (+264.29%)

Mutual labels: attention-mechanism, attention, attention-model

keras-deep-learning

Various implementations and projects on CNN, RNN, LSTM, GAN, etc

Stars: ✭ 22 (-82.54%)

Mutual labels: lstm, attention-mechanism, cnn-keras

ntua-slp-semeval2018

Deep-learning models of NTUA-SLP team submitted in SemEval 2018 tasks 1, 2 and 3.

Stars: ✭ 79 (-37.3%)

Mutual labels: lstm, attention, attention-mechanism

Keras Anomaly Detection

Anomaly detection implemented in Keras

Stars: ✭ 335 (+165.87%)

Mutual labels: convolutional-neural-networks, lstm, recurrent-neural-networks

Reading comprehension tf

Machine Reading Comprehension in Tensorflow

Stars: ✭ 37 (-70.63%)

Mutual labels: convolutional-neural-networks, recurrent-neural-networks, attention-model

Textclassifier

Text classifier for Hierarchical Attention Networks for Document Classification

Stars: ✭ 985 (+681.75%)

Mutual labels: convolutional-neural-networks, recurrent-neural-networks, attention-mechanism

Text Classification Models Pytorch

Implementation of State-of-the-art Text Classification Models in Pytorch

Stars: ✭ 379 (+200.79%)

Mutual labels: convolutional-neural-networks, recurrent-neural-networks, attention

learningspoons

nlp lecture-notes and source code

Stars: ✭ 29 (-76.98%)

Mutual labels: lstm, attention, attention-model

Hierarchical-Word-Sense-Disambiguation-using-WordNet-Senses

Word Sense Disambiguation using Word Specific models, All word models and Hierarchical models in Tensorflow

Stars: ✭ 33 (-73.81%)

Mutual labels: lstm, attention, attention-mechanism

Neural Image Captioning

Implementation of Neural Image Captioning model using Keras with Theano backend

Stars: ✭ 12 (-90.48%)

Mutual labels: lstm, image-captioning, vgg16

Image Caption Generator

A neural network to generate captions for an image using CNN and RNN with BEAM Search.

Examples

[Example image not reproduced; image credits: Towards Data Science]

Requirements
Training parameters and results
Generated Captions on Test Images
Procedure to Train Model
Procedure to Test on new images
Configurations (config.py)
Frequently encountered problems
TODO
References

1. Requirements

Recommended system requirements to train the model:

A good CPU and a GPU with at least 8GB of memory
At least 8GB of RAM
An active internet connection, so that Keras can download the InceptionV3/VGG16 model weights

Required Python libraries, with the version numbers used while developing and testing this project:

Python - 3.6.7
Numpy - 1.16.4
Tensorflow - 1.13.1
Keras - 2.2.4
nltk - 3.2.5
PIL - 4.3.0
Matplotlib - 3.0.3
tqdm - 4.28.1

Flickr8k Dataset: available through the Dataset Request Form

UPDATE (April 2019): The official site seems to have been taken down (although the form still works). Here are some direct download links:

Flickr8k_Dataset
Flickr8k_text

Download link credits: Jason Brownlee

Important: After downloading the dataset, put the required files in the train_val_data folder.
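The Flickr8k_text archive contains a caption file (Flickr8k.token.txt) in which each line holds an image file name with a caption index (for example `image.jpg#0`), a tab, and the caption, plus split files such as Flickr_8k.trainImages.txt that list one image file name per line. A minimal sketch of loading them, assuming that format. The function names, the lowercasing and punctuation cleanup, and the `startseq`/`endseq` markers are illustrative choices, not necessarily what this repository does.

```python
import string
from collections import defaultdict


def clean_caption(caption):
    """Lowercase, strip punctuation, drop non-alphabetic tokens, add (hypothetical) start/end markers."""
    table = str.maketrans("", "", string.punctuation)
    words = [w.translate(table) for w in caption.lower().split()]
    words = [w for w in words if w.isalpha()]
    return "startseq " + " ".join(words) + " endseq"


def load_captions(token_text):
    """Parse Flickr8k.token.txt-style text into {image_id: [caption, ...]}.

    Each non-empty line is assumed to look like '<name>.jpg#<n>\t<caption>'.
    """
    captions = defaultdict(list)
    for line in token_text.splitlines():
        line = line.strip()
        if not line:
            continue
        image_part, caption = line.split("\t", 1)
        # Drop the '#<n>' caption index, then the file extension.
        image_id = image_part.split("#")[0].rsplit(".", 1)[0]
        captions[image_id].append(clean_caption(caption))
    return dict(captions)


def load_split(split_text):
    """Return the set of image IDs listed in a split file such as Flickr_8k.trainImages.txt."""
    return {line.strip().rsplit(".", 1)[0] for line in split_text.splitlines() if line.strip()}


if __name__ == "__main__":
    sample = "img1.jpg#0\tA dog runs on the grass .\nimg2.jpg#0\tTwo kids play .\n"
    train_ids = load_split("img1.jpg\n")
    # Keep only the captions of images that belong to the training split.
    train = {k: v for k, v in load_captions(sample).items() if k in train_ids}
    print(train)  # {'img1': ['startseq a dog runs on the grass endseq']}
```

The same `load_split` call can be reused for the validation split file, so both splits share one cleaned caption dictionary.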

2. Training parameters and results

NOTE

batch_size=64 took ~14GB of GPU memory for InceptionV3 + AlternativeRNN and VGG16 + AlternativeRNN
batch_size=64 took ~8GB of GPU memory for InceptionV3 + RNN and VGG16 + RNN
If you're low on memory, use Google Colab or reduce the batch size
With BEAM Search, loss and val_loss are the same as with argmax, since the same trained model is used; only the decoding strategy differs
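Why does batch size drive memory use so strongly? In a common training setup for this kind of model (for example the one in the "How to Develop a Deep Learning Photo Caption Generator from Scratch" tutorial listed in the references), every caption of n tokens is expanded into n - 1 training rows of (image features, partial caption, next word). A batch of images therefore produces many more rows than its nominal size. The pure-Python sketch below shows such a generator with per-epoch shuffling. It assumes captions are already encoded as integer word IDs with 0 reserved for padding; the names, and the choice of counting batch_size in images, are hypothetical rather than taken from this repository.

```python
import random


def caption_to_pairs(seq, max_length):
    """Expand one integer-encoded caption into (left-padded prefix, next word) pairs.

    [s, a, b, e] with max_length=3 -> ([0,0,s], a), ([0,s,a], b), ([s,a,b], e)
    """
    pairs = []
    for i in range(1, len(seq)):
        prefix = seq[:i][-max_length:]                      # keep at most max_length tokens
        padded = [0] * (max_length - len(prefix)) + prefix  # 0 is the assumed padding ID
        pairs.append((padded, seq[i]))
    return pairs


def data_generator(features, encoded_captions, max_length, batch_size, seed=None):
    """Yield (image_features, input_seqs, next_words) batches forever, reshuffling each epoch.

    Here batch_size counts images. Each image contributes one row per token of each of
    its captions, and if the targets are later one-hot encoded over a vocabulary of V
    words, every row also carries V floats, so memory grows quickly with batch_size.
    """
    rng = random.Random(seed)
    image_ids = list(encoded_captions)
    while True:  # Keras-style generators loop indefinitely
        rng.shuffle(image_ids)
        for start in range(0, len(image_ids), batch_size):
            X_img, X_seq, y = [], [], []
            for image_id in image_ids[start:start + batch_size]:
                for seq in encoded_captions[image_id]:
                    for prefix, next_word in caption_to_pairs(seq, max_length):
                        X_img.append(features[image_id])
                        X_seq.append(prefix)
                        y.append(next_word)
            yield X_img, X_seq, y
```

With two images, one 3-token caption and two captions of 4 and 3 tokens, a single batch of two images already yields 7 rows.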

Cross-entropy loss (lower is better). All models were trained with batch size = 64 and the Adam optimizer.

Model                          Epochs   loss (train_loss)   val_loss
InceptionV3 + AlternativeRNN   20       2.4050              3.0527
InceptionV3 + RNN              11       2.5254              3.1769
VGG16 + AlternativeRNN         18       2.2880              3.1889
VGG16 + RNN                    7        2.6297              3.3486

BLEU scores on validation data (higher is better). BEAM Search uses k = 3.

Model                          Decoding      BLEU-1     BLEU-2     BLEU-3     BLEU-4
InceptionV3 + AlternativeRNN   Argmax        0.596818   0.356009   0.252489   0.129536
                               BEAM Search   0.606086   0.359171   0.249124   0.126599
InceptionV3 + RNN              Argmax        0.601791   0.344289   0.230025   0.108898
                               BEAM Search   0.605097   0.356094   0.251132   0.129900
VGG16 + AlternativeRNN         Argmax        0.596655   0.342127   0.229676   0.108707
                               BEAM Search   0.593876   0.348569   0.242063   0.123221
VGG16 + RNN                    Argmax        0.557626   0.317652   0.216636   0.105288
                               BEAM Search   0.568993   0.326569   0.226629   0.113102
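BLEU-1 to BLEU-4 are corpus-level scores, for example as computed with nltk (listed in the requirements) via `nltk.translate.bleu_score.corpus_bleu`; the exact n-gram weights the repository passes are not stated here. To make the metric concrete, the sketch below computes cumulative corpus BLEU-n from scratch using the standard definition with uniform weights and no smoothing. N-gram counts are clipped by the maximum count in any single reference and summed over the whole corpus. The clipped precisions are combined by a geometric mean and multiplied by a brevity penalty based on the closest reference length.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Cumulative corpus-level BLEU-max_n with uniform weights and no smoothing.

    references: one entry per image, each a list of reference token lists.
    hypotheses: generated token lists, aligned with `references`.
    """
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Reference length closest to the hypothesis length (ties go to the shorter one).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # element-wise max over references
            clipped = sum(min(c, max_ref[g]) for g, c in ngrams(hyp, n).items())
            matched[n - 1] += clipped
            total[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or 0 in matched:
        return 0.0  # a zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

For instance, the hypothesis "the the the" against the single reference "the cat" gets a BLEU-1 of 1/3, because "the" can be credited at most once.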

3. Generated Captions on Test Images

Model used - InceptionV3 + AlternativeRNN

Test image 1 [image not reproduced]
Argmax: A man in a blue shirt is riding a bike on a dirt path.
BEAM Search, k=3: A man is riding a bicycle on a dirt path.

Test image 2 [image not reproduced]
Argmax: A man in a red kayak is riding down a waterfall.
BEAM Search, k=3: A man on a surfboard is riding a wave.
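Argmax (greedy) decoding picks the single most probable next word at every step. BEAM Search instead keeps the k most probable partial captions, scored by summed log-probability, and extends each of them. That is why the two methods can produce different captions from the same model. The sketch below contrasts them on a hypothetical toy model. `next_word_probs` stands in for the trained CNN + RNN decoder, which in the real code would also take the image features; the vocabulary and probabilities are invented purely for illustration.

```python
import math


def next_word_probs(seq):
    """Hypothetical stand-in for the trained decoder: P(next word | partial caption)."""
    table = {
        ("startseq",): {"a": 0.5, "the": 0.4, "endseq": 0.1},
        ("startseq", "a"): {"dog": 0.4, "cat": 0.35, "endseq": 0.25},
        ("startseq", "the"): {"dog": 0.9, "endseq": 0.1},
    }
    return table.get(tuple(seq), {"endseq": 1.0})


def greedy_decode(max_length=10):
    seq = ["startseq"]
    while seq[-1] != "endseq" and len(seq) < max_length:
        probs = next_word_probs(seq)
        seq.append(max(probs, key=probs.get))  # argmax over the vocabulary
    return seq


def beam_search_decode(k=3, max_length=10):
    beams = [(0.0, ["startseq"])]  # (summed log-probability, partial caption)
    for _ in range(max_length - 1):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "endseq":
                candidates.append((score, seq))  # finished captions are carried over
                continue
            for word, p in next_word_probs(seq).items():
                candidates.append((score + math.log(p), seq + [word]))
        # Keep only the k highest-scoring partial captions.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
        if all(seq[-1] == "endseq" for _, seq in beams):
            break
    return beams[0][1]
```

Here greedy decoding commits to "a" (0.5) and ends with "a dog" (probability 0.2). Beam search with k = 2 keeps "the" alive and finds "the dog" (0.36). With k = 1 beam search reduces to greedy decoding. Because scores are summed log-probabilities, longer captions are penalised; some implementations normalise by length, which this sketch does not.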

4. Procedure to Train Model

Clone the repository to preserve the directory structure.
git clone https://github.com/dabasajay/Image-Caption-Generator.git
Put the required dataset files in the train_val_data folder (the files are listed in the README there).
Review config.py for paths and other configurations (explained below).
Run train_val.py.

5. Procedure to Test on new images

Clone the repository to preserve the directory structure.
git clone https://github.com/dabasajay/Image-Caption-Generator.git
Train the model to generate the required files in the model_data folder (steps given above).
Put the test images in the test_data folder.
Review config.py for paths and other configurations (explained below).
Run test.py.

6. Configurations (config.py)

config

images_path :- Folder path containing the Flickr dataset images
train_data_path :- .txt file path containing image IDs for training
val_data_path :- .txt file path containing image IDs for validation
captions_path :- .txt file path containing captions
tokenizer_path :- Path for saving the tokenizer
model_data_path :- Path for saving files related to the model
model_load_path :- Path for loading the trained model
num_of_epochs :- Number of epochs
max_length :- Maximum caption length. This is set manually after training the model and is required by test.py
batch_size :- Batch size for training (larger values consume more GPU and CPU memory)
beam_search_k :- BEAM Search parameter (beam width): the number of candidate captions the algorithm keeps at each decoding step
test_data_path :- Folder path containing images for testing/inference
model_type :- CNN model to use: inceptionv3 or vgg16
random_seed :- Random seed for reproducibility of results

rnnConfig

embedding_size :- Embedding size used in Decoder(RNN) Model
LSTM_units :- Number of LSTM units in Decoder(RNN) Model
dense_units :- Number of Dense units in Decoder(RNN) Model
dropout :- Dropout probability used in Dropout layer in Decoder(RNN) Model
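For orientation, the two dictionaries might look roughly like the sketch below. Only the key names come from the lists above. Every value is a hypothetical placeholder; batch size 64, k = 3 and the inceptionv3 default merely mirror the results section and TODO list, so check the actual config.py before relying on any of them. The helper `longest_caption_length` is also hypothetical and not part of the repository. It shows one way the max_length value could be derived from the training captions rather than typed in by hand.

```python
# Hypothetical values; only the key names are taken from the README.
config = {
    "images_path": "train_val_data/Flicker8k_Dataset/",
    "train_data_path": "train_val_data/Flickr_8k.trainImages.txt",
    "val_data_path": "train_val_data/Flickr_8k.devImages.txt",
    "captions_path": "train_val_data/Flickr8k.token.txt",
    "tokenizer_path": "model_data/tokenizer.pkl",
    "model_data_path": "model_data/",
    "model_load_path": "model_data/model.hdf5",
    "num_of_epochs": 20,
    "max_length": 40,             # must match the value used during training
    "batch_size": 64,
    "beam_search_k": 3,
    "test_data_path": "test_data/",
    "model_type": "inceptionv3",  # or "vgg16"
    "random_seed": 42,
}

rnnConfig = {
    "embedding_size": 300,
    "LSTM_units": 256,
    "dense_units": 256,
    "dropout": 0.3,
}


def longest_caption_length(captions):
    """Word count (including any start/end markers) of the longest caption in {image_id: [caption, ...]}."""
    return max(len(c.split()) for caps in captions.values() for c in caps)


def validate(cfg, rnn_cfg):
    """Basic sanity checks on the hypothetical values above."""
    assert cfg["model_type"] in ("inceptionv3", "vgg16"), "model_type must be inceptionv3 or vgg16"
    assert cfg["batch_size"] > 0 and cfg["beam_search_k"] >= 1
    assert 0.0 <= rnn_cfg["dropout"] < 1.0
```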

7. Frequently encountered problems

Out of memory issue:
- Try reducing batch_size.
Results differ every time I run the script:
- Due to the stochastic nature of these algorithms, results may differ slightly between runs. Even though I set a random seed to make results reproducible, small differences can remain.
Results using beam search aren't much better than argmax:
- Try a higher k for BEAM Search via the beam_search_k parameter in config. Note that a higher k may improve results, but it will also increase inference time significantly.
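Seeding has to cover every random number generator involved, not just one. A minimal sketch, assuming the seed comes from random_seed in config.py; the helper name `set_seeds` is hypothetical. Even with all of these seeded, some GPU (cuDNN) operations are non-deterministic, which is consistent with the small run-to-run differences described above.

```python
import random


def set_seeds(seed):
    """Seed Python's, NumPy's and TensorFlow's RNGs (NumPy/TensorFlow only if installed)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import tensorflow as tf
        if hasattr(tf.random, "set_seed"):
            tf.random.set_seed(seed)   # TensorFlow 2.x
        else:
            tf.set_random_seed(seed)   # TensorFlow 1.x, e.g. the 1.13.1 listed above
    except ImportError:
        pass
```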

8. TODO

[X] Support for VGG16 Model. Uses InceptionV3 Model by default.
[X] Implement 2 architectures of RNN Model.
[X] Support for batch processing in data generator with shuffling.
[X] Implement BEAM Search.
[X] Calculate BLEU Scores using BEAM Search.
[ ] Implement Attention and change model architecture.
[ ] Support for pre-trained word vectors like word2vec, GloVe etc.

9. References

Show and Tell: A Neural Image Caption Generator - Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan
Where to put the Image in an Image Caption Generator - Marc Tanti, Albert Gatt, Kenneth P. Camilleri
How to Develop a Deep Learning Photo Caption Generator from Scratch - Jason Brownlee

Stars: ✭ 126
