
anuragmishracse / Caption_generator

License: MIT

A modular library built on top of Keras and TensorFlow to generate a caption in natural language for any input image.

Programming Languages

python


Labels

tensorflow keras image cnn lstm rnn image-captioning


caption_generator: An image captioning project

Note: This project is no longer under active development. However, queries and pull requests will be responded to. Thanks!

This project generates an English natural-language caption for any image. The model architecture is inspired by [1] (Vinyals et al.). The module is built with Keras, a deep learning library.

This repository serves two purposes:

Present and discuss my model and the results I obtained.
Provide a simple architecture for image captioning to the community.

Model

The image captioning model is implemented with the Sequential API of Keras. It consists of three components:

An encoder CNN model: A pre-trained CNN encodes an image into its features. This implementation uses the VGG16 model [d], loaded with its pretrained weights, as the encoder. The final softmax layer of VGG16 is removed, and a vector of dimension (4096,) is taken from the second-to-last layer.

To speed up training, I pre-encoded each image to its feature vector. This is done in prepare_dataset.py, which writes the results to the pickle file encoded_images.p. In the current version, the image model takes the (4096,)-dimensional encoded image vector as input. This can be overridden by uncommenting the VGG model lines in caption_generator.py. The current version does no fine-tuning, but it could be added.

A word embedding model: Since the number of unique words can be large, one-hot encoding the words is not a good idea. Instead, an embedding model is trained that takes a word and outputs an embedding vector of dimension (1, 128). Pre-trained word embeddings can also be used.

A decoder RNN model: An LSTM network generates the captions. At each timestep it takes the image vector and the partial caption as input and outputs the next most probable word.
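The way the decoder is fed during training and queried at inference time can be sketched without any deep learning framework. The sketch below is illustrative only and is not taken from caption_generator.py: the special tokens `<start>`, `<end>`, and `<pad>`, the helper names, and the `toy_model` stand-in for the trained network are all assumptions. `make_training_pairs` expands one caption into (partial caption, next word) pairs, and `greedy_decode` repeatedly appends the most probable next word until the end token appears. In the real model the image vector would be a second input; here the stand-in predictor is assumed to be already tied to one image.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical special tokens; the repository may use different ones.
START, END, PAD = "<start>", "<end>", "<pad>"


def build_vocab(captions: Sequence[str]) -> Dict[str, int]:
    """Map each word to an integer index; index 0 is reserved for padding."""
    vocab = {PAD: 0, START: 1, END: 2}
    for caption in captions:
        for word in caption.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def make_training_pairs(
    caption: str, vocab: Dict[str, int], max_len: int
) -> List[Tuple[List[int], int]]:
    """Expand one caption into (padded partial caption, next word index) pairs."""
    ids = [vocab[token] for token in [START] + caption.lower().split() + [END]]
    if len(ids) > max_len:
        raise ValueError("caption is longer than max_len")
    pairs = []
    for i in range(1, len(ids)):
        partial = ids[:i] + [vocab[PAD]] * (max_len - i)  # right-pad to max_len
        pairs.append((partial, ids[i]))
    return pairs


def greedy_decode(
    predict_next: Callable[[List[int]], Dict[int, float]],
    vocab: Dict[str, int],
    max_len: int,
) -> str:
    """Append the most probable next word until END or max_len is reached."""
    index_to_word = {index: word for word, index in vocab.items()}
    ids = [vocab[START]]
    while len(ids) < max_len:
        probabilities = predict_next(ids)
        next_id = max(probabilities, key=probabilities.get)
        if next_id == vocab[END]:
            break
        ids.append(next_id)
    return " ".join(index_to_word[i] for i in ids[1:])


captions = ["a dog runs", "a cat sleeps"]
vocab = build_vocab(captions)
pairs = make_training_pairs("a dog runs", vocab, max_len=6)


def toy_model(ids: List[int]) -> Dict[int, float]:
    """Stand-in for the trained network: always continues 'a dog runs'."""
    script = [vocab["a"], vocab["dog"], vocab["runs"], vocab[END]]
    target = script[len(ids) - 1]
    return {index: (1.0 if index == target else 0.0) for index in vocab.values()}


caption = greedy_decode(toy_model, vocab, max_len=6)
print(pairs)
print(caption)
```

Beam search is a common alternative to the greedy loop: it keeps several candidate captions at each step instead of only the single most probable one.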

The overall architecture of the model is described by the following picture. It also shows the input and output dimension of each layer in the model.

Dataset

The model has been trained and tested on the Flickr8k dataset [2]. Many other datasets can be used as well, such as:

Flickr30k
MS COCO
SBU
Pascal

Experiments and results

The model was trained for 50 epochs, which lowered the loss to 2.6465. With a larger dataset, it may be necessary to train the model for at least 50 more epochs.

With the current training on the Flickr8k dataset, running the test on the 1000 test images gives BLEU ≈ 0.57.
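For readers unfamiliar with BLEU, the following is a minimal sentence-level sketch: clipped n-gram precision combined with a brevity penalty, without smoothing. It is not the code in test_model.py, and how the reported score was computed (n-gram order, corpus-level or sentence-level averaging) is not stated here. `sentence_bleu`, `modified_precision`, and `max_n` are names chosen for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(
    candidate: Sequence[str], references: Sequence[Sequence[str]], n: int
) -> float:
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in any single reference."""
    candidate_counts = ngrams(candidate, n)
    if not candidate_counts:
        return 0.0
    max_reference_counts: Counter = Counter()
    for reference in references:
        for gram, count in ngrams(reference, n).items():
            max_reference_counts[gram] = max(max_reference_counts[gram], count)
    clipped = sum(
        min(count, max_reference_counts[gram])
        for gram, count in candidate_counts.items()
    )
    return clipped / sum(candidate_counts.values())


def sentence_bleu(
    candidate: Sequence[str], references: Sequence[Sequence[str]], max_n: int = 4
) -> float:
    """Unsmoothed BLEU with uniform weights over 1..max_n grams."""
    precisions: List[float] = [
        modified_precision(candidate, references, n) for n in range(1, max_n + 1)
    ]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, one empty n-gram order zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    candidate_len = len(candidate)
    # Reference length closest to the candidate length (ties pick the shorter).
    reference_len = min(
        (abs(len(ref) - candidate_len), len(ref)) for ref in references
    )[1]
    if candidate_len > reference_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - reference_len / candidate_len)
    return brevity_penalty * math.exp(log_mean)


reference = "a dog runs on the grass".split()
print(sentence_bleu(reference, [reference]))
print(sentence_bleu("the the the the".split(), ["the cat is on the mat".split()], max_n=1))
```

Because the sketch is unsmoothed, short captions that share no 4-gram with a reference score 0 at the default `max_n`; lowering `max_n` or adding smoothing changes the numbers, so scores are only comparable under the same settings.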

Some captions generated are as follows:

Requirements

tensorflow
keras
numpy
h5py
pandas
Pillow

These requirements can be installed with: pip install -r requirements.txt
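A quick way to confirm the installation succeeded is to check that each package can be located by the import system. This is a small sketch, not part of the repository; `REQUIRED` and `missing_packages` are illustrative names. The one non-obvious mapping is that the Pillow package is imported as `PIL`.

```python
import importlib.util
from typing import Dict, List

# pip package name -> importable module name (Pillow is imported as PIL).
REQUIRED: Dict[str, str] = {
    "tensorflow": "tensorflow",
    "keras": "keras",
    "numpy": "numpy",
    "h5py": "h5py",
    "pandas": "pandas",
    "Pillow": "PIL",
}


def missing_packages(required: Dict[str, str]) -> List[str]:
    """Return the pip names whose module cannot be found, without importing it."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


print("Missing packages:", missing_packages(REQUIRED) or "none")
```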

Scripts

caption_generator.py: The base script containing functions for model creation, the batch data generator, etc.
prepare_dataset.py: Prepares the dataset for training. This script must be modified if a new dataset is to be used.
train_model.py: Module for training the caption generator.
test_model.py: Module for testing the performance of the caption generator. It currently implements the BLEU metric (https://en.wikipedia.org/wiki/BLEU); new metrics can be added.
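The idea of pre-encoding images once and reusing the result, as prepare_dataset.py does with encoded_images.p, can be sketched with the standard library alone. The layout of the real pickle file is not shown here, so this sketch assumes a dictionary mapping image file names to 4096-element feature vectors. `fake_encode_image` is a hypothetical stand-in for the VGG16 forward pass, and the helper names are illustrative.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable, List

FEATURE_DIM = 4096  # size of the encoded image vector described in the Model section


def fake_encode_image(image_name: str) -> List[float]:
    """Hypothetical stand-in for a VGG16 forward pass; returns a dummy vector."""
    return [float(len(image_name))] * FEATURE_DIM


def build_feature_cache(
    image_names: Iterable[str],
    encode: Callable[[str], List[float]],
    cache_path: Path,
) -> None:
    """Encode every image once and store all vectors in one pickle file."""
    features = {name: encode(name) for name in image_names}
    with open(cache_path, "wb") as handle:
        pickle.dump(features, handle)


def load_feature_cache(cache_path: Path) -> Dict[str, List[float]]:
    """Load the cached vectors so training never re-runs the encoder."""
    with open(cache_path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_file = Path(tmp_dir) / "encoded_images.p"
    build_feature_cache(["dog.jpg", "cat.jpg"], fake_encode_image, cache_file)
    cached = load_feature_cache(cache_file)

print(sorted(cached), len(cached["dog.jpg"]))
```

The trade-off is that cached features are fixed: fine-tuning the encoder would require running the CNN during training instead of reading from the cache.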

Usage

After the requirements have been installed, the process from training to testing is fairly easy. The commands to run:

python prepare_dataset.py
python train_model.py
python test_model.py
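The three commands can also be chained from a small driver that stops at the first failure, so a failed dataset preparation does not lead into training. This is a sketch rather than part of the repository: `run_pipeline`, `run_with_subprocess`, and `dry_run` are illustrative names, and the module-level demo only records the commands instead of executing them.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

PIPELINE = ["prepare_dataset.py", "train_model.py", "test_model.py"]


def run_pipeline(
    scripts: Sequence[str], run: Callable[[List[str]], int]
) -> List[str]:
    """Run the scripts in order and stop at the first non-zero exit code.

    Returns the scripts that completed successfully.
    """
    completed = []
    for script in scripts:
        exit_code = run([sys.executable, script])
        if exit_code != 0:
            break
        completed.append(script)
    return completed


def run_with_subprocess(command: List[str]) -> int:
    """Execute one command for real and return its exit code."""
    return subprocess.run(command).returncode


# Dry run: record the commands instead of executing them.
recorded: List[List[str]] = []


def dry_run(command: List[str]) -> int:
    recorded.append(command)
    return 0


completed = run_pipeline(PIPELINE, dry_run)
print("Would run:", [command[1] for command in recorded])
```

To execute the scripts for real, call `run_pipeline(PIPELINE, run_with_subprocess)` from the repository root.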

References

[1] Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. Show and Tell: A Neural Image Caption Generator

[2] Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. Collecting Image Annotations Using Amazon's Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk.

Acknowledgements

[a] I am thankful to my project guide, Prof. NK Bansode, and a big shoutout to my project teammates. We have also developed an implementation of [1] in TensorFlow, available at image-caption-generator, which was trained and tested on the MS COCO dataset.

[b] Special thanks to Ashwanth Kumar for helping me with the resources and effort to train my models.

[c] Keras: Deep Learning library for Theano and TensorFlow: Thanks to François Chollet for developing and maintaining such a wonderful library.

[d] deep-learning-models: Thanks to François Chollet for providing the pretrained VGG16 model and weights.
