Moeinh77 / Image-Captioning-with-Beam-Search

Licence: other
Generating image captions using Xception Network and Beam Search in Keras

Programming language: Jupyter Notebook

Projects that are alternatives of or similar to Image-Captioning-with-Beam-Search

captioning chainer
A fast implementation of Neural Image Caption by Chainer
Stars: ✭ 17 (-5.56%)
Mutual labels:  rnn, image-captioning, beam-search
Image-Caption
Using LSTM or Transformer to solve Image Captioning in Pytorch
Stars: ✭ 36 (+100%)
Mutual labels:  image-captioning, beam-search
Poetry Seq2seq
Chinese Poetry Generation
Stars: ✭ 159 (+783.33%)
Mutual labels:  rnn, beam-search
Image Captioning
Image Captioning using InceptionV3 and beam search
Stars: ✭ 290 (+1511.11%)
Mutual labels:  image-captioning, beam-search
CS231n
My solutions for Assignments of CS231n: Convolutional Neural Networks for Visual Recognition
Stars: ✭ 30 (+66.67%)
Mutual labels:  rnn, image-captioning
udacity-cvnd-projects
My solutions to the projects assigned for the Udacity Computer Vision Nanodegree
Stars: ✭ 36 (+100%)
Mutual labels:  rnn, image-captioning
Image Caption Generator
A neural network to generate captions for an image using CNN and RNN with BEAM Search.
Stars: ✭ 126 (+600%)
Mutual labels:  image-captioning, beam-search
Arnet
CVPR 2018 - Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present
Stars: ✭ 94 (+422.22%)
Mutual labels:  rnn, image-captioning
Caption generator
A modular library built on top of Keras and TensorFlow to generate a caption in natural language for any input image.
Stars: ✭ 243 (+1250%)
Mutual labels:  rnn, image-captioning
Motor-Imagery-Tasks-Classification-using-EEG-data
Implementation of Deep Neural Networks in Keras and Tensorflow to classify motor imagery tasks using EEG data
Stars: ✭ 67 (+272.22%)
Mutual labels:  rnn
tf-attend-infer-repeat
TensorFlow-based implementation of "Attend, Infer, Repeat" paper (Eslami et al., 2016, arXiv:1603.08575).
Stars: ✭ 44 (+144.44%)
Mutual labels:  rnn
rnn-theano
RNN(LSTM, GRU) in Theano with mini-batch training; character-level language models in Theano
Stars: ✭ 68 (+277.78%)
Mutual labels:  rnn
TCN-TF
TensorFlow Implementation of TCN (Temporal Convolutional Networks)
Stars: ✭ 107 (+494.44%)
Mutual labels:  rnn
MetaTraderForecast
RNN based Forecasting App for Meta Trader and similar trading platforms
Stars: ✭ 103 (+472.22%)
Mutual labels:  rnn
STAR Network
[PAMI 2021] Gating Revisited: Deep Multi-layer RNNs That Can Be Trained
Stars: ✭ 16 (-11.11%)
Mutual labels:  rnn
beam search
Beam search for neural network sequence to sequence (encoder-decoder) models.
Stars: ✭ 31 (+72.22%)
Mutual labels:  beam-search
Market-Trend-Prediction
This is a project of build knowledge graph course. The project leverages historical stock price, and integrates social media listening from customers to predict market Trend On Dow Jones Industrial Average (DJIA).
Stars: ✭ 57 (+216.67%)
Mutual labels:  rnn
presidential-rnn
Project 4 for Metis bootcamp. Objective was generation of character-level RNN trained on Donald Trump's statements using Keras. Also generated Markov chains, and quick pyTorch RNN as baseline. Attempted semi-supervised GAN, but was unable to test in time.
Stars: ✭ 26 (+44.44%)
Mutual labels:  rnn
text-rnn-tensorflow
Tutorial: Multi-layer Recurrent Neural Networks (LSTM, RNN) for text models in Python using TensorFlow.
Stars: ✭ 22 (+22.22%)
Mutual labels:  rnn
Clockwork-RNN
This repository is a reproduction of the clockwork RNN paper.
Stars: ✭ 20 (+11.11%)
Mutual labels:  rnn

Generating image captions using Xception Network and Beam Search

View the notebook on nbviewer: link

Dataset:

The dataset (Flickr8k) consists of 8,000 images, each paired with five different captions that provide clear descriptions of the salient entities and events. The images were chosen from six different Flickr groups and tend not to contain any well-known people or locations; they were manually selected to depict a variety of scenes and situations. The images are divided into a training set (6,000 images), a validation set (1,000 images), and a test set (1,000 images).

You can download the data from here: https://academictorrents.com/details/9dea07ba660a722ae1008c4c8afdd303b6f6e53b or here: https://github.com/jbrownlee/Datasets/releases

Model:

I used an Encoder-Decoder architecture for the task. The Encoder is a pre-trained Xception network without the last two fully connected layers, and it operates as a feature extractor. The Decoder consists of two GRU layers with 256-dimensional hidden states. For regularization, I applied dropout with a rate of 0.4 between the two GRU layers. The Encoder extracts a 2048-d feature vector for each image; this vector is fed to the Decoder alongside the input and is also used as the hidden state of the first GRU cell in the Decoder.
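The architecture described above could be sketched in Keras roughly as follows. This is a minimal sketch, not the notebook's actual code. The function names, the embedding size of 128, the toy vocabulary size, and the caption length are all hypothetical. Two design choices are also assumptions: a `Dense` projection maps the 2048-d features to the 256-d GRU state size, and the projected features are repeated at every time step to feed them "alongside the input".

```python
import numpy as np
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import Xception


def build_encoder():
    # Xception without its classification head; global average pooling
    # yields one 2048-d feature vector per image.
    return Xception(include_top=False, weights="imagenet", pooling="avg")


def build_decoder(vocab_size, max_len, embed_dim=128, units=256, feat_dim=2048):
    feats = layers.Input(shape=(feat_dim,), name="image_features")
    tokens = layers.Input(shape=(max_len,), dtype="int32", name="caption_tokens")

    # Assumption: project the 2048-d features down to the GRU state size
    # so they can serve as the initial hidden state of the first GRU layer.
    h0 = layers.Dense(units, activation="tanh")(feats)

    # Assumption: "alongside the input" is modelled by repeating the projected
    # features at every time step and concatenating them with word embeddings.
    emb = layers.Embedding(vocab_size, embed_dim)(tokens)
    rep = layers.RepeatVector(max_len)(h0)
    x = layers.Concatenate()([emb, rep])

    x = layers.GRU(units, return_sequences=True)(x, initial_state=h0)
    x = layers.Dropout(0.4)(x)  # dropout between the two GRU layers
    x = layers.GRU(units)(x)

    # Probability distribution over the next word of the caption.
    out = layers.Dense(vocab_size, activation="softmax")(x)
    return Model([feats, tokens], out)


# Toy usage with random-free dummy inputs (no image or weight download needed).
decoder = build_decoder(vocab_size=50, max_len=6)
probs = decoder.predict(
    [np.zeros((2, 2048), dtype="float32"), np.zeros((2, 6), dtype="int32")],
    verbose=0,
)
print(probs.shape)
```

In this sketch the encoder is only defined, not called, because `build_encoder()` downloads the ImageNet weights on first use.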

Metric:

I used the BLEU metric. It is calculated by comparing the n-grams of the candidate caption with the n-grams of the reference captions and counting the number of matches. These matches are position-independent. The more matches there are, the better the candidate caption is.
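The n-gram matching described above can be illustrated with a small, dependency-free sketch of BLEU's clipped ("modified") n-gram precision and brevity penalty. The function names and example captions are illustrative only, and this is not necessarily how the notebook computes its score.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Fraction of candidate n-grams found in the references.

    Each n-gram's count is clipped to the maximum number of times it
    appears in any single reference, so repeating a word does not help.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties go to the shorter reference.
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    return 1.0 if c >= r else math.exp(1 - r / c)


candidate = "a dog runs on the grass".split()
references = [
    "a dog is running on the grass".split(),
    "a brown dog runs through grass".split(),
]
p1 = modified_precision(candidate, references, 1)  # unigram precision
p2 = modified_precision(candidate, references, 2)  # bigram precision
bleu1 = brevity_penalty(candidate, references) * p1
print(p1, p2, bleu1)
```

Because matching only counts n-grams, word order beyond the n-gram window does not affect the score, which is what "position-independent" means here.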

Results:

The model reaches a BLEU score of 61% for unigrams on the test set. You can improve the score by training the model for longer or by using a more sophisticated RNN with more layers. In the sample outputs, the model is fairly good at recognizing actions but makes some mistakes when recognizing colors. Beam search was tested with beam widths of 1, 3, and 5. I also tried scoring beams by the sum of the log probabilities, and the results improved slightly for some samples of the test set. I have included the weights of the trained model; feel free to use it in your own projects.
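To make the beam search step concrete, here is a minimal sketch that scores each hypothesis by the sum of its log probabilities. The `beam_search` function, the `toy_next_log_probs` callback, and the probability table are hypothetical stand-ins for the trained decoder, chosen so that a beam width of 1 (greedy decoding) and a wider beam give different captions.

```python
import math


def beam_search(next_log_probs, start, end, beam_width, max_len):
    """Return the (sequence, score) pair with the highest summed log probability.

    next_log_probs(seq) must return a dict mapping each possible next token
    to its log probability given the sequence so far.
    """
    beams = [([start], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, logp in next_log_probs(seq).items():
                candidates.append((seq + [token], score + logp))
        # Keep only the best `beam_width` expansions.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses that reached max_len without ending
    return max(finished, key=lambda item: item[1])


# Hypothetical next-word probabilities, keyed by the previous word only.
TABLE = {
    "<start>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"<end>": 1.0},
    "cat": {"<end>": 1.0},
}


def toy_next_log_probs(seq):
    return {token: math.log(p) for token, p in TABLE[seq[-1]].items()}


greedy_seq, greedy_score = beam_search(toy_next_log_probs, "<start>", "<end>", 1, 5)
beam_seq, beam_score = beam_search(toy_next_log_probs, "<start>", "<end>", 3, 5)
print(greedy_seq, math.exp(greedy_score))
print(beam_seq, math.exp(beam_score))
```

With a beam width of 1 the search commits to "a" because it is the most likely first word, and ends with a total probability of 0.33. With a beam width of 3 it keeps "the" alive and finds "the dog" at 0.36. Summing log probabilities ranks hypotheses the same way as multiplying raw probabilities would in exact arithmetic, but avoids numerical underflow on long captions.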

More on image captioning:

I have recorded a Farsi tutorial explaining this code. You can find it here: http://sariab.ir/Home/Roadmap/8

📓 Show and Tell: A Neural Image Caption Generator

📓 Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

📓 Automated Image Captioning with ConvNets and Recurrent Nets
