
kakshak07 / Image-Captioining

Licence: MIT license
The objective is to generate a textual description of an image based on the objects and actions in it, using generative models so that the system creates novel sentences. Pipeline-type models use two separate learning processes, one for language modelling and the other for image recognition. It first identifies objects in the image and prov…

Programming Languages

python

Projects that are alternatives of or similar to Image-Captioining

Image Captioning
Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]
Stars: ✭ 171 (+755%)
Mutual labels:  image-captioning
Aoanet
Code for paper "Attention on Attention for Image Captioning". ICCV 2019
Stars: ✭ 242 (+1110%)
Mutual labels:  image-captioning
Udacity
This repo includes all the projects I have finished in the Udacity Nanodegree programs
Stars: ✭ 57 (+185%)
Mutual labels:  image-captioning
Up Down Captioner
Automatic image captioning model based on Caffe, using features from bottom-up attention.
Stars: ✭ 195 (+875%)
Mutual labels:  image-captioning
Meshed Memory Transformer
Meshed-Memory Transformer for Image Captioning. CVPR 2020
Stars: ✭ 230 (+1050%)
Mutual labels:  image-captioning
CS231n
CS231n Assignments Solutions - Spring 2020
Stars: ✭ 48 (+140%)
Mutual labels:  image-captioning
Image Caption Generator
[DEPRECATED] A Neural Network based generative model for captioning images using Tensorflow
Stars: ✭ 141 (+605%)
Mutual labels:  image-captioning
Show and Tell
Show and Tell : A Neural Image Caption Generator
Stars: ✭ 74 (+270%)
Mutual labels:  image-captioning
Caption generator
A modular library built on top of Keras and TensorFlow to generate a caption in natural language for any input image.
Stars: ✭ 243 (+1115%)
Mutual labels:  image-captioning
udacity-cvnd-projects
My solutions to the projects assigned for the Udacity Computer Vision Nanodegree
Stars: ✭ 36 (+80%)
Mutual labels:  image-captioning
Sca Cnn.cvpr17
Image Captions Generation with Spatial and Channel-wise Attention
Stars: ✭ 198 (+890%)
Mutual labels:  image-captioning
Dataturks
ML data annotations made super easy for teams. Just upload data, add your team and build training/evaluation dataset in hours.
Stars: ✭ 200 (+900%)
Mutual labels:  image-captioning
Image-Captioning-with-Beam-Search
Generating image captions using Xception Network and Beam Search in Keras
Stars: ✭ 18 (-10%)
Mutual labels:  image-captioning
Fairseq Image Captioning
Transformer-based image captioning extension for pytorch/fairseq
Stars: ✭ 180 (+800%)
Mutual labels:  image-captioning
BUTD model
A pytorch implementation of "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" for image captioning.
Stars: ✭ 28 (+40%)
Mutual labels:  image-captioning
Show Adapt And Tell
Code for "Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner" in ICCV 2017
Stars: ✭ 146 (+630%)
Mutual labels:  image-captioning
Show Control And Tell
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019
Stars: ✭ 243 (+1115%)
Mutual labels:  image-captioning
pix2code-pytorch
PyTorch implementation of pix2code. 🔥
Stars: ✭ 24 (+20%)
Mutual labels:  image-captioning
LaBERT
A length-controllable and non-autoregressive image captioning model.
Stars: ✭ 50 (+150%)
Mutual labels:  image-captioning
catr
Image Captioning Using Transformer
Stars: ✭ 206 (+930%)
Mutual labels:  image-captioning

Image Captioning

Image captioning is the task of generating a textual description of an image fed to the model. Object detection has been studied for a long time, but image captioning has only recently begun to receive attention. This repository contains the "Neural Image Caption" (NIC) model proposed by Vinyals et al.[1]
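For reference, NIC generates a caption S = (S_0, ..., S_N) for an image I one word at a time and trains the model parameters θ to maximize the log-likelihood of the correct captions over all (image, caption) training pairs, as formulated by Vinyals et al. The Keras implementation in this repository may differ in its details (for example, it works from precomputed VGG16 image encodings):

```latex
\theta^{*} = \arg\max_{\theta} \sum_{(I, S)} \log p(S \mid I; \theta),
\qquad
\log p(S \mid I) = \sum_{t=0}^{N} \log p(S_t \mid I, S_0, \ldots, S_{t-1})
```

The second equation is just the chain rule: the probability of the whole caption is the product of the probabilities of each word given the image and the words before it, which is why a recurrent language model conditioned on the image can be trained word by word.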

Dataset

The dataset used is Flickr8k. You can request the data here; the download links will then be emailed to you. Extract the images into Flickr8K_Data and the text data into Flickr8K_Text.
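To illustrate the layout of the text data, the sketch below groups captions by image. It assumes the captions file (Flickr8k.token.txt in the standard Flickr8k release) stores one caption per line as "<image>.jpg#<index><TAB><caption>". The helper name load_captions and the sample lines are hypothetical and not taken from this repository.

```python
from collections import defaultdict


def load_captions(lines):
    """Group Flickr8k-style caption lines by image file name.

    Hypothetical helper: assumes each line looks like "<image>.jpg#<n>\t<caption>".
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, caption = line.split("\t", 1)
        image_name = key.split("#", 1)[0]  # drop the "#<n>" caption index
        captions[image_name].append(caption.strip())
    return dict(captions)


# Made-up sample lines in the assumed format
sample = [
    "example_1.jpg#0\tA dog runs across the grass .",
    "example_1.jpg#1\tA brown dog is running outside .",
]
print(load_captions(sample))
```

With the real data, the same function could be called on an open file object, e.g. load_captions(open("Flickr8K_Text/Flickr8k.token.txt")), assuming that is where the file was extracted.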

Requirements

  1. TensorFlow
  2. Keras
  3. NumPy
  4. h5py
  5. Pandas
  6. Pillow
  7. pyttsx
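Before running the scripts, a quick check that each requirement is importable can save a failed run. This is a minimal sketch; the mapping from package names to import names is an assumption (Pillow is imported as PIL, and on Python 3 the pyttsx package is commonly replaced by pyttsx3).

```python
import importlib.util

# Package name -> module name used in "import ..." (assumed mapping)
REQUIREMENTS = {
    "TensorFlow": "tensorflow",
    "Keras": "keras",
    "NumPy": "numpy",
    "h5py": "h5py",
    "Pandas": "pandas",
    "Pillow": "PIL",
    "pyttsx": "pyttsx",
}


def check_requirements(requirements=REQUIREMENTS):
    """Return {package: True/False} depending on whether its module can be found.

    find_spec locates a top-level module without importing it.
    """
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in requirements.items()
    }


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name:<12} {'ok' if found else 'MISSING'}")
```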

Steps to execute

  1. After extracting the data, run preprocess_data.py from the repository directory with "python preprocess_data.py". The script adds "start " and " end" tokens to the training and testing captions. On execution, it creates new .txt files in the Flickr8K_Text folder.

  2. Run encode_image.py by typing "python encode_image.py" in a terminal opened in the repository directory. The script feeds each image to the VGG16 model and saves the resulting image encodings to image_encodings.p. If the VGG16 weights are not already available in your temp directory, they are downloaded first.

  3. Run train.py as "python train.py (int)", replacing "(int)" with an integer: the number of epochs for which the model will be trained. The trained models are saved in the Output folder in this directory.

  4. After training, run "python test.py image" to generate a caption for an image. Pass the image file name including its extension, for example "python test.py beach.jpg". The image file must be present in the test folder.
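The README does not show the internals of preprocess_data.py or train.py. The sketch below is a pure-Python illustration under two assumptions: that step 1 simply wraps each caption as "start " + caption + " end", and that training follows the common NIC-style Keras recipe of expanding each caption into (partial caption, next word) pairs that are fed to the model alongside the image encoding from step 2. All function names here are hypothetical.

```python
def add_tokens(caption):
    """Wrap a caption with the start/end markers described in step 1."""
    return "start " + caption.strip() + " end"


def build_vocab(captions):
    """Map every word to an integer index; index 0 is reserved for padding."""
    words = sorted({w for c in captions for w in c.split()})
    return {w: i + 1 for i, w in enumerate(words)}


def training_pairs(caption, vocab, max_len):
    """Yield (padded partial sequence, next word index) pairs for one caption.

    max_len should be at least the longest tokenized caption length minus one.
    """
    ids = [vocab[w] for w in caption.split()]
    for t in range(1, len(ids)):
        partial = ids[:t]
        padded = partial + [0] * (max_len - len(partial))  # post-pad with 0
        yield padded, ids[t]


caption = add_tokens("a dog runs")
vocab = build_vocab([caption])
for seq, next_word in training_pairs(caption, vocab, max_len=5):
    print(seq, next_word)
```

Each caption of n tokens produces n - 1 training examples, so the model learns to predict every word from the words before it, which matches the word-by-word factorization used by NIC.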

NOTE - You can skip training by directly downloading the weights and model file and placing them in the Output folder, since training takes a long time on a system without a GPU. For reference, a GTX 1050 Ti with 4 GB of RAM takes around 10-15 minutes per epoch.
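Taking the per-epoch timing above at face value, a quick estimate shows what the 70-epoch run reported under Results costs on that GPU. This is simple arithmetic, not a measured figure:

```python
def training_hours(epochs, minutes_per_epoch):
    """Convert an epoch count and a per-epoch time in minutes into hours."""
    return epochs * minutes_per_epoch / 60


# 70 epochs at the quoted 10-15 minutes per epoch
low, high = training_hours(70, 10), training_hours(70, 15)
print(f"{low:.1f} to {high:.1f} hours")  # → 11.7 to 17.5 hours
```

On a CPU-only machine the per-epoch time is expected to be considerably longer, which is why downloading the pretrained weights is suggested.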

Output

The model outputs a caption for the image, and a Python library called pyttsx converts the generated text to audio.
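How test.py turns model predictions into a caption is not described here. A common approach for NIC-style models is greedy decoding: start from the "start" token, repeatedly append the most probable next word, and stop at "end". The sketch below assumes that approach; predict_next is a hypothetical stand-in for the trained model, and the toy predictor exists only to make the example runnable.

```python
def greedy_caption(predict_next, max_len=20):
    """Build a caption word by word, always taking the most probable next word.

    predict_next(words) is a hypothetical stand-in for the trained model: given
    the words generated so far, it returns a {word: probability} dict.
    """
    words = ["start"]
    for _ in range(max_len):
        probs = predict_next(words)
        best = max(probs, key=probs.get)  # greedy choice
        if best == "end":
            break
        words.append(best)
    return " ".join(words[1:])  # drop the "start" token


# Toy predictor that follows a fixed script, for illustration only
SCRIPT = {"start": "a", "a": "dog", "dog": "runs", "runs": "end"}


def toy_predict(words):
    nxt = SCRIPT[words[-1]]
    return {"end": 1.0} if nxt == "end" else {nxt: 0.9, "end": 0.1}


print(greedy_caption(toy_predict))  # → a dog runs
```

The resulting string is what would then be spoken: pyttsx (pyttsx3 on Python 3) provides init(), say() and runAndWait() for this. Beam search, which keeps several candidate captions at each step, is a common alternative to the greedy choice.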

Results

The following are a few results obtained after training the model for 70 epochs.

Generated Caption: A brown dog is running in the water.
Generated Caption: A tennis player hitting the ball.
Generated Caption: A child in a helmet is riding a bike.
Generated Caption: A group of people are walking on a busy street.

On an ambiguous image, for example a hamster's face morphed onto a lion, the model got confused. Because the training data is somewhat biased towards dogs, it captioned the image as a dog and identified the hamster's reddish-pink nose as a red ball.

Generated Caption: A black dog is running through the snow with a red ball.

In some cases the classifier got confused, and blurring an image produced bizarre results.

Generated Caption: A brown dog and a brown dog are playing with a ball in the snow.
Generated Caption: A little girl in a white shirt is running on the grass.