
ntrang086 / Image_captioning

License: MIT
Generate captions for images using a CNN-RNN model trained on the Microsoft Common Objects in COntext (MS COCO) dataset

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to, or similar to, Image captioning

Neuralmonkey
An open-source tool for sequence learning in NLP built on TensorFlow.
Stars: ✭ 400 (+684.31%)
Mutual labels:  image-captioning, encoder-decoder
Screenshot To Code
A neural network that transforms a design mock-up into a static website.
Stars: ✭ 13,561 (+26490.2%)
Mutual labels:  cnn, encoder-decoder
A Pytorch Tutorial To Image Captioning
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
Stars: ✭ 1,867 (+3560.78%)
Mutual labels:  image-captioning, encoder-decoder
udacity-cvnd-projects
My solutions to the projects assigned for the Udacity Computer Vision Nanodegree
Stars: ✭ 36 (-29.41%)
Mutual labels:  image-captioning, encoder-decoder
stylenet
A PyTorch implementation of "StyleNet: Generating Attractive Visual Captions with Styles"
Stars: ✭ 58 (+13.73%)
Mutual labels:  cnn, image-captioning
Image-Caption
Using LSTM or Transformer to solve Image Captioning in Pytorch
Stars: ✭ 36 (-29.41%)
Mutual labels:  image-captioning, encoder-decoder
Caption generator
A modular library built on top of Keras and TensorFlow to generate a caption in natural language for any input image.
Stars: ✭ 243 (+376.47%)
Mutual labels:  cnn, image-captioning
Image Captioning
Image Captioning using InceptionV3 and beam search
Stars: ✭ 290 (+468.63%)
Mutual labels:  cnn, image-captioning
Neural Image Captioning
Implementation of Neural Image Captioning model using Keras with Theano backend
Stars: ✭ 12 (-76.47%)
Mutual labels:  cnn, image-captioning
Yann
This toolbox is support material for the book on CNN (http://www.convolution.network).
Stars: ✭ 41 (-19.61%)
Mutual labels:  cnn
Dvdnet
DVDnet: A Simple and Fast Network for Deep Video Denoising
Stars: ✭ 47 (-7.84%)
Mutual labels:  cnn
Dialectid e2e
End to End Dialect Identification using Convolutional Neural Network
Stars: ✭ 40 (-21.57%)
Mutual labels:  cnn
Neural Style
Neural style transfer with added Chinese comments for easier understanding; also adds a new begin.py that can be run and debugged directly.
Stars: ✭ 42 (-17.65%)
Mutual labels:  cnn
Keras Sincnet
Keras (tensorflow) implementation of SincNet (Mirco Ravanelli, Yoshua Bengio - https://github.com/mravanelli/SincNet)
Stars: ✭ 47 (-7.84%)
Mutual labels:  cnn
Monodepth360
Master's project implementing depth estimation for spherical images using unsupervised learning with CNNs.
Stars: ✭ 41 (-19.61%)
Mutual labels:  cnn
Tensorflow Cnn Time Series
Feeding images of time series to Conv Nets! (Tensorflow + Keras)
Stars: ✭ 49 (-3.92%)
Mutual labels:  cnn
Keras basic
Basic deep learning study using Keras
Stars: ✭ 39 (-23.53%)
Mutual labels:  cnn
Qanet
A Tensorflow implementation of QANet for machine reading comprehension
Stars: ✭ 996 (+1852.94%)
Mutual labels:  cnn
Jacinto Ai Devkit
Training & Quantization of embedded friendly Deep Learning / Machine Learning / Computer Vision models
Stars: ✭ 49 (-3.92%)
Mutual labels:  cnn
Simple Sign Language Detector
Simple Sign Language Detector
Stars: ✭ 49 (-3.92%)
Mutual labels:  cnn

Image Captioning

Introduction

Build a model to generate captions from images. Given an image, the model describes in English what the image contains. To achieve this, the model consists of an encoder, which is a CNN, and a decoder, which is an RNN. The CNN encoder, pretrained on an image classification task, extracts features from the input image; these features are fed into the RNN decoder, which outputs English sentences.
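
As a rough illustration of this encoder-decoder pairing, the following is a minimal PyTorch sketch. The class names, the ResNet-50 backbone, and all sizes are illustrative assumptions, not necessarily what model.py implements.

import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """Pretrained classification CNN used as a frozen feature extractor."""
    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet50(pretrained=True)
        self.resnet = nn.Sequential(*list(resnet.children())[:-1])  # drop the classifier head
        self.embed = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        with torch.no_grad():                        # keep the backbone weights fixed
            features = self.resnet(images)
        return self.embed(features.view(features.size(0), -1))

class DecoderRNN(nn.Module):
    """LSTM that receives the image embedding, then the caption tokens."""
    def __init__(self, embed_size, hidden_size, vocab_size, num_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Prepend the image embedding to the embedded caption tokens (teacher forcing).
        inputs = torch.cat((features.unsqueeze(1), self.embed(captions[:, :-1])), dim=1)
        hiddens, _ = self.lstm(inputs)
        return self.linear(hiddens)                  # vocabulary scores at each step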

The model and the tuning of its hyperparameters are based on ideas presented in the papers Show and Tell: A Neural Image Caption Generator and Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.

We use the Microsoft Common Objects in COntext (MS COCO) dataset for this project. It is a large-scale dataset for scene understanding. The dataset is commonly used to train and benchmark object detection, segmentation, and captioning algorithms. For instructions on downloading the data, see the Data section below.
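
To get a feel for what a COCO caption annotation looks like, the snippet below loads the 2014 training captions with pycocotools and prints one of them; the annotation path assumes the layout described in the Data section below.

from pycocotools.coco import COCO

# Path assumes the annotations were extracted as described in the Data section.
coco_caps = COCO("cocoapi/annotations/captions_train2014.json")

ann_id = next(iter(coco_caps.anns))            # pick any caption annotation
ann = coco_caps.anns[ann_id]
print(ann["image_id"], "->", ann["caption"])   # an image id and its one-sentence caption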

Code

The code can be categorized into two groups:

  1. Notebooks - The main code for the project is structured as a series of Jupyter notebooks:
  • 0_Dataset.ipynb - Introduces the dataset and plots some sample images.
  • 1_Preliminaries.ipynb - Loads and pre-processes data and experiments with models.
  • 2_Training.ipynb - Trains a CNN-RNN model.
  • 3_Inference.ipynb - Generates captions for test images.
  2. Helper files - Contain helper code for the notebooks:
  • data_loader.py - Creates the CoCoDataset and a DataLoader for it.
  • vocabulary.py - Tokenizes captions and builds a vocabulary dictionary from them; an instance of it is kept as an attribute of the CoCoDataset (a rough sketch of this step follows this list).
  • model.py - Provides the CNN and RNN models used by the notebooks for training and testing.
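
As a rough illustration of the vocabulary step, here is a minimal sketch of tokenizing captions with nltk and mapping tokens to integer ids. The function name, special tokens, and threshold are illustrative assumptions, not code taken from vocabulary.py.

from collections import Counter
import nltk

nltk.download("punkt", quiet=True)             # word_tokenize needs the punkt data

def build_vocab(captions, vocab_threshold=5):
    """Map every word appearing at least vocab_threshold times to an integer id."""
    counter = Counter()
    for caption in captions:
        counter.update(nltk.tokenize.word_tokenize(caption.lower()))

    word2idx = {"<start>": 0, "<end>": 1, "<unk>": 2}
    for word, count in counter.items():
        if count >= vocab_threshold:
            word2idx[word] = len(word2idx)
    return word2idx

vocab = build_vocab(["A dog plays with a ball.", "A dog runs on the beach."], vocab_threshold=1)
print([vocab.get(w, vocab["<unk>"]) for w in nltk.tokenize.word_tokenize("a dog runs")])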

Setup

  1. Clone the COCO API repo into this project's directory:
git clone https://github.com/cocodataset/cocoapi.git
  2. Set up the COCO API (also described in the COCO API readme):
cd cocoapi/PythonAPI
make
cd ..
  3. Install PyTorch (0.4 recommended) and torchvision.

    • Linux or Mac:
    conda install pytorch torchvision -c pytorch 
    
    • Windows:
    conda install -c peterjc123 pytorch-cpu
    pip install torchvision
    
  4. Other dependencies:

  • Python 3
  • pycocotools
  • nltk
  • numpy
  • scikit-image
  • matplotlib
  • tqdm
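
After completing the steps above, a quick import check such as the one below (a suggestion, not part of the project) confirms the main dependencies are available:

# Minimal sanity check that the dependencies listed above are importable.
import torch, torchvision, nltk, numpy, skimage, matplotlib, tqdm
from pycocotools.coco import COCO

print("PyTorch", torch.__version__, "| torchvision", torchvision.__version__)
nltk.download("punkt")   # tokenizer data needed for nltk's word tokenizer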

Data

Download the following files from the COCO website and place them, as instructed below, in the cocoapi subdirectory inside this project's directory (this subdirectory was created when you cloned the COCO API repo in the Setup section above):

  • under Annotations, download:
    • 2014 Train/Val annotations [241MB] (extract captions_train2014.json, captions_val2014.json, instances_train2014.json and instances_val2014.json, and place them in the subdirectory cocoapi/annotations/)
    • 2014 Testing Image info [1MB] (extract image_info_test2014.json and place it in the subdirectory cocoapi/annotations/)
  • under Images, download:
    • 2014 Train images [83K/13GB] (extract the train2014 folder and place it in the subdirectory cocoapi/images/)
    • 2014 Val images [41K/6GB] (extract the val2014 folder and place it in the subdirectory cocoapi/images/)
    • 2014 Test images [41K/6GB] (extract the test2014 folder and place it in the subdirectory cocoapi/images/)
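
Once the files are in place, a short check like the following (paths taken from the instructions above) should succeed:

import os
from pycocotools.coco import COCO

root = "cocoapi"
# Load the training captions and confirm the first referenced image file exists.
coco = COCO(os.path.join(root, "annotations", "captions_train2014.json"))
img_id = next(iter(coco.imgs))
file_name = coco.imgs[img_id]["file_name"]
assert os.path.exists(os.path.join(root, "images", "train2014", file_name))
print("Annotations and images for", len(coco.imgs), "training images are in place.")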

Run

To run any script file, use:

python <script.py>

To run any Jupyter notebook, use:

jupyter notebook <notebook_name.ipynb>
jupyter notebook <notebook_name.ipynb>