
peteanderson80 / Up Down Captioner

License: MIT
Automatic image captioning model based on Caffe, using features from bottom-up attention.

Projects that are alternatives of or similar to Up Down Captioner

Neural Image Captioning
Implementation of Neural Image Captioning model using Keras with Theano backend
Stars: ✭ 12 (-93.85%)
Mutual labels:  jupyter-notebook, lstm, image-captioning
Image Captioning
Image Captioning using InceptionV3 and beam search
Stars: ✭ 290 (+48.72%)
Mutual labels:  jupyter-notebook, lstm, image-captioning
Image Caption Generator
[DEPRECATED] A Neural Network based generative model for captioning images using Tensorflow
Stars: ✭ 141 (-27.69%)
Mutual labels:  jupyter-notebook, lstm, image-captioning
Bottom Up Attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Stars: ✭ 989 (+407.18%)
Mutual labels:  jupyter-notebook, caffe, image-captioning
Tensorflow Multi Dimensional Lstm
Multi dimensional LSTM as described in Alex Graves' Paper https://arxiv.org/pdf/0705.2011.pdf
Stars: ✭ 154 (-21.03%)
Mutual labels:  jupyter-notebook, lstm
Py Rfcn Priv
code for py-R-FCN-multiGPU maintained by bupt-priv
Stars: ✭ 153 (-21.54%)
Mutual labels:  jupyter-notebook, caffe
Amazon Product Recommender System
Sentiment analysis on Amazon Review Dataset available at http://snap.stanford.edu/data/web-Amazon.html
Stars: ✭ 158 (-18.97%)
Mutual labels:  jupyter-notebook, lstm
Deformable Convnets Caffe
Deformable Convolutional Networks on caffe
Stars: ✭ 166 (-14.87%)
Mutual labels:  jupyter-notebook, caffe
Ethnicolr
Predict Race and Ethnicity Based on the Sequence of Characters in a Name
Stars: ✭ 137 (-29.74%)
Mutual labels:  jupyter-notebook, lstm
Poetry Seq2seq
Chinese Poetry Generation
Stars: ✭ 159 (-18.46%)
Mutual labels:  jupyter-notebook, lstm
Rnn For Human Activity Recognition Using 2d Pose Input
Activity Recognition from 2D pose using an LSTM RNN
Stars: ✭ 165 (-15.38%)
Mutual labels:  jupyter-notebook, lstm
Sphereface Plus
SphereFace+ Implementation for <Learning towards Minimum Hyperspherical Energy> in NIPS'18.
Stars: ✭ 151 (-22.56%)
Mutual labels:  jupyter-notebook, caffe
Stock Price Predictor
This project seeks to utilize Deep Learning models, Long-Short Term Memory (LSTM) Neural Network algorithm, to predict stock prices.
Stars: ✭ 146 (-25.13%)
Mutual labels:  jupyter-notebook, lstm
Tensorflow On Android For Human Activity Recognition With Lstms
iPython notebook and Android app that shows how to build LSTM model in TensorFlow and deploy it on Android
Stars: ✭ 157 (-19.49%)
Mutual labels:  jupyter-notebook, lstm
Rnn For Joint Nlu
Pytorch implementation of "Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling" (https://arxiv.org/abs/1609.01454)
Stars: ✭ 176 (-9.74%)
Mutual labels:  jupyter-notebook, lstm
Load forecasting
Load forecasting on Delhi-area electric power load using ARIMA, RNN, LSTM and GRU models
Stars: ✭ 160 (-17.95%)
Mutual labels:  jupyter-notebook, lstm
Deep Algotrading
A resource for learning about deep learning techniques from regression to LSTM and Reinforcement Learning using financial data and the fitness functions of algorithmic trading
Stars: ✭ 173 (-11.28%)
Mutual labels:  jupyter-notebook, lstm
Lstm anomaly thesis
Anomaly detection for temporal data using LSTMs
Stars: ✭ 178 (-8.72%)
Mutual labels:  jupyter-notebook, lstm
Stylenet
A cute multi-layer LSTM that can perform like a human 🎶
Stars: ✭ 187 (-4.1%)
Mutual labels:  jupyter-notebook, lstm
Handwriting Synthesis
Implementation of "Generating Sequences With Recurrent Neural Networks" https://arxiv.org/abs/1308.0850
Stars: ✭ 135 (-30.77%)
Mutual labels:  jupyter-notebook, lstm

Up-Down-Captioner

Simple yet high-performing image captioning model using Caffe and Python. Using image features from bottom-up attention, in July 2017 this model achieved state-of-the-art performance on all metrics of the COCO captions test leaderboard (SPICE 21.5, CIDEr 117.9, BLEU_4 36.9). The architecture (2-layer LSTM with attention) is described in Section 3.2 of the paper cited below.

Reference

If you use this code in your research, please cite our paper:

@inproceedings{Anderson2017up-down,
  author = {Peter Anderson and Xiaodong He and Chris Buehler and Damien Teney and Mark Johnson and Stephen Gould and Lei Zhang},
  title = {Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering},
  booktitle = {CVPR},
  year = {2018}
}
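
For intuition only, here is a minimal NumPy sketch of one decoding step of the two-layer attention LSTM described above; the weight names, shapes, and the lstm_step helper are illustrative assumptions, not the Caffe implementation in this repository.

    # Illustrative NumPy sketch of one decoding step of the up-down decoder:
    # an attention LSTM, soft attention over region features, then a language LSTM.
    # Weight names in the params dict are hypothetical; shapes are schematic.
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def lstm_step(p, x, h, c):
        # Plain LSTM cell: p['W'] stacks the input/forget/output/candidate gates.
        z = p['W'] @ np.concatenate([x, h]) + p['b']
        i, f, o, g = np.split(z, 4)
        sig = lambda a: 1.0 / (1.0 + np.exp(-a))
        c = sig(f) * c + sig(i) * np.tanh(g)
        h = sig(o) * np.tanh(c)
        return h, c

    def decode_step(V, word_emb, state, p):
        # V: (k, d) bottom-up region features; word_emb: previous word's embedding.
        h1, c1, h2, c2 = state
        # 1. Attention LSTM sees the language-LSTM state, mean-pooled image, and word.
        x1 = np.concatenate([h2, V.mean(axis=0), word_emb])
        h1, c1 = lstm_step(p['lstm1'], x1, h1, c1)
        # 2. Soft attention over the k regions, conditioned on h1.
        scores = np.tanh(V @ p['W_v'].T + h1 @ p['W_h'].T) @ p['w_a']
        alpha = softmax(scores)
        v_hat = alpha @ V
        # 3. Language LSTM predicts the next word from the attended feature and h1.
        x2 = np.concatenate([v_hat, h1])
        h2, c2 = lstm_step(p['lstm2'], x2, h2, c2)
        probs = softmax(p['W_out'] @ h2 + p['b_out'])
        return probs, (h1, c1, h2, c2)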

License

This code is released under the MIT License (refer to the LICENSE file for details).

Requirements: software

  1. Important: Please use the version of Caffe provided as a submodule within this repository. It contains additional layers and features required for captioning.

  2. Requirements for Caffe and pycaffe (see: Caffe installation instructions)

    Note: Caffe must be built with support for Python layers and NCCL!

    # In your Makefile.config, make sure to have these lines uncommented
    WITH_PYTHON_LAYER := 1
    USE_NCCL := 1
    # Unrelatedly, it's also recommended that you use CUDNN
    USE_CUDNN := 1
    
  3. NVIDIA's NCCL library, which is used for multi-GPU training: https://github.com/NVIDIA/nccl

Requirements: hardware

By default, the provided training scripts assume that two GPUs are available, with indices 0,1. Training on two GPUs takes around 9 hours. Any NVIDIA GPU with 8GB or more memory should be OK. Training scripts and prototxt files will require minor modifications to train on a single GPU (e.g., set iter_size to 2).

Demo - Using the model to predict on new images

Run installation steps 1-4 below, then use the notebook at scripts/demo.ipynb.
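
If you want to poke at the trained model outside the notebook, the snippet below is a minimal, hedged pycaffe sketch that loads a network and lists its blob shapes; the prototxt and caffemodel paths are illustrative placeholders, so substitute whatever scripts/demo.ipynb actually points to.

    # Minimal pycaffe sanity check (illustrative paths, not the exact demo pipeline).
    import caffe

    caffe.set_mode_gpu()
    caffe.set_device(0)

    # Hypothetical file names -- use the decoder prototxt and snapshot that the
    # demo notebook references.
    net = caffe.Net('experiments/caption_lstm/decoder.prototxt',
                    'snapshots/caption_lstm/lstm_iter_60000.caffemodel',
                    caffe.TEST)

    # Inspect blob names and shapes before wiring up your own image features.
    for name, blob in net.blobs.items():
        print(name, blob.data.shape)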

Installation

All instructions are relative to the top-level directory. To run the demo, only steps 1-4 are required (the remaining steps are for training a model).

  1. Clone the Up-Down-Captioner repository:

    # Make sure to clone with --recursive
    git clone --recursive https://github.com/peteanderson80/Up-Down-Captioner.git
    

    If you forget to clone with the --recursive flag, then you'll need to manually clone the submodules:

    git submodule update --init --recursive
    
  2. Build Caffe and pycaffe:

    cd ./external/caffe
    
    # If you're experienced with Caffe and have all of the requirements installed
    # and your Makefile.config in place, then simply do:
    make -j8 && make pycaffe
    
  3. Build the COCO tools:

    cd ./external/coco/PythonAPI
    make
    
  4. Add python layers and caffe build to PYTHONPATH (a quick import check follows this list):

    cd $REPO_ROOT
    export PYTHONPATH=${PYTHONPATH}:$(pwd)/layers:$(pwd)/lib:$(pwd)/external/caffe/python
    
  5. Build Ross Girshick's Cython modules (to run the demo on new images)

    cd $REPO_ROOT/lib
    make
    
  6. Download Stanford CoreNLP (required by the evaluation code):

    cd ./external/coco-caption
    ./get_stanford_models.sh
    
  7. Download the MS COCO train/val image caption annotations. Extract all the json files into one folder $COCOdata, then create a symlink to this location:

    cd $REPO_ROOT/data
    ln -s $COCOdata coco
    
  8. Pre-process the caption annotations for training (building vocabs etc).

    cd $REPO_ROOT
    python scripts/preprocess_coco.py
    
  9. Download or generate pretrained image features following the instructions below.
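
As a quick check that steps 2-4 succeeded (referenced from step 4 above), the snippet below should run from $REPO_ROOT without errors; caffe is pycaffe from the submodule build, and rcnn_layers is the python data layer shipped in ./layers.

    # Quick import check after setting PYTHONPATH (run from $REPO_ROOT).
    import caffe            # pycaffe from ./external/caffe/python
    import rcnn_layers      # python data layer from ./layers

    print('pycaffe:', caffe.__file__)
    print('layers :', rcnn_layers.__file__)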

Pretrained image features

LINKS HAVE BEEN UPDATED

The captioner takes pretrained image features as input (and does not finetune). For best performance, bottom-up attention features should be used. Code for generating these features can be found here. For ease-of-use, we provide pretrained features for the MSCOCO dataset. Manually download the following tsv file and unzip to data/tsv/:

To make a test server submission, you would also need these features:

Alternatively, to generate conventional pretrained features from the ResNet-101 CNN:

  • Download the pretrained ResNet-101 model and save it in baseline/ResNet-101-model.caffemodel
  • Download the MS COCO train/val images, and extract them into data/images.
  • Run:
    cd $REPO_ROOT
    ./scripts/generate_baseline.py
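
As a sanity check on downloaded bottom-up attention features, the sketch below reads the first record of a .tsv file. The field layout is assumed from the bottom-up attention feature convention, and the file name is only an example, so adjust both to match what you actually place in data/tsv/.

    # Hedged sketch: inspect one record of a bottom-up attention .tsv file.
    import base64
    import csv
    import sys

    import numpy as np

    csv.field_size_limit(sys.maxsize)
    # Field layout assumed from the bottom-up attention feature format.
    FIELDNAMES = ['image_id', 'image_w', 'image_h', 'num_boxes', 'boxes', 'features']

    path = 'data/tsv/karpathy_train_resnet101_faster_rcnn_genome.tsv'  # example name
    with open(path) as f:
        reader = csv.DictReader(f, delimiter='\t', fieldnames=FIELDNAMES)
        for item in reader:
            num_boxes = int(item['num_boxes'])
            boxes = np.frombuffer(base64.b64decode(item['boxes']),
                                  dtype=np.float32).reshape(num_boxes, 4)
            feats = np.frombuffer(base64.b64decode(item['features']),
                                  dtype=np.float32).reshape(num_boxes, -1)
            print(item['image_id'], boxes.shape, feats.shape)
            break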

Training

To train the model on the Karpathy training set, and then generate and evaluate captions on the Karpathy test set (using bottom-up attention features):

cd $REPO_ROOT
./experiments/caption_lstm/train.sh

Trained snapshots are saved under: snapshots/caption_lstm/

Logging outputs are saved under: logs/caption_lstm/

Generated caption outputs are saved under: outputs/caption_lstm/

Scores for the generated captions (on the Karpathy test set) are saved under: scores/caption_lstm/

To train and evaluate the baseline using conventional pretrained features, follow the instructions above but replace caption_lstm with caption_lstm_baseline_resnet.

Results

Results (using bottom-up attention features) should be similar to the numbers below (as reported in Table 1 of the paper).

                      BLEU-1   BLEU-4   METEOR   ROUGE-L   CIDEr   SPICE
Cross-Entropy Loss      77.2     36.2     27.0      56.4    113.5    20.3
CIDEr Optimization      79.8     36.3     27.7      56.9    120.1    21.4

Other useful scripts

  1. scripts/create_caption_lstm.py The version of caffe provided as a submodule with this repo includes (amongst other things) a custom LSTMNode layer that enables sampling and beam search through LSTM layers. However, the resulting network architecture prototxt files are quite complicated. The file scripts/create_caption_lstm.py scaffolds out network structures, such as those in experiments.

  2. layers/efficient_rcnn_layers.py The provided net.prototxt file uses a python data layer (layers/rcnn_layers.py) that loads all training data (including image features) into memory. If you have insufficient system memory, use this data layer instead by replacing module: "rcnn_layers" with module: "efficient_rcnn_layers" in experiments/caption_lstm/net.prototxt (a small helper sketch follows this list).

  3. scripts/plot.py Basic script for plotting validation set scores during training.
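
Referenced from item 2 above, here is a minimal, hedged helper that performs the module swap as a plain text substitution; the path and the two strings are taken verbatim from the note above.

    # Switch the data layer to the memory-efficient variant by editing net.prototxt.
    # Plain text substitution; path and strings come from the note above.
    path = 'experiments/caption_lstm/net.prototxt'
    with open(path) as f:
        txt = f.read()
    with open(path, 'w') as f:
        f.write(txt.replace('module: "rcnn_layers"', 'module: "efficient_rcnn_layers"'))
    print('updated', path)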
