
jcjohnson / Densecap

License: MIT
Dense image captioning in Torch

Programming Languages

Jupyter Notebook
11667 projects
Lua
6591 projects
Python
139335 projects - #7 most used programming language
JavaScript
184084 projects - #8 most used programming language
HTML
75241 projects
CSS
56736 projects

Projects that are alternatives to or similar to DenseCap

Math For Programmers
Source code for the book, Math for Programmers
Stars: ✭ 107 (-92.72%)
Mutual labels:  jupyter-notebook
Woe And Iv
Stars: ✭ 109 (-92.58%)
Mutual labels:  jupyter-notebook
Ml Da Coursera Yandex Mipt
Machine Learning and Data Analysis Coursera Specialization from Yandex and MIPT
Stars: ✭ 108 (-92.65%)
Mutual labels:  jupyter-notebook
Shot Type Classifier
Detecting cinema shot types using a ResNet-50
Stars: ✭ 109 (-92.58%)
Mutual labels:  jupyter-notebook
Isl Python
Solutions to labs and exercises from An Introduction to Statistical Learning, as Jupyter Notebooks.
Stars: ✭ 108 (-92.65%)
Mutual labels:  jupyter-notebook
Hass Data Detective
Explore and analyse your Home Assistant data
Stars: ✭ 109 (-92.58%)
Mutual labels:  jupyter-notebook
Awesome Embedding Models
A curated list of awesome embedding model tutorials, projects, and communities.
Stars: ✭ 1,486 (+1.16%)
Mutual labels:  jupyter-notebook
Credit score
Data from the Kaggle "Give Me Some Credit" competition
Stars: ✭ 109 (-92.58%)
Mutual labels:  jupyter-notebook
Simple mf
Simple but Flexible Recommendation Engine in PyTorch
Stars: ✭ 109 (-92.58%)
Mutual labels:  jupyter-notebook
Nb pdf template
A more accurate representation of Jupyter notebooks when converting to PDFs.
Stars: ✭ 109 (-92.58%)
Mutual labels:  jupyter-notebook
Clx
A collection of RAPIDS examples for security analysts, data scientists, and engineers to quickly get started applying RAPIDS and GPU acceleration to real-world cybersecurity use cases.
Stars: ✭ 108 (-92.65%)
Mutual labels:  jupyter-notebook
Solvingalmostanythingwithbert
BioBERT PyTorch
Stars: ✭ 109 (-92.58%)
Mutual labels:  jupyter-notebook
Street to shop experiments
Stars: ✭ 108 (-92.65%)
Mutual labels:  jupyter-notebook
Fin Ml
This GitHub repository contains the code for the case studies in the O'Reilly book Machine Learning and Data Science Blueprints for Finance.
Stars: ✭ 107 (-92.72%)
Mutual labels:  jupyter-notebook
Alexnet Experiments Keras
Code examples for training AlexNet using Keras and Theano
Stars: ✭ 109 (-92.58%)
Mutual labels:  jupyter-notebook
Dtreeviz
A Python library for decision tree visualization and model interpretation.
Stars: ✭ 1,857 (+26.41%)
Mutual labels:  jupyter-notebook
Histbook
Versatile, high-performance histogram toolkit for NumPy.
Stars: ✭ 109 (-92.58%)
Mutual labels:  jupyter-notebook
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-92.58%)
Mutual labels:  jupyter-notebook
Prisma abu
Using machine learning to build an artistic painter: Prisma
Stars: ✭ 109 (-92.58%)
Mutual labels:  jupyter-notebook
Deeplearning.ai Convolutional Neural Networks
Completed assignment Jupyter notebooks for Foundations of Convolutional Neural Networks, a deeplearning.ai Coursera course
Stars: ✭ 109 (-92.58%)
Mutual labels:  jupyter-notebook

DenseCap

This is the code for the paper

DenseCap: Fully Convolutional Localization Networks for Dense Captioning,
Justin Johnson*, Andrej Karpathy*, Li Fei-Fei,
(* equal contribution)
Presented at CVPR 2016 (oral)

The paper addresses the problem of dense captioning, where a computer detects objects in images and describes them in natural language.

The model is a deep convolutional neural network trained in an end-to-end fashion on the Visual Genome dataset.

We provide:

  - A pretrained DenseCap model
  - Code for running a trained model on new images
  - Code for training new models on the Visual Genome dataset
  - Code for evaluating dense captioning results
  - Webcam demos

If you find this code useful in your research, please cite:

@inproceedings{densecap,
  title={DenseCap: Fully Convolutional Localization Networks for Dense Captioning},
  author={Johnson, Justin and Karpathy, Andrej and Fei-Fei, Li},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and 
             Pattern Recognition},
  year={2016}
}

Installation

DenseCap is implemented in Torch, and depends on the following packages: torch/torch7, torch/nn, torch/nngraph, torch/image, lua-cjson, qassemoquab/stnbhwd, jcjohnson/torch-rnn

After installing Torch, you can install / update these dependencies by running the following:

luarocks install torch
luarocks install nn
luarocks install nngraph
luarocks install image
luarocks install lua-cjson
luarocks install https://raw.githubusercontent.com/qassemoquab/stnbhwd/master/stnbhwd-scm-1.rockspec
luarocks install https://raw.githubusercontent.com/jcjohnson/torch-rnn/master/torch-rnn-scm-1.rockspec

(Optional) GPU acceleration

If you have an NVIDIA GPU and want to accelerate the model with CUDA, you'll also need to install torch/cutorch and torch/cunn; you can install / update these by running:

luarocks install cutorch
luarocks install cunn

(Optional) cuDNN

If you want to use NVIDIA's cuDNN library, you'll need to register for the CUDA Developer Program (it's free) and download the library from NVIDIA's website; you'll also need to install the cuDNN bindings for Torch by running:

luarocks install cudnn
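
As a quick sanity check that the GPU stack is working, you can try loading the packages from the command line (assuming th is on your PATH and the cuDNN shared library is installed):

th -e "require 'cutorch'; require 'cudnn'; print('GPU stack OK')"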

Pretrained model

You can download a pretrained DenseCap model by running the following script:

sh scripts/download_pretrained_model.sh

This will download a zipped version of the model (about 1.1 GB) to data/models/densecap/densecap-pretrained-vgg16.t7.zip, unpack it to data/models/densecap/densecap-pretrained-vgg16.t7 (about 1.2 GB) and then delete the zipped version.
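
Once the script finishes, you can confirm the unpacked checkpoint is in place:

ls -lh data/models/densecap/densecap-pretrained-vgg16.t7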

This is not the exact model that was used in the paper, but it has comparable performance; using 1000 region proposals per image, it achieves a mAP of 5.70 on the test set, slightly better than the mAP of 5.39 that we report in the paper.

Running on new images

To run the model on new images, use the script run_model.lua. To run the pretrained model on the provided elephant.jpg image, use the following command:

th run_model.lua -input_image imgs/elephant.jpg

By default this will run in GPU mode; to run in CPU-only mode, simply add the flag -gpu -1.
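
For example, to run the pretrained model on the provided elephant image using only the CPU:

th run_model.lua -input_image imgs/elephant.jpg -gpu -1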

This command will write results into the folder vis/data. We have provided a web-based visualizer to view these results; to use it, change to the vis directory and start a local HTTP server:

cd vis
python -m SimpleHTTPServer 8181

Then point your web browser to http://localhost:8181/view_results.html.
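
If your system only has Python 3, the SimpleHTTPServer module no longer exists; the equivalent built-in server is:

cd vis
python3 -m http.server 8181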

If you have an entire directory of images on which you want to run the model, use the -input_dir flag instead:

th run_model.lua -input_dir /path/to/my/image/folder

This runs the model on all files in the folder /path/to/my/image/folder/ whose filenames do not start with a dot.

The web-based visualizer is the preferred way to view results, but if you don't want to use it you can instead render images with the detection boxes and captions "baked in"; add the flag -output_dir to specify a directory where output images should be written:

th run_model.lua -input_dir /path/to/my/image/folder -output_dir /path/to/output/folder/

The run_model.lua script has several other flags; you can find details here.

Training

To train a new DenseCap model, follow these steps:

  1. Download the raw images and region descriptions from the Visual Genome website
  2. Use the script preprocess.py to generate a single HDF5 file containing the entire dataset (details here)
  3. Use the script train.lua to train the model (details here)
  4. Use the script evaluate_model.lua to evaluate a trained model on the validation or test data (details here)

For more instructions on training, see INSTALL.md in the doc folder.
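
In outline, and leaving aside the flags each script accepts (see the documentation referenced above), the pipeline comes down to three commands run from the repository root; treat this as a sketch rather than a copy-paste recipe:

# Build a single HDF5 file from the raw Visual Genome images and region descriptions
python preprocess.py
# Train a new model
th train.lua
# Evaluate a trained model on the validation or test data
th evaluate_model.lua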

Evaluation

In the paper we propose a metric for automatically evaluating dense captioning results. Our metric depends on METEOR, and our evaluation code requires both Java and Python 2.7. The following script will download and unpack the METEOR jarfile:

sh scripts/setup_eval.sh

The evaluation code is not required to simply run a trained model on images; you can find more details about the evaluation code here.
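
Since the metric needs both a Java runtime (for METEOR) and Python 2.7, a quick way to confirm both are available (assuming the interpreter is installed as python2.7) is:

java -version
python2.7 --version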

Webcam demos

If you have a powerful GPU, then the DenseCap model is fast enough to run in real-time. We provide two demos to allow you to run DenseCap on frames from a webcam.

Single-machine demo

If you have a single machine with both a webcam and a powerful GPU, then you can use this demo to run DenseCap in real time at up to 10 frames per second. This demo depends on two extra Lua packages: camera and qtlua.

You can install / update these dependencies by running the following:

luarocks install camera
luarocks install qtlua

You can start the demo by running the following:

qlua webcam/single_machine_demo.lua

Client / server demo

If you have a machine with a powerful GPU and another machine with a webcam, then this demo allows you to use the GPU machine as a server and the webcam machine as a client; frames are streamed from the client to the server, the model runs on the server, and predictions are shipped back to the client for viewing. This allows you to run DenseCap on a laptop, but with network and filesystem overhead you will typically only achieve 1 to 2 frames per second.

The server is written in Flask; on the server machine run the following to install dependencies:

cd webcam
virtualenv .env
source .env/bin/activate
pip install -r requirements.txt
cd ..

For technical reasons, the server needs to serve content over SSL; it expects to find an SSL key and certificate at webcam/ssl/server.key and webcam/ssl/server.crt, respectively. You can generate a self-signed SSL certificate by running the following:

mkdir webcam/ssl

# Step 1: Generate a private key
openssl genrsa -des3 -out webcam/ssl/server.key 1024
# Enter a password

# Step 2: Generate a certificate signing request
openssl req -new -key webcam/ssl/server.key -out webcam/ssl/server.csr
# Enter the password from above and leave all other fields blank

# Step 3: Strip the password from the keyfile
cp webcam/ssl/server.key webcam/ssl/server.key.org
openssl rsa -in webcam/ssl/server.key.org -out webcam/ssl/server.key

# Step 4: Generate self-signed certificate
openssl x509 -req -days 365 -in webcam/ssl/server.csr -signkey webcam/ssl/server.key -out webcam/ssl/server.crt
# Enter the password from above
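
Alternatively, if you would rather skip the password steps, an equivalent self-signed key and certificate can be generated in a single command; here -nodes leaves the private key unencrypted, and the -subj value is a placeholder you can change:

mkdir -p webcam/ssl
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout webcam/ssl/server.key \
  -out webcam/ssl/server.crt \
  -subj '/CN=localhost'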

You can now run the following two commands to start the server; both will run forever, so run each in its own terminal:

th webcam/daemon.lua
python webcam/server.py

On the client, point a web browser at the following page, replacing SERVER_URL with the actual URL of the server:

https://cs.stanford.edu/people/jcjohns/densecap/demo/web-client.html?server_url=SERVER_URL
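
For example, if the server machine were reachable at https://192.168.0.10:5000 (a made-up address for illustration), the full client URL would be:

https://cs.stanford.edu/people/jcjohns/densecap/demo/web-client.html?server_url=https://192.168.0.10:5000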

Note: If the server is using a self-signed SSL certificate, you may need to manually tell your browser that the certificate is safe by pointing your client's web browser directly at the server URL; you will get a warning that the site is unsafe, which you can choose to bypass (in Chrome, for example, via the "Advanced" link on the warning page).

Afterward you should see a message telling you that the DenseCap server is running, and the web client should work after refreshing.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].