salesforce / Densecap

Licence: BSD-3-Clause

Projects that are alternatives to or similar to Densecap

Mml Companion
This is a companion to the ‘Mathematical Foundations’ section of the book Mathematics for Machine Learning by Marc Deisenroth, Aldo Faisal, and Cheng Ong, written in Python for Jupyter Notebook.
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Stldecompose
A Python implementation of Seasonal and Trend decomposition using Loess (STL) for time series data.
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Causalimpact
Python port of CausalImpact R library
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Object detection demo
How to train an object detection model easily and for free
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Waterfall
An easy-to-use waterfall chart function for Python
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Sometimes deep sometimes learning
A collection of DL experiments and notes
Stars: ✭ 129 (+1.57%)
Mutual labels:  jupyter-notebook
Hands on julia
Stars: ✭ 129 (+1.57%)
Mutual labels:  jupyter-notebook
Python Feature Engineering Cookbook
Python Feature Engineering Cookbook, published by Packt
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Inferpy
InferPy: Deep Probabilistic Modeling with Tensorflow Made Easy
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Cloud Dataproc
Cloud Dataproc: Samples and Utils
Stars: ✭ 128 (+0.79%)
Mutual labels:  jupyter-notebook
Regularized Linear Autoencoders
Loss Landscapes of Regularized Linear Autoencoders
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Download Celeba Hq
Python script to download the celebA-HQ dataset from google drive
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Deep Learning
This repository contains deep learning examples using TensorFlow. It will be useful for deep learning beginners who have difficulty understanding the example code.
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Citylearn
Official reinforcement learning environment for demand response and load shaping
Stars: ✭ 129 (+1.57%)
Mutual labels:  jupyter-notebook
Gocnn
Using a CNN for move prediction and board evaluation in the board game Go
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Mltutorial
Machine Learning Tutorial in IPython Notebooks
Stars: ✭ 129 (+1.57%)
Mutual labels:  jupyter-notebook
Tutorials
DEPRECATED - DO NOT USE
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Kaggle earthquake challenge
This is the code for the Kaggle Earthquake Challenge by Siraj Raval on Youtube
Stars: ✭ 132 (+3.94%)
Mutual labels:  jupyter-notebook
Data Structures Algorithms Python
This tutorial playlist covers data structures and algorithms in Python. Every tutorial covers the theory behind a data structure or algorithm, Big O complexity analysis, and exercises you can practice on.
Stars: ✭ 126 (-0.79%)
Mutual labels:  jupyter-notebook
Cn Machine Learning
https://cn.udacity.com/mlnd/
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook

End-to-End Dense Video Captioning with Masked Transformer

This is the source code for our paper End-to-End Dense Video Captioning with Masked Transformer. It mainly supports dense video captioning on generated segments. To generate captions on GT segments, please refer to our new GVD repo and our notes.

Requirements (Recommended)

  1. Miniconda3 for Python 3.6

  2. CUDA 9.2 and CUDNN v7.1

  3. PyTorch 0.4.0. Follow the official instructions to install PyTorch and torchvision.

  4. Install other required modules (e.g., torchtext):

pip install -r requirements.txt

Optional: if you would like to use visdom to track training, run pip install visdom

Optional: if you would like to use the spaCy tokenizer, run pip install spacy

Note: the code has been tested on a variety of GPUs, including the 1080 Ti, Titan Xp, P100, and V100. However, the newer RTX GPUs (e.g., the 2080 Ti) require CUDA 10.0 and hence PyTorch 1.0, so the code would need to be upgraded to PyTorch 1.0 to run on them.
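
A minimal environment-setup sketch combining the steps above, assuming Miniconda3 is installed; the exact package and channel names for these old PyTorch and torchvision releases may differ on your system:

conda create -n densecap python=3.6 -y
conda activate densecap                             # or `source activate densecap` on older conda versions
conda install pytorch=0.4.0 cuda92 -c pytorch -y    # PyTorch 0.4.0 built against CUDA 9.2
pip install torchvision==0.2.1                      # a torchvision release contemporary with PyTorch 0.4.0
pip install -r requirements.txt                     # torchtext and other required modules
pip install visdom spacy                            # optional: visdom tracking and the spaCy tokenizer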

Data Preparation

Annotation and feature

For ActivityNet, download the re-formatted annotation files from here, decompress them, and place them under the directory data. The frame-wise appearance (with suffix _resnet.npy) and motion (with suffix _bn.npy) feature files for each split are available [train (27.7GB), val (13.7GB), test (13.6GB)] and should be decompressed and placed under your dataset directory (referred to as feature_root in the configuration files).

Similarly, for YouCook2, the annotation files are available here and should be placed under data. The feature files are [train (9.6GB), val (3.2GB), test (1.5GB)].

You can also extract the features on your own with this code. Note that ActivityNet was processed with an older version of the repo, while YouCook2 was processed with the latest code, which includes a minor change to the sampling approach. This accounts for the difference in the frame_to_second conversion.
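
For reference, a sketch of one way to lay the data out; the archive names below are placeholders for whatever files you download, and only the data directory, the feature_root setting, and the _resnet.npy/_bn.npy suffixes come from this repo:

mkdir -p data
tar -xzf anet_annotations.tar.gz -C data                        # placeholder name for the re-formatted annotation archive
mkdir -p /path/to/feature_root                                  # the directory set as feature_root in the cfgs files
tar -xzf anet_train_features.tar.gz -C /path/to/feature_root    # placeholder name; repeat for the val/test archives
ls /path/to/feature_root                                        # expect <video_id>_resnet.npy (appearance) and <video_id>_bn.npy (motion)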

Evaluate scripts

Download the dense video captioning evaluation scripts and place them under the tools directory. Make sure you clone the repo recursively. Our code is equivalent to the official evaluation code from the ActivityNet 2017 Challenge, but faster. Note that the current evaluation scripts have a few major bugs fixed ahead of the ActivityNet 2018 Challenge.

The evaluation script for event proposals can be found under tools.
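
For example, a recursive clone into tools could look like the following; the repository URL is a placeholder and should be replaced with the evaluation-scripts repo linked above:

git clone --recursive https://github.com/<user>/densevid_eval.git tools/densevid_eval   # --recursive also pulls in submodules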

Training and Validation

First, set the paths in the configuration files (under cfgs) to your own data and feature directories. Create new directories log and results under the root directory to store log and result files, as shown below.
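
Concretely, from the repository root this amounts to something like:

mkdir -p log results    # log files from training and result files from evaluation
# then edit cfgs/anet.yml (or cfgs/yc2.yml) so that feature_root and the data paths point to your directories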

An example command for running a 4-GPU distributed data parallel job (on ActivityNet):

For Masked Transformer:

CUDA_VISIBLE_DEVICES=0 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 4 \
    --cuda --sent_weight $sent_weight | tee log/$id-0 &
CUDA_VISIBLE_DEVICES=1 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 4 \
    --cuda --sent_weight $sent_weight | tee log/$id-1 &
CUDA_VISIBLE_DEVICES=2 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 4 \
    --cuda --sent_weight $sent_weight | tee log/$id-2 &
CUDA_VISIBLE_DEVICES=3 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 4 \
    --cuda --sent_weight $sent_weight | tee log/$id-3

For End-to-End Masked Transformer:

CUDA_VISIBLE_DEVICES=0 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 4 \
    --cuda --sent_weight $sent_weight --mask_weight $mask_weight --gated_mask | tee log/$id-0 &
CUDA_VISIBLE_DEVICES=1 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 4 \
    --cuda --sent_weight $sent_weight --mask_weight $mask_weight --gated_mask | tee log/$id-1 &
CUDA_VISIBLE_DEVICES=2 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 4 \
    --cuda --sent_weight $sent_weight --mask_weight $mask_weight --gated_mask | tee log/$id-2 &
CUDA_VISIBLE_DEVICES=3 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 4 \
    --cuda --sent_weight $sent_weight --mask_weight $mask_weight --gated_mask | tee log/$id-3

Arguments: batch_size=14, mask_weight=1.0, sent_weight=0.25, cfgs_file='cfgs/anet.yml', dist_url='file:///home/luozhou/nonexistent_file' (replace with a path in your own directory); id is the name of the model.
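
For instance, the shell variables referenced in the commands above can be set as follows (the id value and the dist_url path are only examples):

id=anet-masked-transformer                      # example model name; checkpoints go to ./checkpoint/$id
batch_size=14
sent_weight=0.25
mask_weight=1.0                                 # only used by the end-to-end variant
cfgs_file='cfgs/anet.yml'                       # use cfgs/yc2.yml for YouCook2
dist_url='file:///path/to/nonexistent_file'     # a file:// URL in a writable directory; the file must not already exist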

For the YouCook2 dataset, you can simply replace cfgs/anet.yml with cfgs/yc2.yml. To monitor training (e.g., training and validation losses), start the visdom server by running visdom in the background (e.g., in tmux), then add --enable_visdom as a command argument.

Note that at least 15 GB of free RAM is required for training. The nonexistent_file is normally cleaned up automatically, but may need to be deleted manually otherwise. For more on distributed data parallel, see here (PyTorch 0.4.0). You can also run the code on a single GPU by setting world_size=1, as sketched below.
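
A sketch of a single-GPU run, i.e. the same command with world_size=1 and no backgrounding (it may still need a valid --dist_url):

CUDA_VISIBLE_DEVICES=0 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 1 \
    --cuda --sent_weight $sent_weight | tee log/$id-0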

For legacy reasons, we store the feature files as individual .npy files, which causes latency in data loading and hence instability during distributed model training. By default, we set num_workers to 1; it can be raised to as high as 6 for faster data loading. However, if you encounter any data parallel issues, try setting it to 0.

Pre-trained Models

The pre-trained models can be downloaded from here (1GB). Make sure you uncompress the file under the checkpoint directory (create one under the root directory if it does not exist).
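
For example (the archive name below is a placeholder for the downloaded file):

mkdir -p checkpoint                                 # create the directory if it does not exist
tar -xzf pre-trained-models.tar.gz -C checkpoint    # placeholder archive name; uncompress under checkpoint/
ls checkpoint                                       # expect model directories such as anet-2L-gt-mask and anet-2L-e2e-mask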

Testing

For Masked Transformer (id=anet-2L-gt-mask):

python3 scripts/test.py --cfgs_file $cfgs_file --densecap_eval_file ./tools/densevid_eval/evaluate.py \
    --batch_size 1 --start_from ./checkpoint/$id/model_epoch_$epoch.t7 --id $id-$epoch \
    --val_data_folder $split --cuda | tee log/eval-$id-epoch$epoch

For End-to-End Masked Transformer (id=anet-2L-e2e-mask):

python3 scripts/test.py --cfgs_file $cfgs_file --densecap_eval_file ./tools/densevid_eval/evaluate.py \
    --batch_size 1 --start_from ./checkpoint/$id/model_epoch_$epoch.t7 --id $id-$epoch \
    --val_data_folder $split --learn_mask --gated_mask --cuda | tee log/eval-$id-epoch$epoch

Arguments: epoch=19, split='validation', cfgs_file='cfgs/anet.yml'
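
With the values above, for example:

id=anet-2L-e2e-mask         # or anet-2L-gt-mask for the non-end-to-end model
epoch=19
split='validation'
cfgs_file='cfgs/anet.yml'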

This gives you the language evaluation results on the validation set. You need at least 8 GB of free GPU memory for the evaluation. The current evaluation script only supports batch_size=1 and is slow (about 1 hour for YouCook2 and 4 hours for ActivityNet). We actively welcome pull requests.

Leaderboard (for the test set)

The official evaluation servers are available for ActivityNet and YouCook2. Note that the NEW evaluation scripts from the ActivityNet 2018 Challenge are used in both cases.

Notes

We use a different code base for captioning-only models (dense captioning on GT segments). Please contact [email protected] for details. Note that this code base can potentially work for that case if you feed GT segments into the captioning module rather than the generated segments; however, there is no guarantee of reproducing the results from the paper. You can also refer to this implementation, where you need to set --att_model to 'transformer'.

Citation

@inproceedings{zhou2018end,
  title={End-to-End Dense Video Captioning with Masked Transformer},
  author={Zhou, Luowei and Zhou, Yingbo and Corso, Jason J and Socher, Richard and Xiong, Caiming},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={8739--8748},
  year={2018}
}