salesforce / Densecap

Licence: BSD-3-Clause

Projects that are alternatives to or similar to Densecap

Mml Companion
This is a companion to the ‘Mathematical Foundations’ section of the book Mathematics for Machine Learning by Marc Deisenroth, Aldo Faisal, and Cheng Ong, written in Python for Jupyter Notebook.
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Stldecompose
A Python implementation of Seasonal and Trend decomposition using Loess (STL) for time series data.
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Causalimpact
Python port of CausalImpact R library
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Object detection demo
How to train an object detection model easily and for free
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Waterfall
An easy-to-use waterfall chart function for Python
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Sometimes deep sometimes learning
A collection of DL experiments and notes
Stars: ✭ 129 (+1.57%)
Mutual labels:  jupyter-notebook
Hands on julia
Stars: ✭ 129 (+1.57%)
Mutual labels:  jupyter-notebook
Python Feature Engineering Cookbook
Python Feature Engineering Cookbook, published by Packt
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Inferpy
InferPy: Deep Probabilistic Modeling with Tensorflow Made Easy
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Cloud Dataproc
Cloud Dataproc: Samples and Utils
Stars: ✭ 128 (+0.79%)
Mutual labels:  jupyter-notebook
Regularized Linear Autoencoders
Loss Landscapes of Regularized Linear Autoencoders
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Download Celeba Hq
Python script to download the celebA-HQ dataset from google drive
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Deep Learning
This repository contains deep learning examples using TensorFlow. It will be useful for deep learning beginners who have difficulty understanding the example code.
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Citylearn
Official reinforcement learning environment for demand response and load shaping
Stars: ✭ 129 (+1.57%)
Mutual labels:  jupyter-notebook
Gocnn
Using a CNN for move prediction and board evaluation in the board game Go
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Mltutorial
Machine Learning Tutorial in IPython Notebooks
Stars: ✭ 129 (+1.57%)
Mutual labels:  jupyter-notebook
Tutorials
DEPRECATED - DO NOT USE
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook
Kaggle earthquake challenge
This is the code for the Kaggle Earthquake Challenge by Siraj Raval on Youtube
Stars: ✭ 132 (+3.94%)
Mutual labels:  jupyter-notebook
Data Structures Algorithms Python
This tutorial playlist covers data structures and algorithms in Python. Every tutorial covers the theory behind a data structure or algorithm, Big O complexity analysis, and exercises you can practice on.
Stars: ✭ 126 (-0.79%)
Mutual labels:  jupyter-notebook
Cn Machine Learning
https://cn.udacity.com/mlnd/
Stars: ✭ 130 (+2.36%)
Mutual labels:  jupyter-notebook

End-to-End Dense Video Captioning with Masked Transformer

This is the source code for our paper End-to-End Dense Video Captioning with Masked Transformer. It mainly supports dense video captioning on generated segments. To generate captions on GT segments, please refer to our new GVD repo and our notes.

Requirements (Recommended)

  1. Miniconda3 for Python 3.6

  2. CUDA 9.2 and CUDNN v7.1

  3. PyTorch 0.4.0. Follow the official instructions to install PyTorch and torchvision.

  4. Install other required modules (e.g., torchtext):

pip install -r requirements.txt

Optional: if you would like to use visdom to track training, run pip install visdom

Optional: if you would like to use the spaCy tokenizer, run pip install spacy

Note: the code has been tested on a variety of GPUs, including the 1080 Ti, Titan Xp, P100, and V100. However, the newer RTX GPUs (e.g., the 2080 Ti) require CUDA 10.0 and hence PyTorch 1.0, so the code would need to be upgraded to PyTorch 1.0 to run on them.
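
A minimal environment-setup sketch combining the steps above, assuming Miniconda3 is installed; the exact package and channel names for these old PyTorch and torchvision releases may differ on your system:

conda create -n densecap python=3.6 -y
conda activate densecap                             # or `source activate densecap` on older conda versions
conda install pytorch=0.4.0 cuda92 -c pytorch -y    # PyTorch 0.4.0 built against CUDA 9.2
pip install torchvision==0.2.1                      # a torchvision release contemporary with PyTorch 0.4.0
pip install -r requirements.txt                     # torchtext and other required modules
pip install visdom spacy                            # optional: visdom tracking and the spaCy tokenizer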

Data Preparation

Annotation and feature

For ActivityNet, download the re-formatted annotation files from here, decompress them, and place them under the directory data. The frame-wise appearance (with suffix _resnet.npy) and motion (with suffix _bn.npy) feature files for each split are available [train (27.7GB), val (13.7GB), test (13.6GB)] and should be decompressed and placed under your dataset directory (referred to as feature_root in the configuration files).

Similarly, for YouCook2, the annotation files are available here and should be placed under data. The feature files are [train (9.6GB), val (3.2GB), test (1.5GB)].

You can also extract the features on your own with this code. Note that ActivityNet was processed with an older version of the repo, while YouCook2 was processed with the latest code, which includes a minor change to the sampling approach. This accounts for the difference in the frame_to_second conversion.
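
For reference, a sketch of one way to lay the data out; the archive names below are placeholders for whatever files you download, and only the data directory, the feature_root setting, and the _resnet.npy/_bn.npy suffixes come from this repo:

mkdir -p data
tar -xzf anet_annotations.tar.gz -C data                        # placeholder name for the re-formatted annotation archive
mkdir -p /path/to/feature_root                                  # the directory set as feature_root in the cfgs files
tar -xzf anet_train_features.tar.gz -C /path/to/feature_root    # placeholder name; repeat for the val/test archives
ls /path/to/feature_root                                        # expect <video_id>_resnet.npy (appearance) and <video_id>_bn.npy (motion)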

Evaluate scripts

Download the dense video captioning evaluation scripts and place them under the tools directory. Make sure you clone the repo recursively. Our code is equivalent to the official evaluation code from the ActivityNet 2017 Challenge, but faster. Note that the current evaluation scripts have a few major bugs fixed ahead of the ActivityNet 2018 Challenge.

The evaluation script for event proposals can be found under tools.
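
For example, a recursive clone into tools could look like the following; the repository URL is a placeholder and should be replaced with the evaluation-scripts repo linked above:

git clone --recursive https://github.com/<user>/densevid_eval.git tools/densevid_eval   # --recursive also pulls in submodules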

Training and Validation

First, set the paths in the configuration files (under cfgs) to your own data and feature directories. Create new directories log and results under the root directory to store log and result files, as shown below.
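
Concretely, from the repository root this amounts to something like:

mkdir -p log results    # log files from training and result files from evaluation
# then edit cfgs/anet.yml (or cfgs/yc2.yml) so that feature_root and the data paths point to your directories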

An example command for running a 4-GPU distributed data parallel job (on ActivityNet):

For Masked Transformer:

CUDA_VISIBLE_DEVICES=0 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 4 \
    --cuda --sent_weight $sent_weight | tee log/$id-0 &
CUDA_VISIBLE_DEVICES=1 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 4 \
    --cuda --sent_weight $sent_weight | tee log/$id-1 &
CUDA_VISIBLE_DEVICES=2 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 4 \
    --cuda --sent_weight $sent_weight | tee log/$id-2 &
CUDA_VISIBLE_DEVICES=3 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 4 \
    --cuda --sent_weight $sent_weight | tee log/$id-3

For End-to-End Masked Transformer:

CUDA_VISIBLE_DEVICES=0 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 4 \
    --cuda --sent_weight $sent_weight --mask_weight $mask_weight --gated_mask | tee log/$id-0 &
CUDA_VISIBLE_DEVICES=1 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 4 \
    --cuda --sent_weight $sent_weight --mask_weight $mask_weight --gated_mask | tee log/$id-1 &
CUDA_VISIBLE_DEVICES=2 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 4 \
    --cuda --sent_weight $sent_weight --mask_weight $mask_weight --gated_mask | tee log/$id-2 &
CUDA_VISIBLE_DEVICES=3 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 4 \
    --cuda --sent_weight $sent_weight --mask_weight $mask_weight --gated_mask | tee log/$id-3

Arguments: batch_size=14, mask_weight=1.0, sent_weight=0.25, cfgs_file='cfgs/anet.yml', dist_url='file:///home/luozhou/nonexistent_file' (replace with a path in your own directory); id is the name of the model.
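
For instance, the shell variables referenced in the commands above can be set as follows (the id value and the dist_url path are only examples):

id=anet-masked-transformer                      # example model name; checkpoints go to ./checkpoint/$id
batch_size=14
sent_weight=0.25
mask_weight=1.0                                 # only used by the end-to-end variant
cfgs_file='cfgs/anet.yml'                       # use cfgs/yc2.yml for YouCook2
dist_url='file:///path/to/nonexistent_file'     # a file:// URL in a writable directory; the file must not already exist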

For the YouCook2 dataset, you can simply replace cfgs/anet.yml with cfgs/yc2.yml. To monitor training (e.g., training and validation losses), start the visdom server by running visdom in the background (e.g., in tmux), then add --enable_visdom as a command argument.

Note that at least 15 GB of free RAM is required for training. The nonexistent_file is normally cleaned up automatically, but may need to be deleted manually otherwise. For more on distributed data parallel, see here (PyTorch 0.4.0). You can also run the code on a single GPU by setting world_size=1, as sketched below.
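
A sketch of a single-GPU run, i.e. the same command with world_size=1 and no backgrounding (it may still need a valid --dist_url):

CUDA_VISIBLE_DEVICES=0 python3 scripts/train.py --dist_url $dist_url --cfgs_file $cfgs_file \
    --checkpoint_path ./checkpoint/$id --batch_size $batch_size --world_size 1 \
    --cuda --sent_weight $sent_weight | tee log/$id-0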

For legacy reasons, we store the feature files as individual .npy files, which causes latency in data loading and hence instability during distributed model training. By default, we set num_workers to 1; it can be raised to as high as 6 for faster data loading. However, if you encounter any data parallel issues, try setting it to 0.

Pre-trained Models

The pre-trained models can be downloaded from here (1GB). Make sure you uncompress the file under the checkpoint directory (create one under the root directory if it does not exist).
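
For example (the archive name below is a placeholder for the downloaded file):

mkdir -p checkpoint                                 # create the directory if it does not exist
tar -xzf pre-trained-models.tar.gz -C checkpoint    # placeholder archive name; uncompress under checkpoint/
ls checkpoint                                       # expect model directories such as anet-2L-gt-mask and anet-2L-e2e-mask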

Testing

For Masked Transformer (id=anet-2L-gt-mask):

python3 scripts/test.py --cfgs_file $cfgs_file --densecap_eval_file ./tools/densevid_eval/evaluate.py \
    --batch_size 1 --start_from ./checkpoint/$id/model_epoch_$epoch.t7 --id $id-$epoch \
    --val_data_folder $split --cuda | tee log/eval-$id-epoch$epoch

For End-to-End Masked Transformer (id=anet-2L-e2e-mask):

python3 scripts/test.py --cfgs_file $cfgs_file --densecap_eval_file ./tools/densevid_eval/evaluate.py \
    --batch_size 1 --start_from ./checkpoint/$id/model_epoch_$epoch.t7 --id $id-$epoch \
    --val_data_folder $split --learn_mask --gated_mask --cuda | tee log/eval-$id-epoch$epoch

Arguments: epoch=19, split='validation', cfgs_file='cfgs/anet.yml'
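
With the values above, for example:

id=anet-2L-e2e-mask         # or anet-2L-gt-mask for the non-end-to-end model
epoch=19
split='validation'
cfgs_file='cfgs/anet.yml'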

This gives you the language evaluation results on the validation set. You need at least 8 GB of free GPU memory for the evaluation. The current evaluation script only supports batch_size=1 and is slow (about 1 hour for YouCook2 and 4 hours for ActivityNet). We actively welcome pull requests.

Leaderboard (for the test set)

The official evaluation servers are available for ActivityNet and YouCook2. Note that the NEW evaluation scripts from the ActivityNet 2018 Challenge are used in both cases.

Notes

We use a different code base for captioning-only models (dense captioning on GT segments). Please contact [email protected] for details. Note that this code base can potentially work for that case if you feed GT segments into the captioning module rather than the generated segments; however, there is no guarantee of reproducing the results from the paper. You can also refer to this implementation, where you need to set --att_model to 'transformer'.

Citation

@inproceedings{zhou2018end,
  title={End-to-End Dense Video Captioning with Masked Transformer},
  author={Zhou, Luowei and Zhou, Yingbo and Corso, Jason J and Socher, Richard and Xiong, Caiming},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={8739--8748},
  year={2018}
}