zhengshou / Scnn

Licence: other
Segment-CNN: A Framework for Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs

Projects that are alternatives of or similar to Scnn

Image classification with 5 methods
Compared performance of KNN, SVM, BPNN, CNN, Transfer Learning (retrain on Inception v3) on image classification problem. CNN is implemented with TensorFlow
Stars: ✭ 227 (-0.44%)
Mutual labels:  jupyter-notebook
Deeplearning Models
A collection of various deep learning architectures, models, and tips
Stars: ✭ 14,654 (+6327.19%)
Mutual labels:  jupyter-notebook
18335
18.335 - Introduction to Numerical Methods course
Stars: ✭ 228 (+0%)
Mutual labels:  jupyter-notebook
Dataviz With Python And Js
The accompanying files for the book 'Dataviz with Python and JavaScript'
Stars: ✭ 227 (-0.44%)
Mutual labels:  jupyter-notebook
Pytorch Handbook
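The PyTorch handbook is an open-source book that aims to help people who want to use PyTorch for deep learning development and research get started quickly. All PyTorch tutorials it contains have been tested to make sure they run successfully.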
Stars: ✭ 15,817 (+6837.28%)
Mutual labels:  jupyter-notebook
Pydata Book
Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media
Stars: ✭ 16,386 (+7086.84%)
Mutual labels:  jupyter-notebook
Gan Tutorial
Simple Implementation of many GAN models with PyTorch.
Stars: ✭ 227 (-0.44%)
Mutual labels:  jupyter-notebook
Functional intro to python
[tutorial]A functional, Data Science focused introduction to Python
Stars: ✭ 228 (+0%)
Mutual labels:  jupyter-notebook
Handson Ml2
A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Stars: ✭ 18,554 (+8037.72%)
Mutual labels:  jupyter-notebook
Gpt2bot
Your new Telegram buddy powered by transformers
Stars: ✭ 228 (+0%)
Mutual labels:  jupyter-notebook
Dat7
General Assembly's Data Science course in Washington, DC
Stars: ✭ 227 (-0.44%)
Mutual labels:  jupyter-notebook
Nemo
NeMo: a toolkit for conversational AI
Stars: ✭ 3,685 (+1516.23%)
Mutual labels:  jupyter-notebook
Deep Learning In Production
Develop production ready deep learning code, deploy it and scale it
Stars: ✭ 216 (-5.26%)
Mutual labels:  jupyter-notebook
Full Stack Data Science
Full Stack Data Science in Python
Stars: ✭ 227 (-0.44%)
Mutual labels:  jupyter-notebook
Satellite analysis
Analysis scripts of things related to satellites
Stars: ✭ 228 (+0%)
Mutual labels:  jupyter-notebook
Applied Deep Learning With Keras
Deep Learning examples with Keras.
Stars: ✭ 227 (-0.44%)
Mutual labels:  jupyter-notebook
Data
Data and code behind the articles and graphics at FiveThirtyEight
Stars: ✭ 15,241 (+6584.65%)
Mutual labels:  jupyter-notebook
Kagglestruggle
Kaggle Struggle
Stars: ✭ 228 (+0%)
Mutual labels:  jupyter-notebook
Alphatools
Quantitative finance research tools in Python
Stars: ✭ 226 (-0.88%)
Mutual labels:  jupyter-notebook
Coronavirus Epidemic Covid 19
👩🏻‍⚕️Covid-19 estimation and forecast using statistical model; statistical model forecasting of novel coronavirus pneumonia (Jan 2020)
Stars: ✭ 228 (+0%)
Mutual labels:  jupyter-notebook

Segment-CNN

By Zheng Shou, Dongang Wang, and Shih-Fu Chang.

Introduction

Segment-CNN (S-CNN) is a segment-based deep learning framework for temporal action localization in untrimmed long videos.

This code has been tested on Ubuntu 14.04 with an NVIDIA GTX 980 GPU (4 GB memory) for models based on C3D-v1.0, and with an NVIDIA Titan X GPU (12 GB memory) for models based on C3D-v1.1.

The current code suffices to run the demo, reproduce our experimental results, and train your own models. Please use "Issues" to ask questions or report bugs. Thanks. [Mar. 2019: we have stopped responding to new issues in this repository because many people have successfully reproduced our results, and most common questions have already been raised and addressed in the closed issues.]

License

S-CNN is released under the MIT License (refer to the LICENSE file for details).

Citing

If you find S-CNN useful, please consider citing:

@inproceedings{scnn_shou_wang_chang_cvpr16,
  author = {Zheng Shou and Dongang Wang and Shih-Fu Chang},
  title = {Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs},
  year = {2016},
  booktitle = {CVPR} 
  }
  
@article{tran2017convnet,
  title={Convnet architecture search for spatiotemporal feature learning},
  author={Tran, Du and Ray, Jamie and Shou, Zheng and Chang, Shih-Fu and Paluri, Manohar},
  journal={arXiv preprint arXiv:1708.05038},
  year={2017}
}

We built this repo on C3D and the THUMOS Challenge 2014. Please cite the following papers as well:

D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, Learning Spatiotemporal Features with 3D Convolutional Networks, ICCV 2015.

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, Caffe: Convolutional Architecture for Fast Feature Embedding, arXiv 2014.

A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, Large-scale Video Classification with Convolutional Neural Networks, CVPR 2014.

@misc{THUMOS14,
  author = "Jiang, Y.-G. and Liu, J. and Roshan Zamir, A. and Toderici, G. and Laptev, I. and Shah, M. and Sukthankar, R.",
  title = "{THUMOS} Challenge: Action Recognition with a Large Number of Classes",
  howpublished = "\url{http://crcv.ucf.edu/THUMOS14/}",
  Year = {2014}
  }

Installation:

  1. Download ffmpeg from https://www.ffmpeg.org/ to ./lib/preprocess/
  2. Compile 3D CNN:
    • Compile C3D_sample_rate, which is used for the proposal network and classification network
    • Compile C3D_overlap_loss, which is used for the localization network
    • Note that you do not need to build the unit test cases.
    • Hint: please refer to C3D-v1.0, C3D-v1.1, and Caffe for more details about compilation
  3. Download pre-trained models to ./models/ from Dropbox

Run demo:

  1. Change to the demo directory: cd ./demo/
  2. Run the demo using the MATLAB script run_demo.m or the Python script run_demo.py.
  3. Find the final results in the folder ./pred/final/, either in .mat format (for MATLAB) or in .csv format (for Python).
    • Meaning of seg_swin: each row stands for one candidate segment, and the columns are:
      • 1: video name in THUMOS14 test set
      • 2: sliding window length measured by number of frames
      • 3: start frame index
      • 4: end frame index
      • 5: start time
      • 6: end time
      • 9: confidence score of being the class indicated in column 11
      • 10: confidence score of being action/non-background
      • 11: the predicted action class (from the 20 action classes [index 1-20] and the background [index 0])
      • 12: sliding window overlap; all values are 0.25, which means windows with 75% overlap are used
    • Meaning of res:
      • this matrix holds the confidence score of each frame for each class
      • each column corresponds to a frame and each row corresponds to an action class
      • its size is the number of action classes (20 here) by the number of frames
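
The column layout above can be read back with a small helper. This is only an illustrative sketch, not code from this repo: it assumes the Python demo writes seg_swin as a header-less, comma-separated file with one candidate segment per row in the column order listed (columns 7 and 8 are undocumented, so they are skipped). The names parse_seg_swin_row, read_seg_swin_csv, and best_class_per_frame are hypothetical.

```python
import csv
import io

# 1-based column numbers taken from the seg_swin description above.
# Columns 7 and 8 are not documented, so this sketch ignores them.
SEG_SWIN_COLUMNS = {
    1: "video_name",
    2: "window_length",
    3: "start_frame",
    4: "end_frame",
    5: "start_time",
    6: "end_time",
    9: "class_score",
    10: "action_score",
    11: "predicted_class",
    12: "window_overlap",
}
INT_FIELDS = {"window_length", "start_frame", "end_frame", "predicted_class"}


def parse_seg_swin_row(row):
    """Map one seg_swin row (a list of strings) to a dict of named fields."""
    segment = {}
    for col, name in SEG_SWIN_COLUMNS.items():
        value = row[col - 1]
        if name == "video_name":
            segment[name] = value.strip()
        elif name in INT_FIELDS:
            segment[name] = int(float(value))  # tolerate values written as "128.0"
        else:
            segment[name] = float(value)
    return segment


def read_seg_swin_csv(text):
    """Parse a header-less seg_swin CSV given as a string (assumed layout)."""
    return [parse_seg_swin_row(row) for row in csv.reader(io.StringIO(text)) if row]


def best_class_per_frame(res):
    """res is a classes-by-frames matrix; return the 0-based argmax row per frame."""
    num_classes, num_frames = len(res), len(res[0])
    return [max(range(num_classes), key=lambda c: res[c][f]) for f in range(num_frames)]
```

The row index returned by best_class_per_frame is 0-based; mapping it to a class id (for example row + 1 for classes 1-20) is an assumption to verify against run_demo.py.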

Our pre-trained models and pre-computed results of S-CNN (based on C3D-v1.0) on the THUMOS Challenge 2014 action detection task:

  1. Models:
    • ./models/conv3d_deepnetA_sport1m_iter_1900000: C3D model pre-trained on the Sports-1M dataset by Tran et al.;
    • ./models/THUMOS14/proposal/snapshot/SCNN_uniform16_binary_iter_30000: our trained S-CNN proposal network;
    • ./models/THUMOS14/classification/snapshot/SCNN_uniform16_cls20_iter_30000: our trained S-CNN classification network;
    • ./models/THUMOS14/localization/snapshot/SCNN_uniform16_cls20_with_overlap_loss_iter_30000: our trained S-CNN localization network.
  2. Results:
    • ./experiments/THUMOS14/network_proposal/result/res_seg_swin.mat: output of the proposal network. We keep segments whose action confidence score is >= 0.7 as candidate segments and feed them into the subsequent localization network;
    • ./experiments/THUMOS14/network_localization/result/res_seg_swin.mat: output of the localization network;
    • Evaluate mAP: run ./experiments/THUMOS14/eval/eval_scnn_thumos14.m; the results are stored in ./experiments/THUMOS14/eval/res_scnn_thumos14.mat. We vary the IoU overlap threshold used in evaluation from 0.1 to 0.5.
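
The two ideas in this list, keeping proposals with an action score of at least 0.7 and judging a detection by its temporal IoU with the ground truth, can be sketched as follows. This is a minimal illustration, not the repo's MATLAB code: it assumes segments are continuous (start, end) intervals and that each proposal is a dict with an "action_score" key; temporal_iou, keep_candidates, and hits_per_threshold are hypothetical names.

```python
def temporal_iou(seg_a, seg_b):
    """IoU of two temporal segments given as (start, end) with start < end.

    Segments are treated as continuous intervals (seconds or frames).
    """
    intersection = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def keep_candidates(segments, threshold=0.7):
    """Keep proposals whose action (non-background) score is >= threshold."""
    return [seg for seg in segments if seg["action_score"] >= threshold]


def hits_per_threshold(detection, ground_truth, thresholds=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """For each IoU threshold, report whether the detection overlaps the ground truth enough."""
    iou = temporal_iou(detection, ground_truth)
    return {t: iou >= t for t in thresholds}
```

For instance, a detection (0, 10) against a ground truth (5, 15) has IoU 1/3, so it counts as correct at thresholds 0.1 to 0.3 but not at 0.4 or 0.5.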

Our pre-trained models and pre-computed results of S-CNN (based on C3D-v1.1) on the THUMOS Challenge 2014 action detection task:

  1. Models:
    • ./models/c3d_resnet18_sports1m_r2_iter_2800000.caffemodel: C3D model pre-trained on the Sports-1M dataset by Tran et al.;
    • ./models/THUMOS14/proposal/snapshot/c3d_resnet18_sports1m_r2_iter_27384.caffemodel: our trained S-CNN proposal network;
    • ./models/THUMOS14/classification/snapshot/c3d_resnet18_sports1m_r2_iter_14704.caffemodel: our trained S-CNN classification network;
    • ./models/THUMOS14/localization/snapshot/c3d_resnet18_sports1m_r2_iter_14704.caffemodel: our trained S-CNN localization network.
  2. Results:
    • ./experiments/THUMOS14_Res3D/network_proposal/result/res_seg_swin.mat: output of the proposal network. We keep segments whose action confidence score is >= 0.7 as candidate segments and feed them into the subsequent localization network;
    • ./experiments/THUMOS14_Res3D/network_localization/result/res_seg_swin.mat: output of the localization network;
    • Evaluate mAP: run ./experiments/THUMOS14_Res3D/eval/eval_scnn_thumos14.m; the results are stored in ./experiments/THUMOS14/eval/res_scnn_thumos14.mat. We vary the IoU overlap threshold used in evaluation from 0.3 to 0.7.
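
To make "mAP at an IoU threshold" concrete, here is a generic per-class average precision sketch. It is not the official THUMOS14 evaluation script shipped in ./experiments/, whose matching details may differ: it assumes greedy matching of detections in descending score order, lets each ground-truth segment be matched at most once, and averages the precision at each true positive over the number of ground truths. mAP would then be the mean of this value over the 20 classes. The function names are hypothetical.

```python
def temporal_iou(seg_a, seg_b):
    """IoU of two continuous temporal segments given as (start, end)."""
    intersection = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_threshold):
    """Non-interpolated AP for one class.

    detections: list of (video, start, end, score).
    ground_truths: list of (video, start, end).
    """
    if not ground_truths:
        return 0.0
    used = [False] * len(ground_truths)
    true_positives = 0
    precision_sum = 0.0
    ranked = sorted(detections, key=lambda d: d[3], reverse=True)
    for rank, (video, start, end, _score) in enumerate(ranked, start=1):
        # Find the best still-unmatched ground truth in the same video.
        best_iou, best_idx = 0.0, -1
        for idx, (gt_video, gt_start, gt_end) in enumerate(ground_truths):
            if gt_video != video or used[idx]:
                continue
            iou = temporal_iou((start, end), (gt_start, gt_end))
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx >= 0 and best_iou >= iou_threshold:
            used[best_idx] = True
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / len(ground_truths)
```

Sweeping iou_threshold over 0.3, 0.4, ..., 0.7 mirrors the range used for the C3D-v1.1 results; a stricter threshold can only turn true positives into false positives, so AP is non-increasing as the threshold grows.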

Train your own S-CNN model (based on C3D-v1.0):

  1. We provide the parameter settings and network architecture definitions in ./experiments/THUMOS14/network_proposal/, ./experiments/THUMOS14/network_classification/, and ./experiments/THUMOS14/network_localization/, respectively.
  2. We also provide sample input data files to illustrate the input list format, which differs slightly from that of C3D:
    • as in C3D, each row corresponds to one input segment
    • C3D_sample_rate (used for the proposal and classification networks):
      • format: video_frame_directory start_frame_index class_label stepsize
      • stepsize: adjusts the window length by setting the step between two consecutive frames in one segment, i.e., the frame index of the current frame + stepsize = the frame index of the subsequent frame. Note that each segment consists of 16 frames in total.
      • example: /dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2561 3 8
    • C3D_overlap_loss (used for the localization network):
      • format: video_frame_directory start_frame_index class_label stepsize overlap
      • overlap: the IoU between the candidate segment and its corresponding ground-truth segment
      • example: /dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2561 3 8 0.70701
  3. NOTE: please refer to C3D-v1.0 and Caffe for more general instructions on training 3D CNN models.
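
The list-file formats and the stepsize rule above can be illustrated with a short parser. This sketch is not part of the repo: parse_list_line, sampled_frame_indices, and window_length are hypothetical names, and treating the window length as 16 × stepsize frames is an assumption consistent with the statement that windows reach 512 frames.

```python
CLIP_LEN = 16  # each S-CNN segment consists of 16 sampled frames


def parse_list_line(line):
    """Parse a C3D_sample_rate line (4 fields) or a C3D_overlap_loss line (5 fields)."""
    fields = line.split()
    if len(fields) not in (4, 5):
        raise ValueError(f"expected 4 or 5 fields, got {len(fields)}: {line!r}")
    entry = {
        "frame_dir": fields[0],
        "start_frame": int(fields[1]),
        "class_label": int(fields[2]),
        "stepsize": int(fields[3]),
    }
    if len(fields) == 5:
        entry["overlap"] = float(fields[4])  # IoU with the matched ground truth
    return entry


def sampled_frame_indices(start_frame, stepsize, clip_len=CLIP_LEN):
    """Frames in one segment: next frame index = current frame index + stepsize."""
    return [start_frame + k * stepsize for k in range(clip_len)]


def window_length(stepsize, clip_len=CLIP_LEN):
    """Temporal extent of a segment, assuming it spans clip_len * stepsize frames."""
    return clip_len * stepsize
```

For the example row with start frame 2561 and stepsize 8, the sampled frames are 2561, 2569, ..., 2681, and the window covers 128 frames under the stated assumption.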

Train your own S-CNN model (based on C3D-v1.1):

  1. We provide the parameter settings and network architecture definitions in ./experiments/THUMOS14_Res3D/network_proposal/, ./experiments/THUMOS14_Res3D/network_classification/, and ./experiments/THUMOS14_Res3D/network_localization/, respectively.
  2. We also provide sample input data files to illustrate the input list format, which differs slightly from that of C3D:
    • as in C3D, each row corresponds to one input segment
    • C3D_sample_rate (used for the proposal and classification networks):
      • format: video_frame_directory start_frame_index class_label stepsize
      • stepsize: adjusts the window length by setting the step between two consecutive frames in one segment, i.e., the frame index of the current frame + stepsize = the frame index of the subsequent frame. Note that each segment consists of 16 frames in total.
      • example: /dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2561 3 8
    • C3D_overlap_loss (used for the localization network):
      • format: video_frame_directory start_frame_index class_label stepsize overlap
      • overlap: the IoU between the candidate segment and its corresponding ground-truth segment
      • example: /dataset/THUMOS14/val/validation_frm_all/video_validation_0000051/ 2561 3 8 0.70701
  3. NOTE: please refer to C3D-v1.1 and Caffe for more general instructions on training 3D CNN models. Res3D uses 8 frames per clip to produce one label. Because S-CNN samples 16 frames from multi-scale temporal windows that can be up to 512 frames long, we keep 16 frames per clip in S-CNN.
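
As a rough illustration of how multi-scale windows of 16 sampled frames and C3D_overlap_loss rows fit together, the sketch below enumerates sliding windows and formats one list row per window. All of it is an assumption, not the authors' data-preparation code: the stepsizes 1 to 32 (windows of 16 to 512 frames), the 75% overlap between consecutive windows (matching the 0.25 value in seg_swin), half-open [start, end) frame intervals, and the labeling rule (class of the highest-IoU ground truth, or 0 when nothing overlaps) are all illustrative choices. sliding_windows and overlap_loss_line are hypothetical names.

```python
def temporal_iou(seg_a, seg_b):
    """IoU of two half-open frame intervals [start, end)."""
    intersection = max(0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def sliding_windows(num_frames, stepsizes=(1, 2, 4, 8, 16, 32), clip_len=16, overlap=0.75):
    """Yield (start_frame, stepsize) for windows of clip_len * stepsize frames.

    Frame indices are 1-based; consecutive windows of one scale are shifted by
    (1 - overlap) of the window length, and windows must fit inside the video.
    """
    for step in stepsizes:
        length = clip_len * step
        stride = max(1, int(length * (1 - overlap)))
        start = 1
        while start + length - 1 <= num_frames:
            yield start, step
            start += stride


def overlap_loss_line(frame_dir, start, step, ground_truths, clip_len=16):
    """Format one C3D_overlap_loss row.

    ground_truths holds (start, end, class_label) as half-open frame intervals.
    Hypothetical rule: label = class of the best-overlapping ground truth, 0 if none.
    """
    window = (start, start + clip_len * step)
    best_iou, best_label = 0.0, 0
    for gt_start, gt_end, label in ground_truths:
        iou = temporal_iou(window, (gt_start, gt_end))
        if iou > best_iou:
            best_iou, best_label = iou, label
    return f"{frame_dir} {start} {best_label} {step} {best_iou:.5f}"
```

In practice one would loop over sliding_windows(num_frames) for each video and write overlap_loss_line(...) for each window; how windows with low IoU are labeled or filtered for training should be checked against the provided sample list files.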