imatge-upc / Activitynet 2016 Cvprw

Licence: MIT
Tools to participate in the ActivityNet Challenge 2016 (NIPSW 2016)

Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

This project page describes our paper at the 1st NIPS Workshop on Large Scale Computer Vision Systems. This work also corresponds to the submission of the UPC team participating in the ActivityNet Challenge for CVPR 2016.

  • Alberto Montes (main contributor)
  • Amaia Salvador (advisor)
  • Xavier Giró-i-Nieto (advisor)
  • Santiago Pascual (co-advisor)

Institution: Universitat Politècnica de Catalunya.

Abstract

This work proposes a simple pipeline to classify and temporally localize activities in untrimmed videos. Our system uses features from a 3D Convolutional Neural Network (C3D) as input to train a recurrent neural network (RNN) that learns to classify video clips of 16 frames. After clip prediction, we post-process the output of the RNN to assign a single activity label to each video and to determine the temporal boundaries of the activity within the video. We show how our system can achieve competitive results in both tasks with a simple architecture. We evaluate our method in the ActivityNet Challenge 2016, achieving a 0.5874 mAP and a 0.2237 mAP in the classification and detection tasks, respectively.

What Are You Going to Find Here

This project is a baseline for activity classification and temporal localization, focused on the ActivityNet Challenge. It documents every step of our proposed pipeline and provides the trained models, along with a utility to classify and temporally localize activities in new videos. All the steps are covered: downloading the dataset, extracting the features, training, and predicting the temporal locations.

Publication

Download our paper at the 1st NIPS Workshop on Large Scale Computer Vision Systems by clicking here. Please cite it with the following BibTeX entry:

@InProceedings{Montes_2016_NIPSWS,
author = {Montes, Alberto and Salvador, Amaia and Pascual, Santiago and Giro-i-Nieto, Xavier},
title = {Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks},
booktitle = {1st NIPS Workshop on Large Scale Computer Vision Systems},
month = {December},
year = {2016}
}

You may also want to refer to our publication with the more human-friendly Chicago style:

Alberto Montes, Amaia Salvador, Santiago Pascual, and Xavier Giro-i-Nieto. "Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks." In 1st NIPS Workshop on Large Scale Computer Vision Systems. 2016.

This work is the result of the bachelor thesis by Alberto Montes at UPC TelecomBCN ETSETB during Spring 2016. Please check his technical report, slides and oral presentation for more details.

Repository Structure

This repository is structured in the following way:

  • data/: directory where, by default, all data such as videos and model weights is stored. Some data is included, such as the C3D means, and scripts are provided to download the weights for the C3D model and for the model we propose.
  • dataset/: files describing the ActivityNet dataset and a script to download all the videos. The dataset information has been extended with the number of frames of each video.
  • misc/: directory with miscellaneous information, such as the details of all the steps followed in this project.
  • notebooks/: notebooks with some visualizations of the results.
  • scripts/: scripts to reproduce all the steps of the project.
  • src/: source code required by the scripts.

Dependencies

This project is built using the Keras library for deep learning, which can use either Theano or TensorFlow as a backend.

We used Theano to develop the project because it supported the 3D convolutions and pooling required to run the C3D network.

For a complete list of the dependencies used in this project, check the requirements.txt file provided with the project. It will help you recreate the exact Python environment that we worked with.

Pipeline

The pipeline proposed to tackle the ActivityNet Challenge is made up of two stages.

The first stage encodes the video information into a single vector representation for each small video clip. To achieve that, the C3D network [Tran2014] is used. The C3D network uses 3D convolutions to extract spatiotemporal features from the videos, which have previously been split into 16-frame clips.
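As a minimal sketch of the clip-splitting step, the snippet below groups decoded frames into 16-frame clips with NumPy. The function name `split_into_clips`, the non-overlapping layout, and the choice to drop trailing frames are assumptions made for illustration; the scripts in this repository may handle these details differently.

```python
import numpy as np

CLIP_LENGTH = 16  # frames per clip, as stated in the text


def split_into_clips(frames, clip_length=CLIP_LENGTH):
    """Group frames into consecutive, non-overlapping clips.

    frames has shape (num_frames, height, width, channels).
    Trailing frames that do not fill a whole clip are dropped
    (an assumption of this sketch, not a documented behaviour).
    """
    num_clips = frames.shape[0] // clip_length
    usable = frames[: num_clips * clip_length]
    return usable.reshape((num_clips, clip_length) + frames.shape[1:])


# Hypothetical 100-frame video, for illustration only.
video = np.zeros((100, 112, 112, 3), dtype=np.uint8)
clips = split_into_clips(video)
print(clips.shape)  # (6, 16, 112, 112, 3)
```

Each entry along the first axis of `clips` is one unit that a 3D convolutional network such as C3D would turn into a single feature vector.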

The second stage, once the video features are extracted, classifies the activity in each clip, since the ActivityNet videos are untrimmed and a clip may contain an activity or not (background). An RNN performs this classification, more specifically an LSTM network, which tries to exploit long-term correlations and outputs a prediction for the video sequence. This is the stage that has been trained.
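To make the second stage more concrete, here is a plain NumPy sketch of a single-layer LSTM that reads one feature vector per clip and outputs class probabilities for every clip. It is not the Keras model trained in this project: the weights are random and untrained, and the sizes (4096-dimensional features, 64 hidden units, 201 classes with index 0 as background) are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def init_params(feature_dim, hidden, num_classes, seed=0):
    """Random, untrained weights; a real model would learn these."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.01, (feature_dim, 4 * hidden))  # input weights
    U = rng.normal(0.0, 0.01, (hidden, 4 * hidden))       # recurrent weights
    b = np.zeros(4 * hidden)
    W_out = rng.normal(0.0, 0.01, (hidden, num_classes))  # classifier weights
    b_out = np.zeros(num_classes)
    return W, U, b, W_out, b_out


def lstm_classify(features, params):
    """Return per-clip class probabilities, shape (num_clips, num_classes)."""
    W, U, b, W_out, b_out = params
    hidden = U.shape[0]
    h = np.zeros(hidden)  # hidden state carried from clip to clip
    c = np.zeros(hidden)  # cell state carried from clip to clip
    probs = []
    for x in features:
        gates = x @ W + h @ U + b
        i, f, o, g = np.split(gates, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g
        h = o * np.tanh(c)
        probs.append(softmax(h @ W_out + b_out))
    return np.stack(probs)


# Assumed sizes, for illustration only.
FEATURE_DIM, HIDDEN, NUM_CLASSES = 4096, 64, 201
params = init_params(FEATURE_DIM, HIDDEN, NUM_CLASSES)
clip_features = np.random.default_rng(1).normal(size=(10, FEATURE_DIM))
clip_probs = lstm_classify(clip_features, params)
print(clip_probs.shape)  # (10, 201)
```

Because the hidden and cell states are carried across clips, the prediction for each clip can depend on the clips that came before it, which is how the recurrent stage can exploit longer-term correlations.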

The structure of the network is shown in the following figure.

[Figure: Network Pipeline]

To reproduce the whole pipeline, there is a detailed guide that walks through all the steps using the scripts provided.
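The abstract mentions a post-processing step that turns per-clip predictions into one label per video plus temporal boundaries. One simple way to do this is sketched below: average the clip probabilities to pick the video label, then threshold that class's probability over time and merge consecutive active clips into segments. The averaging rule, the threshold of 0.5, the background index 0, and the function names are all assumptions of this sketch, not necessarily the post-processing used in our submission.

```python
import numpy as np


def predict_video_label(clip_probs, background=0):
    """Average clip probabilities and pick the best non-background class."""
    mean_probs = clip_probs.mean(axis=0)
    mean_probs[background] = -np.inf  # never choose background as the label
    return int(np.argmax(mean_probs))


def localize(clip_probs, label, fps, clip_length=16, threshold=0.5):
    """Return (start, end) times in seconds where the label is active."""
    active = clip_probs[:, label] >= threshold
    segments = []
    start = None
    for idx, is_active in enumerate(active):
        if is_active and start is None:
            start = idx                    # a segment opens at this clip
        elif not is_active and start is not None:
            segments.append((start, idx))  # the segment closed before this clip
            start = None
    if start is not None:
        segments.append((start, len(active)))
    seconds_per_clip = clip_length / fps
    return [(s * seconds_per_clip, e * seconds_per_clip) for s, e in segments]


# Hypothetical probabilities for 6 clips and 3 classes (index 0 = background).
clip_probs = np.array([
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.10, 0.80, 0.10],
    [0.80, 0.10, 0.10],
    [0.30, 0.60, 0.10],
    [0.90, 0.05, 0.05],
])
label = predict_video_label(clip_probs)
segments = localize(clip_probs, label, fps=16.0)
print(label, segments)  # 1 [(1.0, 3.0), (4.0, 5.0)]
```

With 16-frame clips and an assumed frame rate of 16 fps, each clip spans one second, so clips 1 to 2 and clip 4 become the segments from 1.0 s to 3.0 s and from 4.0 s to 5.0 s.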

Related work

  • Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015, December). Learning spatiotemporal features with 3d convolutional networks. In 2015 IEEE International Conference on Computer Vision (ICCV) (pp. 4489-4497). IEEE. [paper] [code]
  • Sharma, S., Kiros, R., & Salakhutdinov, R. (2015). Action recognition using visual attention. arXiv preprint arXiv:1511.04119. [paper] [code]
  • Yeung, S., Russakovsky, O., Mori, G., & Fei-Fei, L. (2015). End-to-end Learning of Action Detection from Frame Glimpses in Videos. arXiv preprint arXiv:1511.06984. [paper]
  • Yeung, S., et al. (2015). Every moment counts: Dense detailed labeling of actions in complex videos. arXiv preprint arXiv:1507.05738. [paper]
  • Baccouche, M., Mamalet, F., Wolf, C., Garcia, C., & Baskurt, A. (2011, November). Sequential deep learning for human action recognition. In International Workshop on Human Behavior Understanding (pp. 29-39). Springer Berlin Heidelberg. [paper]
  • Shou, Z., Wang, D., & Chang, S.-F. Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs. [paper] [code]

Acknowledgements

We would like to especially thank Albert Gil Moreno and Josep Pujal from our technical support team at the Image Processing Group at the UPC.

Contact

If you have any general question about our work or code that may be of interest to other researchers, please use the issues section of this GitHub repo. Alternatively, drop us an e-mail at [email protected].