
kenshohara / 3d Resnets

License: MIT
3D ResNets for Action Recognition

Programming Languages

Lua

Projects that are alternatives to or similar to 3d Resnets

Tsn Pytorch
Temporal Segment Networks (TSN) in PyTorch
Stars: ✭ 895 (+842.11%)
Mutual labels:  action-recognition
Fight detection
Real time Fight Detection Based on 2D Pose Estimation and RNN Action Recognition
Stars: ✭ 65 (-31.58%)
Mutual labels:  action-recognition
Vidvrd Helper
To keep up to date with the VRU Grand Challenge, please use https://github.com/NExTplusplus/VidVRD-helper
Stars: ✭ 81 (-14.74%)
Mutual labels:  action-recognition
Action Recognition Using 3d Resnet
Use 3D ResNet to extract features of UCF101 and HMDB51 and then classify them.
Stars: ✭ 32 (-66.32%)
Mutual labels:  action-recognition
Training toolbox caffe
Training Toolbox for Caffe
Stars: ✭ 51 (-46.32%)
Mutual labels:  action-recognition
Hake Action
Part of the HAKE project; includes the reproduced SOTA models and the corresponding HAKE-enhanced versions (CVPR 2020).
Stars: ✭ 72 (-24.21%)
Mutual labels:  action-recognition
Action recognition tf
Video action recognition based on C3D
Stars: ✭ 16 (-83.16%)
Mutual labels:  action-recognition
Temporal Segment Networks
Code & Models for Temporal Segment Networks (TSN) in ECCV 2016
Stars: ✭ 1,287 (+1254.74%)
Mutual labels:  action-recognition
Torch Models
Stars: ✭ 65 (-31.58%)
Mutual labels:  torch7
Hake Action Torch
HAKE-Action in PyTorch
Stars: ✭ 74 (-22.11%)
Mutual labels:  action-recognition
Okutama Action
Okutama-Action: An Aerial View Video Dataset for Concurrent Human Action Detection
Stars: ✭ 36 (-62.11%)
Mutual labels:  action-recognition
Resgcnv1
ResGCN: an efficient baseline for skeleton-based human action recognition.
Stars: ✭ 50 (-47.37%)
Mutual labels:  action-recognition
Tdn
[CVPR 2021] TDN: Temporal Difference Networks for Efficient Action Recognition
Stars: ✭ 72 (-24.21%)
Mutual labels:  action-recognition
Video Classification 3d Cnn Pytorch
Video classification tools using 3D ResNet
Stars: ✭ 874 (+820%)
Mutual labels:  action-recognition
M Pact
A one-stop shop for all of your activity recognition needs.
Stars: ✭ 85 (-10.53%)
Mutual labels:  action-recognition
Hcn Prototypeloss Pytorch
Hierarchical Co-occurrence Network with Prototype Loss for Few-shot Learning (PyTorch)
Stars: ✭ 17 (-82.11%)
Mutual labels:  action-recognition
Epic Kitchens 55 Action Models
EPIC-KITCHENS-55 baselines for Action Recognition
Stars: ✭ 68 (-28.42%)
Mutual labels:  action-recognition
Video Dataset Loading Pytorch
Generic PyTorch Dataset Implementation for Loading, Preprocessing and Augmenting Video Datasets
Stars: ✭ 92 (-3.16%)
Mutual labels:  action-recognition
Video classification pytorch
Video Classification based on PyTorch
Stars: ✭ 89 (-6.32%)
Mutual labels:  action-recognition
Daps
This repo hosts the DAPs code from our ECCV 2016 publication
Stars: ✭ 74 (-22.11%)
Mutual labels:  action-recognition

3D ResNets for Action Recognition

This is the Torch (Lua) code for the following papers:

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?",
arXiv preprint, arXiv:1711.09577, 2017.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition",
Proceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, 2017.

This code includes only training and testing on the ActivityNet and Kinetics datasets.
If you want to classify your videos using our pretrained models, use this code.

The PyTorch (Python) version of this code is available here.
The PyTorch version includes additional models, such as pre-activation ResNet, Wide ResNet, ResNeXt, and DenseNet.

Citation

If you use this code or pre-trained models, please cite the following:

@article{hara3dcnns,
  author={Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh},
  title={Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?},
  journal={arXiv preprint},
  volume={arXiv:1711.09577},
  year={2017},
}

Pre-trained models

Pre-trained models are available on the releases page.

Requirements

  • Torch (with CUDA and cuDNN)
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh
  • json package
luarocks install json
  • FFmpeg, FFprobe
wget http://johnvansickle.com/ffmpeg/releases/ffmpeg-release-64bit-static.tar.xz
tar xvf ffmpeg-release-64bit-static.tar.xz
cd ./ffmpeg-3.3.3-64bit-static/; sudo cp ffmpeg ffprobe /usr/local/bin;
  • Python 3
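
After installation, a quick sanity check can confirm that the required packages load. This is a minimal sketch (the file name sanity_check.lua is hypothetical; cutorch and cudnn are the standard Torch CUDA packages, and json is the rock installed above):

-- sanity_check.lua: confirm the required Torch packages load
-- run with: th sanity_check.lua
local ok_cutorch = pcall(require, 'cutorch')  -- CUDA backend
local ok_cudnn   = pcall(require, 'cudnn')    -- cuDNN bindings
local ok_json    = pcall(require, 'json')     -- json rock installed above
print('cutorch:', ok_cutorch, 'cudnn:', ok_cudnn, 'json:', ok_json)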

Preparation

ActivityNet

  • Download datasets using official crawler codes
  • Convert from avi to jpg files using utils/video_jpg.py
python utils/video_jpg.py avi_video_directory jpg_video_directory
  • Generate fps files using utils/fps.py
python utils/fps.py avi_video_directory jpg_video_directory

Kinetics

  • Download datasets using official crawler codes
    • Locate test set in video_directory/test.
  • Convert from avi to jpg files using utils/video_jpg_kinetics.py
python utils/video_jpg_kinetics.py avi_video_directory jpg_video_directory
  • Generate n_frames files using utils/n_frames_kinetics.py
python utils/n_frames_kinetics.py jpg_video_directory
  • Generate an annotation file in JSON format (ActivityNet-style) using utils/kinetics_json.py
python utils/kinetics_json.py train_csv_path val_csv_path test_csv_path json_path
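
To verify that the generated annotation file parses, a short Lua snippet can count the indexed videos. This is a sketch only: it assumes the json rock from the requirements and an ActivityNet-style layout with a top-level database table, which may differ in detail:

-- check_json.lua: count videos indexed in kinetics.json (assumed layout)
local json = require 'json'
local f = assert(io.open('kinetics.json', 'r'))
local anno = json.decode(f:read('*a'))
f:close()
local n = 0
for _ in pairs(anno.database or {}) do n = n + 1 end
print('videos indexed:', n)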

Running the code

Assume the data directories are structured as follows:

~/
  data/
    activitynet_videos/
      jpg/
        .../ (directories of video names)
          ... (jpg files)
    kinetics_videos/
      jpg/
        .../ (directories of class names)
          .../ (directories of video names)
            ... (jpg files)
    models/
      resnet.t7
    results/
      model_100.t7
    LR/
      ActivityNet/
        lr.lua
      Kinetics/
        lr.lua
    kinetics.json
    activitynet.json
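
Before training, you can also confirm that a checkpoint such as models/resnet.t7 deserializes correctly. A minimal sketch, assuming the file is a standard torch-serialized network (the cudnn require is an assumption, needed only if the checkpoint contains cuDNN layers):

-- inspect_model.lua: load and print a serialized network (run with th)
require 'torch'
require 'nn'
require 'cunn'
require 'cudnn'  -- assumption: only needed for checkpoints with cuDNN layers
local model = torch.load('models/resnet.t7')
print(model)  -- prints the layer-by-layer architecture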

Confirm all available options:

th main.lua -h

Train ResNet-34 on the Kinetics dataset (400 classes) with 4 CPU threads (for data loading) and 2 GPUs.
The batch size is 128, and models are saved every 5 epochs.

th main.lua --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --lr_path LR/Kinetics/lr.lua --dataset kinetics --model resnet \
--resnet_depth 34 --n_classes 400 --batch_size 128 --n_gpu 2 --n_threads 4 --checkpoint 5

Continue training from epoch 101 (~/data/results/model_100.t7 is loaded):

th main.lua --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --lr_path LR/Kinetics/lr.lua --dataset kinetics --begin_epoch 101 \
--batch_size 128 --n_gpu 2 --n_threads 4 --checkpoint 5

Perform recognition on each video of the validation set using a pre-trained model. This operation outputs the top-10 labels for each video.

th main.lua --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --premodel_path models/resnet.t7 --dataset kinetics \
--no_train --no_val --test_video --test_subset val --n_gpu 2 --n_threads 4