
kenshohara / 3D-ResNets-PyTorch

License: MIT
3D ResNets for Action Recognition (CVPR 2018)

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to 3d Resnets Pytorch

conv3d-video-action-recognition
My experimentation around action recognition in videos. Contains a Keras implementation of the C3D network based on the original paper "Learning Spatiotemporal Features with 3D Convolutional Networks", Tran et al., and includes video processing pipelines coded using the mPyPl package. The model is benchmarked on the popular UCF101 dataset and achieves result…
Stars: ✭ 50 (-98.42%)
Mutual labels:  action-recognition, video-recognition
MiCT-Net-PyTorch
Video Recognition using Mixed Convolutional Tube (MiCT) on PyTorch with a ResNet backbone
Stars: ✭ 48 (-98.49%)
Mutual labels:  action-recognition, video-recognition
Awesome Action Recognition
A curated list of action recognition and related area resources
Stars: ✭ 3,202 (+1.04%)
Mutual labels:  action-recognition, video-recognition
ViCC
[WACV'22] Code repository for the paper "Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting", https://arxiv.org/abs/2106.10137.
Stars: ✭ 33 (-98.96%)
Mutual labels:  action-recognition, video-recognition
DLCV2018SPRING
Deep Learning for Computer Vision (CommE 5052) at NTU
Stars: ✭ 38 (-98.8%)
Mutual labels:  action-recognition
Dataset-REPAIR
REPresentAtion bIas Removal (REPAIR) of datasets
Stars: ✭ 49 (-98.45%)
Mutual labels:  action-recognition
san
The official PyTorch implementation of "Context Matters: Self-Attention for Sign Language Recognition"
Stars: ✭ 17 (-99.46%)
Mutual labels:  action-recognition
TCFPN-ISBA
Temporal Convolutional Feature Pyramid Network (TCFPN) & Iterative Soft Boundary Assignment (ISBA), CVPR '18
Stars: ✭ 40 (-98.74%)
Mutual labels:  action-recognition
DIN-Group-Activity-Recognition-Benchmark
A new codebase for Group Activity Recognition. It contains code for the ICCV 2021 paper "Spatio-Temporal Dynamic Inference Network for Group Activity Recognition" and some other methods.
Stars: ✭ 26 (-99.18%)
Mutual labels:  action-recognition
auditory-slow-fast
Implementation of "Slow-Fast Auditory Streams for Audio Recognition, ICASSP, 2021" in PyTorch
Stars: ✭ 46 (-98.55%)
Mutual labels:  action-recognition
torch-lrcn
An implementation of the LRCN in Torch
Stars: ✭ 85 (-97.32%)
Mutual labels:  action-recognition
two-stream-fusion-for-action-recognition-in-videos
No description or website provided.
Stars: ✭ 80 (-97.48%)
Mutual labels:  action-recognition
GST-video
ICCV 2019: Grouped Spatial-Temporal Aggregation for Efficient Action Recognition
Stars: ✭ 40 (-98.74%)
Mutual labels:  action-recognition
video repres mas
code for CVPR-2019 paper: Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics
Stars: ✭ 63 (-98.01%)
Mutual labels:  action-recognition
YAPO-e-plus
YAPO e+ - Yet Another Porn Organizer (extended)
Stars: ✭ 92 (-97.1%)
Mutual labels:  video-recognition
theWorldInSafety
Surveillance System Against Violence
Stars: ✭ 31 (-99.02%)
Mutual labels:  action-recognition
cpnet
Learning Video Representations from Correspondence Proposals (CVPR 2019 Oral)
Stars: ✭ 93 (-97.07%)
Mutual labels:  action-recognition
vlog action recognition
Identifying Visible Actions in Lifestyle Vlogs
Stars: ✭ 13 (-99.59%)
Mutual labels:  action-recognition
TadTR
End-to-end Temporal Action Detection with Transformer. [Under review for a journal publication]
Stars: ✭ 55 (-98.26%)
Mutual labels:  action-recognition
Robust-Deep-Learning-Pipeline
Deep Convolutional Bidirectional LSTM for Complex Activity Recognition with Missing Data. Human Activity Recognition Challenge. Springer SIST (2020)
Stars: ✭ 20 (-99.37%)
Mutual labels:  action-recognition

3D ResNets for Action Recognition

Update (2020/4/13)

We published a paper on arXiv.

Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,
"Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs",
arXiv preprint, arXiv:2004.04968, 2020.

We uploaded the pretrained models described in this paper, including a ResNet-50 pretrained on the combined dataset of Kinetics-700 and Moments in Time.

Update (2020/4/10)

We significantly updated our scripts. If you want to use older versions to reproduce our CVPR2018 paper, you should use the scripts in the CVPR2018 branch.

This update includes the following:

  • Refactoring the whole project
  • Supporting newer PyTorch versions
  • Supporting distributed training
  • Supporting training and testing on the Moments in Time dataset
  • Adding R(2+1)D models
  • Uploading 3D ResNet models trained on the Kinetics-700, Moments in Time, and STAIR-Actions datasets

Summary

This is the PyTorch code for the following papers:

Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,
"Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs",
arXiv preprint, arXiv:2004.04968, 2020.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Towards Good Practice for Action Recognition with Spatiotemporal 3D Convolutions",
Proceedings of the International Conference on Pattern Recognition, pp. 2516-2521, 2018.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?",
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6546-6555, 2018.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition",
Proceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, 2017.

This code includes training, fine-tuning and testing on Kinetics, Moments in Time, ActivityNet, UCF-101, and HMDB-51.

Citation

If you use this code or pre-trained models, please cite the following:

@inproceedings{hara3dcnns,
  author={Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh},
  title={Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={6546--6555},
  year={2018},
}

Pre-trained models

Pre-trained models are available here.
All models are trained on Kinetics-700 (K), Moments in Time (M), STAIR-Actions (S), or merged datasets of them (KM, KS, MS, KMS).
If you want to fine-tune the models on your own dataset, specify the following options (a minimal loading sketch is shown at the end of this section).

r3d18_K_200ep.pth: --model resnet --model_depth 18 --n_pretrain_classes 700
r3d18_KM_200ep.pth: --model resnet --model_depth 18 --n_pretrain_classes 1039
r3d34_K_200ep.pth: --model resnet --model_depth 34 --n_pretrain_classes 700
r3d34_KM_200ep.pth: --model resnet --model_depth 34 --n_pretrain_classes 1039
r3d50_K_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 700
r3d50_KM_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 1039
r3d50_KMS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 1139
r3d50_KS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 800
r3d50_M_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 339
r3d50_MS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 439
r3d50_S_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 100
r3d101_K_200ep.pth: --model resnet --model_depth 101 --n_pretrain_classes 700
r3d101_KM_200ep.pth: --model resnet --model_depth 101 --n_pretrain_classes 1039
r3d152_K_200ep.pth: --model resnet --model_depth 152 --n_pretrain_classes 700
r3d152_KM_200ep.pth: --model resnet --model_depth 152 --n_pretrain_classes 1039
r3d200_K_200ep.pth: --model resnet --model_depth 200 --n_pretrain_classes 700
r3d200_KM_200ep.pth: --model resnet --model_depth 200 --n_pretrain_classes 1039

Old pretrained models are still available here.
However, some modifications are required to use the old pretrained models in the current scripts.
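
For reference, below is a minimal sketch (not part of the repository) of loading one of these checkpoints outside of main.py and re-heading it for a new dataset. It assumes that the repository's models/resnet.py is importable and that the checkpoint stores its weights under a 'state_dict' key; verify both against your local copy before relying on it.

# Minimal sketch, assuming models/resnet.py is on PYTHONPATH and the
# checkpoint keeps its weights under 'state_dict' (check your file).
import torch
import torch.nn as nn
from models import resnet

# Build the architecture with the number of pretraining classes (KM = 1039).
model = resnet.generate_model(model_depth=50, n_classes=1039)

checkpoint = torch.load('r3d50_KM_200ep.pth', map_location='cpu')
model.load_state_dict(checkpoint['state_dict'])

# Replace the classification head for the target dataset (e.g. UCF-101, 101 classes).
model.fc = nn.Linear(model.fc.in_features, 101)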

Requirements

  • PyTorch
conda install pytorch torchvision cudatoolkit=10.1 -c soumith

  • FFmpeg, FFprobe

  • Python 3

Preparation

ActivityNet

  • Download videos using the official crawler.
  • Convert from mp4 to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs mp4_video_dir_path jpg_video_dir_path activitynet
  • Add fps information into the json file using util_scripts/add_fps_into_activitynet_json.py
python -m util_scripts.add_fps_into_activitynet_json mp4_video_dir_path json_file_path

Kinetics

  • Download videos using the official crawler.
    • Locate test set in video_directory/test.
  • Convert from mp4 to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs mp4_video_dir_path jpg_video_dir_path kinetics
  • Generate annotation file in json format similar to ActivityNet using util_scripts/kinetics_json.py
    • The CSV files (kinetics_{train, val, test}.csv) are included in the crawler.
python -m util_scripts.kinetics_json csv_dir_path 700 jpg_video_dir_path jpg dst_json_path

UCF-101

  • Download videos and train/test splits here.
  • Convert from avi to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path ucf101
  • Generate annotation file in json format similar to ActivityNet using util_scripts/ucf101_json.py
    • annotation_dir_path includes classInd.txt, trainlist0{1, 2, 3}.txt, testlist0{1, 2, 3}.txt
python -m util_scripts.ucf101_json annotation_dir_path jpg_video_dir_path dst_json_path

HMDB-51

  • Download videos and train/test splits here.
  • Convert from avi to jpg files using util_scripts/generate_video_jpgs.py
python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path hmdb51
  • Generate annotation file in json format similar to ActivityNet using util_scripts/hmdb51_json.py
    • annotation_dir_path includes brush_hair_test_split1.txt, ...
python -m util_scripts.hmdb51_json annotation_dir_path jpg_video_dir_path dst_json_path

Running the code

Assume the structure of data directories is the following:

~/
  data/
    kinetics_videos/
      jpg/
        .../ (directories of class names)
          .../ (directories of video names)
            ... (jpg files)
    results/
      save_100.pth
    kinetics.json
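
Before starting training, it can help to verify that the jpg extraction actually produced this layout. The following is an illustrative sanity check only (not part of the repository); it simply walks the directory structure shown above.

# Illustrative sanity check (not part of the repository): report video
# directories that ended up with no extracted jpg frames.
from pathlib import Path

jpg_root = Path.home() / 'data' / 'kinetics_videos' / 'jpg'

for class_dir in sorted(p for p in jpg_root.iterdir() if p.is_dir()):
    for video_dir in (p for p in class_dir.iterdir() if p.is_dir()):
        if not any(video_dir.glob('*.jpg')):
            print(f'No frames extracted for {video_dir}')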

Confirm all options.

python main.py -h

Train ResNet-50 on the Kinetics-700 dataset (700 classes) with 4 CPU threads (for data loading).
Batch size is 128.
Save models every 5 epochs. All GPUs are used for training. If you want to use only some of the GPUs, set CUDA_VISIBLE_DEVICES=... (see the example after the command).

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --model resnet \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5
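
For example, to restrict the run above to the first two GPUs, prefix the same command with CUDA_VISIBLE_DEVICES:

CUDA_VISIBLE_DEVICES=0,1 python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --model resnet \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5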

Continue training from epoch 101 (~/data/results/save_100.pth is loaded).

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_100.pth \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5

Calculate the top-5 class probabilities of each video using a trained model (~/data/results/save_200.pth).
Note that inference_batch_size should be small, because the actual batch size is calculated as inference_batch_size * (n_video_frames / inference_stride).

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_200.pth \
--model_depth 50 --n_classes 700 --n_threads 4 --no_train --no_val --inference --output_topk 5 --inference_batch_size 1
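
As a back-of-the-envelope illustration of the batch-size note above (the numbers below are made up, and the default inference_stride of 16 is an assumption; confirm with python main.py -h):

# Illustration only: effective batch size during inference, with made-up values.
inference_batch_size = 1
n_video_frames = 250     # frames in one example video
inference_stride = 16    # assumed default; confirm with main.py -h
effective_batch = inference_batch_size * (n_video_frames // inference_stride)
print(effective_batch)   # about 15 clips per forward pass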

Evaluate top-1 video accuracy of a recognition result (~/data/results/val.json).

python -m util_scripts.eval_accuracy ~/data/kinetics.json ~/data/results/val.json --subset val -k 1 --ignore

Fine-tune fc layers of a pretrained model (~/data/models/resnet-50-kinetics.pth) on UCF-101.

python main.py --root_path ~/data --video_path ucf101_videos/jpg --annotation_path ucf101_01.json \
--result_path results --dataset ucf101 --n_classes 101 --n_pretrain_classes 700 \
--pretrain_path models/resnet-50-kinetics.pth --ft_begin_module fc \
--model resnet --model_depth 50 --batch_size 128 --n_threads 4 --checkpoint 5