
MCG-NJU / Tdn

License: Apache-2.0
[CVPR 2021] TDN: Temporal Difference Networks for Efficient Action Recognition

Programming Languages

Python

Projects that are alternatives of or similar to Tdn

Awesome Activity Prediction
Paper list of activity prediction and related area
Stars: ✭ 147 (+104.17%)
Mutual labels:  action-recognition, video-understanding
Paddlevideo
A comprehensive, up-to-date, and deployable video deep learning codebase covering video recognition, action localization, and temporal action detection. It is a high-performance, lightweight codebase that provides practical models for video understanding research and applications
Stars: ✭ 218 (+202.78%)
Mutual labels:  action-recognition, video-understanding
Step
STEP: Spatio-Temporal Progressive Learning for Video Action Detection. CVPR'19 (Oral)
Stars: ✭ 196 (+172.22%)
Mutual labels:  action-recognition, video-understanding
Movienet Tools
Tools for movie and video research
Stars: ✭ 113 (+56.94%)
Mutual labels:  action-recognition, video-understanding
Awesome Action Recognition
A curated list of action recognition and related area resources
Stars: ✭ 3,202 (+4347.22%)
Mutual labels:  action-recognition, video-understanding
Mmaction
An open-source toolbox for action understanding based on PyTorch
Stars: ✭ 1,711 (+2276.39%)
Mutual labels:  action-recognition, video-understanding
Tsn Pytorch
Temporal Segment Networks (TSN) in PyTorch
Stars: ✭ 895 (+1143.06%)
Mutual labels:  action-recognition, video-understanding
I3d finetune
TensorFlow code for fine-tuning the I3D model on UCF101.
Stars: ✭ 128 (+77.78%)
Mutual labels:  action-recognition, video-understanding
DEAR
[ICCV 2021 Oral] Deep Evidential Action Recognition
Stars: ✭ 36 (-50%)
Mutual labels:  action-recognition, video-understanding
DIN-Group-Activity-Recognition-Benchmark
A new codebase for group activity recognition. It contains code for the ICCV 2021 paper "Spatio-Temporal Dynamic Inference Network for Group Activity Recognition" and some other methods.
Stars: ✭ 26 (-63.89%)
Mutual labels:  action-recognition, video-understanding
Temporal Segment Networks
Code & Models for Temporal Segment Networks (TSN) in ECCV 2016
Stars: ✭ 1,287 (+1687.5%)
Mutual labels:  action-recognition, video-understanding
Mmaction2
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
Stars: ✭ 684 (+850%)
Mutual labels:  action-recognition, video-understanding
Actionvlad
ActionVLAD for video action classification (CVPR 2017)
Stars: ✭ 217 (+201.39%)
Mutual labels:  action-recognition, video-understanding
MTL-AQA
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
Stars: ✭ 38 (-47.22%)
Mutual labels:  action-recognition, video-understanding
Video Understanding Dataset
A collection of recent video understanding datasets, under construction!
Stars: ✭ 387 (+437.5%)
Mutual labels:  action-recognition, video-understanding
Action Detection
temporal action detection with SSN
Stars: ✭ 597 (+729.17%)
Mutual labels:  action-recognition, video-understanding
Video Classification
Tutorial for video classification / action recognition using 3D CNN / CNN+RNN on UCF101
Stars: ✭ 543 (+654.17%)
Mutual labels:  action-recognition
Action Recognition Using 3d Resnet
Use 3D ResNet to extract features of UCF101 and HMDB51 and then classify them.
Stars: ✭ 32 (-55.56%)
Mutual labels:  action-recognition
Gluon Cv
Gluon CV Toolkit
Stars: ✭ 5,001 (+6845.83%)
Mutual labels:  action-recognition
Two Stream Pytorch
PyTorch implementation of two-stream networks for video action recognition
Stars: ✭ 428 (+494.44%)
Mutual labels:  action-recognition

TDN: Temporal Difference Networks for Efficient Action Recognition (CVPR 2021)


Overview

We release the PyTorch code of TDN (Temporal Difference Networks). This code is based on the TSN and TSM codebases. The core code implementing the Temporal Difference Module is in ops/base_module.py and ops/tdn_net.py.

TL;DR. We generalize the idea of RGB difference to devise an efficient temporal difference module (TDM) for motion modeling in videos, and provide an alternative to 3D convolutions through a principled and detailed module design.
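
As a rough, hedged illustration of that idea (this is not the module implemented in ops/base_module.py or ops/tdn_net.py; the class and layer names below are invented for the example), a frame-difference block in PyTorch could look like this:

    # Minimal sketch of the frame-difference idea behind a temporal difference
    # module. This is NOT the repository's implementation; it only shows how
    # differences between neighbouring frames can inject motion cues into a
    # 2D CNN backbone.
    import torch
    import torch.nn as nn

    class NaiveTemporalDifference(nn.Module):
        """Adds a motion residual computed from neighbouring-frame differences."""

        def __init__(self, channels: int):
            super().__init__()
            # Hypothetical lightweight conv that maps the raw difference to a
            # motion feature of the same shape.
            self.diff_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, time, channels, height, width)
            b, t, c, h, w = x.shape
            # Differences between neighbouring frames; pad the last step with
            # zeros so the temporal length is preserved.
            diff = torch.cat([x[:, 1:] - x[:, :-1], torch.zeros_like(x[:, -1:])], dim=1)
            motion = self.diff_conv(diff.reshape(b * t, c, h, w)).reshape(b, t, c, h, w)
            return x + motion

    # Example: 2 clips, 8 segments, 64-channel feature maps of size 56x56.
    out = NaiveTemporalDifference(64)(torch.randn(2, 8, 64, 56, 56))
    print(out.shape)  # torch.Size([2, 8, 64, 56, 56])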

[Mar 5, 2021] TDN has been accepted by CVPR 2021.

[Dec 26, 2020] We have released the PyTorch code of TDN.

Prerequisites

The code is built with the following libraries:

Data Preparation

We have successfully trained TDN on Kinetics400, UCF101, HMDB51, Something-Something-V1 and V2 with this codebase.

  • The processing of Something-Something-V1 & V2 can be summarized in 3 steps:

    1. Extract frames from the videos (you can use ffmpeg to extract frames from each video).
    2. Generate the annotation files needed by the dataloader (each line is "<path_to_frames> <frames_num> <video_class>"). The annotations usually include train.txt and val.txt. The format of each *.txt file is as follows (a small helper sketch is given after this list):
      dataset_root/frames/video_1 num_frames label_1
      dataset_root/frames/video_2 num_frames label_2
      dataset_root/frames/video_3 num_frames label_3
      ...
      dataset_root/frames/video_N num_frames label_N
      
    3. Add the information to ops/dataset_configs.py.
  • The processing of Kinetics400 can be summarized in 2 steps:

    1. Generate the annotation files needed by the dataloader (each line is "<path_to_video> <video_class>"). The annotations usually include train.txt and val.txt. The format of each *.txt file is as follows:
      dataset_root/video_1.mp4  label_1
      dataset_root/video_2.mp4  label_2
      dataset_root/video_3.mp4  label_3
      ...
      dataset_root/video_N.mp4  label_N
      
    2. Add the information to ops/dataset_configs.py.
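
To make the frame-based annotation format above concrete, here is a small, hedged helper for the layout used by Something-Something (the directory structure and label mapping are assumptions for illustration; the repository does not ship this script):

    # Hypothetical helper that writes annotations in the
    # "<path_to_frames> <frames_num> <video_class>" format described above.
    # The directory layout and the label lookup are assumptions; adapt them
    # to how your extracted frames are actually stored.
    import os

    def write_frame_annotations(frames_root: str, labels: dict, out_txt: str) -> None:
        """labels maps a video directory name (e.g. 'video_1') to an integer class."""
        with open(out_txt, "w") as f:
            for video_dir in sorted(os.listdir(frames_root)):
                full_path = os.path.join(frames_root, video_dir)
                if not os.path.isdir(full_path):
                    continue
                num_frames = len(os.listdir(full_path))  # one image file per frame
                f.write(f"{full_path} {num_frames} {labels[video_dir]}\n")

    # Example (placeholder paths and labels):
    # write_frame_annotations("dataset_root/frames", {"video_1": 0, "video_2": 5}, "train.txt")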

Model Zoo

Here we provide some off-the-shelf pretrained models. The accuracy may differ slightly from the paper, since the raw Kinetics videos downloaded by different users can vary.

Something-Something-V1

Model | Frames x Crops x Clips | Top-1 | Top-5 | Checkpoint
TDN-ResNet50 | 8x1x1 | 52.3% | 80.6% | link
TDN-ResNet50 | 16x1x1 | 53.9% | 82.1% | link

Something-Something-V2

Model | Frames x Crops x Clips | Top-1 | Top-5 | Checkpoint
TDN-ResNet50 | 8x1x1 | 64.0% | 88.8% | link
TDN-ResNet50 | 16x1x1 | 65.3% | 89.7% | link

Kinetics400

Model | Frames x Crops x Clips | Top-1 (30 views) | Top-5 (30 views) | Checkpoint
TDN-ResNet50 | 8x3x10 | 76.6% | 92.8% | link
TDN-ResNet50 | 16x3x10 | 77.5% | 93.2% | link
TDN-ResNet101 | 8x3x10 | 77.5% | 93.6% | link
TDN-ResNet101 | 16x3x10 | 78.5% | 93.9% | link
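
In the Kinetics table, "30 views" means 3 spatial crops × 10 temporal clips per video, with the per-view class scores averaged before taking the prediction. As a hedged illustration of that protocol (the repository's actual ensembling is done by pkl_to_results.py, shown in the Testing section below), the averaging step is roughly:

    # Rough sketch of 30-view evaluation: average the class scores of
    # 3 crops x 10 clips per video, then take the argmax. This illustrates
    # the protocol only; it is not the repository's pkl_to_results.py.
    import numpy as np

    def thirty_view_top1(view_scores: np.ndarray, labels: np.ndarray) -> float:
        """view_scores: (num_videos, 30, num_classes); labels: (num_videos,)."""
        avg_scores = view_scores.mean(axis=1)         # average over the 30 views
        predictions = avg_scores.argmax(axis=1)       # final class per video
        return float((predictions == labels).mean())  # top-1 accuracy

    # Example with random placeholder numbers (not real results):
    scores = np.random.rand(4, 30, 400)               # 4 videos, 400 Kinetics classes
    labels = np.random.randint(0, 400, size=4)
    print(thirty_view_top1(scores, labels))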

Testing

  • For center crop, single clip, the testing procedure can be summarized in 2 steps:
    1. Run the following testing script:
      CUDA_VISIBLE_DEVICES=0 python3 test_models_center_crop.py something \
      --archs='resnet50' --weights <your_checkpoint_path>  --test_segments=8  \
      --test_crops=1 --batch_size=16  --gpus 0 --output_dir <your_pkl_path> -j 4 --clip_index=0
      
    2. Run the following script to get results from the raw scores:
      python3 pkl_to_results.py --num_clips 1 --test_crops 1 --output_dir <your_pkl_path>  
      
  • For 3 crops and 10 clips, the testing procedure can be summarized in 2 steps:
    1. Run the following testing script 10 times (clip_index from 0 to 9); a small wrapper loop is sketched after this list:
      CUDA_VISIBLE_DEVICES=0 python3 test_models_three_crops.py  kinetics \
      --archs='resnet50' --weights <your_checkpoint_path>  --test_segments=8 \
      --test_crops=3 --batch_size=16 --full_res --gpus 0 --output_dir <your_pkl_path>  \
      -j 4 --clip_index <your_clip_index>
      
    2. Run the following script to ensemble the raw scores of the 30 views:
      python pkl_to_results.py --num_clips 10 --test_crops 3 --output_dir <your_pkl_path> 
      
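Since the 10 clips are evaluated one clip_index at a time, a small wrapper can save retyping the command. This is only a convenience sketch around the script shown above; the checkpoint and output paths are left as placeholders for your own values.

    # Convenience sketch: run test_models_three_crops.py once per clip_index
    # (0..9) instead of typing the command ten times. The checkpoint and
    # output paths are placeholders; fill in your own.
    import os
    import subprocess

    CHECKPOINT = "<your_checkpoint_path>"
    OUTPUT_DIR = "<your_pkl_path>"
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")

    for clip_index in range(10):
        subprocess.run(
            ["python3", "test_models_three_crops.py", "kinetics",
             "--archs=resnet50", "--weights", CHECKPOINT,
             "--test_segments=8", "--test_crops=3", "--batch_size=16",
             "--full_res", "--gpus", "0", "--output_dir", OUTPUT_DIR,
             "-j", "4", "--clip_index", str(clip_index)],
            check=True, env=env)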

Training

This implementation supports multi-GPU DistributedDataParallel training, which is faster and simpler.

  • For example, to train TDN-ResNet50 on Something-Something-V1 with 8 GPUs, you can run:
    python -m torch.distributed.launch --master_port 12347 --nproc_per_node=8 \
                main.py  something  RGB --arch resnet50 --num_segments 8 --gd 20 --lr 0.01 \
                --lr_scheduler step --lr_steps  30 45 55 --epochs 60 --batch-size 8 \
                --wd 5e-4 --dropout 0.5 --consensus_type=avg --eval-freq=1 -j 4 --npb 
    
  • For example, to train TDN-ResNet50 on Kinetics400 with 8 GPUs, you can run:
    python -m torch.distributed.launch --master_port 12347 --nproc_per_node=8 \
            main.py  kinetics RGB --arch resnet50 --num_segments 8 --gd 20 --lr 0.02 \
            --lr_scheduler step  --lr_steps 50 75 90 --epochs 100 --batch-size 16 \
            --wd 1e-4 --dropout 0.5 --consensus_type=avg --eval-freq=1 -j 4 --npb 
    

Acknowledgements

We especially thank the contributors of the TSN and TSM codebases for providing helpful code.

License

This repository is released under the Apache-2.0 license, as found in the LICENSE file.

Citation

If you find our work useful, please feel free to cite our paper 😆:

@article{wang2020tdn,
      title={TDN: Temporal Difference Networks for Efficient Action Recognition}, 
      author={Limin Wang and Zhan Tong and Bin Ji and Gangshan Wu},
      journal={arXiv preprint arXiv:2012.10071},
      year={2020}
}