kenziyuliu / Ms G3d

Licence: other
[CVPR 2020 Oral] PyTorch implementation of "Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition"



PyTorch implementation of "Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition", CVPR 2020 Oral.



Dependencies

  • Python >= 3.6
  • PyTorch >= 1.2.0
  • NVIDIA Apex (auto mixed precision training)
  • PyYAML, tqdm, tensorboardX
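
A quick way to confirm that the dependencies listed above are installed is sketched below. It only checks the interpreter version and whether each module can be found; it does not verify the PyTorch version. The helper names are illustrative and not part of this repo.

```python
import importlib.util
import sys

# Import names for the dependencies listed above. PyYAML is imported
# as "yaml"; the other import names match their package names.
REQUIRED_MODULES = ["torch", "apex", "yaml", "tqdm", "tensorboardX"]


def python_is_supported(version_info=sys.version_info, minimum=(3, 6)):
    """Return True if the interpreter meets the minimum Python version."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_is_supported():
        print("Python >= 3.6 is required")
    for name in missing_modules():
        print("missing dependency: {}".format(name))
```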

Data Preparation

Disk usage warning: after preprocessing, the datasets take up roughly 38GB, 77GB, and 63GB for NTU RGB+D 60, NTU RGB+D 120, and Kinetics 400, respectively. The raw/intermediate files may be larger.

Download Datasets

There are 3 datasets to download:

  • NTU RGB+D 60 Skeleton
  • NTU RGB+D 120 Skeleton
  • Kinetics 400 Skeleton

NTU RGB+D 60 and 120

  1. Request dataset here:

  2. Download the skeleton-only datasets:

    • (NTU RGB+D 60)
    • (NTU RGB+D 120, on top of NTU RGB+D 60)
    • Total size should be 5.8GB + 4.5GB.
  3. Download missing skeletons lookup files from the authors' GitHub repo:

    • NTU RGB+D 60 Missing Skeletons: wget

    • NTU RGB+D 120 Missing Skeletons: wget

    • Remember to remove the first few lines of text in these files!
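
The header text in the missing-skeleton files can be removed by hand. As an alternative, the sketch below keeps only lines that look like NTU RGB+D sample names (the `SsssCcccPpppRrrrAaaa` pattern, e.g. `S001C002P005R002A008`) and drops everything else. It assumes the files contain one sample name per line after the header; the function names are illustrative and not part of this repo.

```python
import re

# NTU RGB+D sample names follow the pattern SsssCcccPpppRrrrAaaa,
# e.g. "S001C002P005R002A008".
SAMPLE_NAME = re.compile(r"^S\d{3}C\d{3}P\d{3}R\d{3}A\d{3}$")


def keep_sample_names(lines):
    """Return only the lines that are sample names, stripped of whitespace."""
    names = []
    for line in lines:
        candidate = line.strip()
        if SAMPLE_NAME.match(candidate):
            names.append(candidate)
    return names


def clean_file(path):
    """Rewrite a missing-skeletons file in place, dropping non-sample lines.

    Returns the number of sample names kept.
    """
    with open(path, "r", encoding="utf-8") as f:
        names = keep_sample_names(f)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(names) + "\n")
    return len(names)
```

Since `clean_file` overwrites its input, it may be worth keeping a copy of the downloaded files before running it.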

Kinetics Skeleton 400

  1. Download dataset from ST-GCN repo:
  2. This might be useful if you want to wget the dataset from Google Drive

Data Preprocessing

Directory Structure

Put downloaded data into the following directory structure:

- data/
  - kinetics_raw/
    - kinetics_train/
    - kinetics_val/
    - kinetics_train_label.json
    - kinetics_val_label.json
  - nturgbd_raw/
    - nturgb+d_skeletons/     # from ``
    - nturgb+d_skeletons120/  # from ``
    - NTU_RGBD_samples_with_missing_skeletons.txt
    - NTU_RGBD120_samples_with_missing_skeletons.txt
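
Before running the preprocessing scripts, it can help to check that the layout above is in place. The following is a minimal sketch, assuming it is run from the repo root and that the Kinetics validation label file is named `kinetics_val_label.json`; `missing_entries` is an illustrative helper, not part of this repo.

```python
from pathlib import Path

# Entries taken from the directory listing above, relative to the repo root.
EXPECTED_DIRS = [
    "data/kinetics_raw/kinetics_train",
    "data/kinetics_raw/kinetics_val",
    "data/nturgbd_raw/nturgb+d_skeletons",
    "data/nturgbd_raw/nturgb+d_skeletons120",
]
EXPECTED_FILES = [
    "data/kinetics_raw/kinetics_train_label.json",
    # Assumed spelling of the validation label file name.
    "data/kinetics_raw/kinetics_val_label.json",
    "data/nturgbd_raw/NTU_RGBD_samples_with_missing_skeletons.txt",
    "data/nturgbd_raw/NTU_RGBD120_samples_with_missing_skeletons.txt",
]


def missing_entries(root="."):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    for entry in missing_entries():
        print("missing: {}".format(entry))
```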

Generating Data

  1. NTU RGB+D

    • cd data_gen
    • python3
    • python3
    • Time estimate is ~ 3hrs to generate NTU 120 on a single core (feel free to parallelize the code :))
  2. Kinetics

    • python3
    • ~ 70 mins to generate Kinetics data
  3. Generate the bone data with:

    • python --dataset ntu
    • python --dataset ntu120
    • python --dataset kinetics
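
The commands above derive the "bone" stream from the joint data. A common convention in skeleton-based action recognition, assumed here rather than taken from this repo's script, is that each bone is the vector from a parent joint to its child joint. The sketch below illustrates that idea on a hypothetical 4-joint toy skeleton using plain Python lists; the pair table and function name are made up for illustration.

```python
# Hypothetical toy skeleton: (child, parent) index pairs, 0-indexed.
# Real datasets define their own pairs (e.g. 25 joints for NTU RGB+D).
TOY_PAIRS = [(0, 0), (1, 0), (2, 1), (3, 1)]


def joints_to_bones(joints, pairs):
    """Return one bone vector per joint: joints[child] - joints[parent].

    `joints` is a list of (x, y, z) tuples for a single frame. A joint
    paired with itself (the root) yields a zero vector, so the output
    has the same length as the input.
    """
    bones = [(0.0, 0.0, 0.0)] * len(joints)
    for child, parent in pairs:
        bones[child] = tuple(
            c - p for c, p in zip(joints[child], joints[parent])
        )
    return bones


if __name__ == "__main__":
    frame = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 2.0, 0.0), (-1.0, 2.0, 0.0)]
    print(joints_to_bones(frame, TOY_PAIRS))
```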

Pretrained Models

  • Download pretrained models for producing the final results on NTU RGB+D 60, NTU RGB+D 120, Kinetics Skeleton 400: [Dropbox][GoogleDrive][WeiYun]

  • Put the folder of pretrained models at repo root:

- MS-G3D/
  - pretrained-models/
  - ...
  • Run bash

Training & Testing

  • The general training template command:
  --config <config file>
  --work-dir <place to keep things (weights, checkpoints, logs)>
  --device <GPU IDs to use>
  --half   # Mixed precision training with NVIDIA Apex (default O1) for GPUs ~11GB memory
  [--base-lr <base learning rate>]
  [--batch-size <batch size>]
  [--weight-decay <weight decay>]
  [--forward-batch-size <batch size during forward pass, useful if using only 1 GPU>]
  [--eval-start <which epoch to start evaluating the model>]
  • The general testing template command:
  --config <config file>
  --work-dir <place to keep things>
  --device <GPU IDs to use>
  --weights <path to model weights>
  [--test-batch-size <...>]
  • Template for joint-bone two-stream fusion:
  --dataset <dataset to ensemble, e.g. ntu120/xsub>
  --joint-dir <work_dir of your test command for joint model>
  --bone-dir <work_dir of your test command for bone model>
  • Use the corresponding config files from ./config to train/test different datasets

  • Examples

    • Train on NTU 120 XSub Joint
      • Train with 1 GPU:
        • python3 --config ./config/nturgbd120-cross-subject/train_joint.yaml
      • Train with 2 GPUs:
        • python3 --config ./config/nturgbd120-cross-subject/train_joint.yaml --batch-size 32 --forward-batch-size 32 --device 0 1
    • Test on NTU 120 XSet Bone
      • python3 --config ./config/nturgbd120-cross-setup/test_bone.yaml
    • Batch size 32 on 1 GPU (BS 16 per forward pass by accumulating gradients):
      • python3 --config <...> --batch-size 32 --forward-batch-size 16 --device 0
  • Resume training from checkpoint

  ...  # Same params as before
  --start-epoch <0 indexed epoch>
  --weights <weights in work_dir>
  --checkpoint <checkpoint in work_dir>


  • It's recommended to linearly scale up base LR with > 2 GPUs (, Section 2.1) to use 16 samples per worker during training; e.g.

    • 1 GPU: --base-lr 0.05 --device 0 --batch-size 32 --forward-batch-size 16
    • 2 GPUs: --base-lr 0.05 --device 0 1 --batch-size 32 --forward-batch-size 32
    • 4 GPUs: --base-lr 0.1 --device 0 1 2 3 --batch-size 64 --forward-batch-size 64
  • Unfortunately, different PyTorch/CUDA versions & GPU setups can cause different levels of memory usage, and so you may experience out of memory (OOM) on some machines but not others

    • 1080Ti GPUs with --half and --amp-opt-level 1 (default) are relatively more stable
  • If OOM occurs, try using Apex O2 by setting --amp-opt-level 2. However, note that

  • Default hyperparameters are stored in the config files; you can tune them & add extra training techniques to boost performance

  • The best joint-bone fusion result may not come from the best single-stream models. For example, we provide 3 pretrained models for the NTU RGB+D 60 XSub joint stream, and the best fusion performance comes from the slightly underperforming model (~89.3%) rather than from the reported model (~89.4%) or the slightly better retrained model (~89.6%).
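
The per-GPU settings in the learning-rate examples above follow a simple pattern: 16 samples per GPU in each forward pass, a total batch size of at least 32, and a base LR that scales linearly with the batch size from 0.05 at batch size 32. The sketch below reproduces the 1, 2, and 4 GPU examples from that pattern. Extending it to other GPU counts is an assumption, and the helper names are illustrative rather than part of this repo.

```python
SAMPLES_PER_GPU = 16   # "16 samples per worker" from the examples above
REFERENCE_BATCH = 32   # batch size paired with the reference LR
REFERENCE_LR = 0.05    # base LR used for 1 and 2 GPUs in the examples


def plan_training(num_gpus):
    """Return batch sizes, base LR, and accumulation steps for num_gpus."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    forward_batch = SAMPLES_PER_GPU * num_gpus
    batch = max(REFERENCE_BATCH, forward_batch)
    return {
        "base_lr": REFERENCE_LR * batch / REFERENCE_BATCH,
        "batch_size": batch,
        "forward_batch_size": forward_batch,
        # Forward passes whose gradients are accumulated per optimizer step.
        "accumulation_steps": batch // forward_batch,
    }


def as_flags(num_gpus):
    """Format the plan as the command-line flags used in the examples."""
    plan = plan_training(num_gpus)
    devices = " ".join(str(i) for i in range(num_gpus))
    return "--base-lr {:g} --device {} --batch-size {} --forward-batch-size {}".format(
        plan["base_lr"], devices, plan["batch_size"], plan["forward_batch_size"]
    )


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(as_flags(n))
```

With 1 GPU the plan uses 2 accumulation steps (batch size 32 split into forward passes of 16), which matches the gradient-accumulation example given earlier.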


This repo is based on

Thanks to the original authors for their work!


Please cite this work if you find it useful:

  title={Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition},
  author={Liu, Ziyu and Zhang, Hongwen and Chen, Zhenghao and Wang, Zhiyong and Ouyang, Wanli},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},


Please email kenziyuliu AT for further questions
