
kkahatapitiya / X3D-Multigrid

License: MIT
PyTorch implementation of X3D models with Multigrid training.


Projects that are alternatives of or similar to X3D-Multigrid

pyamgx
GPU accelerated multigrid library for Python
Stars: ✭ 29 (-47.27%)
Mutual labels:  multigrid
Tutorials
Tutorials for creating figures, tables, or other content for AAS Journals.
Stars: ✭ 35 (-36.36%)
Mutual labels:  x3d
demo-models
Demo 3D models (mostly in X3D and VRML formats) of view3dscene and Castle Game Engine
Stars: ✭ 15 (-72.73%)
Mutual labels:  x3d
STP2X3D
Translator from STEP format to X3D format
Stars: ✭ 36 (-34.55%)
Mutual labels:  x3d
FKD
A Fast Knowledge Distillation Framework for Visual Recognition
Stars: ✭ 49 (-10.91%)
Mutual labels:  efficient-training
hydro examples
Simple one-dimensional examples of various hydrodynamics techniques
Stars: ✭ 83 (+50.91%)
Mutual labels:  multigrid
titania
Titania X3D Editor
Stars: ✭ 31 (-43.64%)
Mutual labels:  x3d
x ite
X_ITE X3D WebGL Browser
Stars: ✭ 40 (-27.27%)
Mutual labels:  x3d
exadg
ExaDG - High-Order Discontinuous Galerkin for the Exa-Scale
Stars: ✭ 62 (+12.73%)
Mutual labels:  multigrid
raptor
General, high performance algebraic multigrid solver
Stars: ✭ 50 (-9.09%)
Mutual labels:  multigrid
numpy-vs-mir
Multigrid benchmark between Dlang's Mir library and Python's numpy
Stars: ✭ 19 (-65.45%)
Mutual labels:  multigrid
HOT
Hierarchical Optimization Time Integration (HOT) for efficient implicit timestepping of the material point method (MPM)
Stars: ✭ 83 (+50.91%)
Mutual labels:  multigrid
L2-GCN
[CVPR 2020] L2-GCN: Layer-Wise and Learned Efficient Training of Graph Convolutional Networks
Stars: ✭ 26 (-52.73%)
Mutual labels:  efficient-training
Openjscad.org
JSCAD is an open source set of modular, browser and command line tools for creating parametric 2D and 3D designs with JavaScript code. It provides a quick, precise and reproducible method for generating 3D models, and is especially useful for 3D printing applications.
Stars: ✭ 1,851 (+3265.45%)
Mutual labels:  x3d

PyTorch Implementation of X3D with Multigrid Training

This repository contains a PyTorch implementation of "X3D: Expanding Architectures for Efficient Video Recognition" [CVPR2020] together with "A Multigrid Method for Efficiently Training Video Models" [CVPR2020]. In contrast to the original repository (here) by FAIR, this repository provides a simpler, less modular and more familiar implementation structure, for faster and easier adoption.

Introduction

X3D is an efficient video architecture, searched/optimized for learning video representations. Here, the author expands a tiny base network along multiple axes: space and time (of the input) and width and depth (of the network), optimizing for performance at a given complexity (params/FLOPs). It further relies on depthwise-separable 3D convolutions [1], inverted bottlenecks in residual blocks [2], squeeze-and-excitation blocks [3], swish (soft) activations [4] and sparse clip sampling (at inference) to improve its efficiency.
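As a rough illustration of the savings from depthwise-separable 3D convolutions, the sketch below compares parameter counts of a dense 3D convolution and its depthwise-separable factorization (a per-channel k×k×k convolution followed by a 1×1×1 pointwise convolution). The channel counts are hypothetical, not X3D's actual layer widths:

```python
# Parameter counts for a dense 3D conv vs. a depthwise-separable one.
# Bias terms are ignored for simplicity.

def dense_3d_conv_params(c_in, c_out, k):
    # Every output channel mixes all input channels over a k*k*k window.
    return c_in * c_out * k ** 3

def depthwise_separable_3d_params(c_in, c_out, k):
    # Depthwise: one k*k*k filter per input channel.
    # Pointwise: a 1x1x1 conv that mixes channels.
    return c_in * k ** 3 + c_in * c_out

dense = dense_3d_conv_params(48, 48, 3)               # 62208
separable = depthwise_separable_3d_params(48, 48, 3)  # 3600
print(f"dense={dense}, separable={separable}, ratio={dense / separable:.1f}x")
```

This factorization is where most of X3D's parameter/FLOP savings come from, and it is also why the PyTorch depthwise-conv performance fix discussed under Tips and Tricks matters so much for training speed.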

Multigrid training is a mechanism for training video architectures efficiently. Instead of using a fixed batch size, this method varies the batch size on a defined schedule, while keeping the computational budget approximately unchanged by holding batch size × time × height × width constant. This yields a coarse-to-fine training process: lower spatio-temporal resolutions are paired with higher batch sizes and vice versa. In contrast to conventional training with a fixed batch size, multigrid training benefits from 'seeing' more inputs during a training schedule at approximately the same computational budget.
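A minimal sketch of the coarse-to-fine idea, assuming hypothetical base values (the actual schedule and grid factors in this repo may differ): shrink the clip's spatio-temporal volume and grow the batch size by the same factor, so batch × time × height × width stays constant.

```python
# Illustrative multigrid scaling: trade input resolution for batch size
# while keeping the per-iteration compute budget roughly fixed.
BASE_BATCH, BASE_T, BASE_S = 16, 16, 224  # hypothetical base config

def scaled_config(t_factor, s_factor):
    """Return (batch, T, H, W) for temporal/spatial scaling factors <= 1."""
    T = int(BASE_T * t_factor)
    H = W = int(BASE_S * s_factor)
    # Grow the batch by exactly the factor the clip volume shrank.
    batch = BASE_BATCH * (BASE_T * BASE_S * BASE_S) // (T * H * W)
    return batch, T, H, W

# Coarse-to-fine long cycle: large batches at low resolution first.
for factors in [(0.25, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]:
    print(scaled_config(*factors))
```

Each printed configuration has the same product batch × T × H × W, so every step of the schedule costs roughly the same per iteration while early steps see many more clips.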

Our implementation achieves 62.62% Top-1 accuracy (3-view) on Kinetics-400 when trained from scratch for ~200k iterations (a 4x shorter schedule than the original, when adjusted with the linear scaling rule [5]), which takes only ~2.8 days on 4 Titan RTX GPUs. This is much faster than previous Kinetics-400 training schedules on a single machine. Longer schedules can achieve SOTA results. We port and include the weights trained by FAIR on a longer schedule with 128 GPUs, which achieve 71.48% Top-1 accuracy (3-view) on Kinetics-400 and can be used for fine-tuning on other datasets. For instance, we can train on Charades classification (35.01% mAP) and localization (17.71% mAP) within a few hours on 2 Titan RTX GPUs. All models and training logs are included in the repository.

Note: due to clip availability, the Kinetics-400 dataset we trained on contains ~220k training and ~17k validation clips, compared to ~240k and ~20k in the original dataset.

Tips and Tricks

  • 3D depthwise-separable convolutions are slow in current PyTorch releases, as identified by FAIR. Make sure to build from source with this fix. Only a few files are changed, so the fix can easily be applied manually to the version of the source you use. In our setting, it reduced the training time from ~4 days to ~2.8 days.

  • In my experience, dataloading and preprocessing speeds rank as follows: accimage ≈ Pillow-SIMD >> Pillow > OpenCV. I have not formally verified this, but see here for some benchmarks.

  • Use the linear scaling rule [5] to adjust the learning rate and training schedule when using a different base batch size.

  • For longer schedules, enable random spatial scaling, and use the original temporal stride (we use 2x stride in the shorter schedule).
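The linear scaling rule from the bullet above can be sketched as follows (the base values are placeholders, not this repo's defaults): scale the learning rate proportionally to the batch size, and stretch the iteration milestones inversely so the number of training epochs stays fixed.

```python
# Linear scaling rule sketch: LR scales with batch size; iteration
# counts scale inversely so the number of epochs is unchanged.

def scale_schedule(base_lr, base_batch, milestones, new_batch):
    factor = new_batch / base_batch
    lr = base_lr * factor
    new_milestones = [int(m / factor) for m in milestones]
    return lr, new_milestones

lr, steps = scale_schedule(0.1, 64, [100_000, 150_000], new_batch=32)
print(lr, steps)  # 0.05 [200000, 300000]
```

Halving the batch size halves the learning rate and doubles the iteration milestones, which is how the "4x shorter schedule" comparison above is made consistent across batch sizes.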

Dependencies

  • Python 3.7.6
  • PyTorch 1.7.0 (built from source, with this fix). This issue is fixed in PyTorch >= 1.9 releases.
  • torchvision 0.8.0 (built from source)
  • accimage 0.1.1
  • pkbar 0.5

Quick Start

Edit the dataset directories to match yours, adjust the learning rate and the schedule, and then:

  • Use python train_x3d_kinetics_multigrid.py -gpu 0,1,2,3 for training on Kinetics-400.
  • Use python train_x3d_charades.py -gpu 0,1 for training on Charades classification.
  • Use python train_x3d_charades_loc.py -gpu 0,1 for training on Charades localization.

The Charades dataset can be found here. Kinetics-400 data is now only partially available on YouTube; use the annotations here. I would recommend this repo for downloading Kinetics data. If you want access to our Kinetics-400 data (~220k training and ~17k validation clips), please drop me an email.

Reference

If you find this work useful, please consider citing the original authors:

@inproceedings{feichtenhofer2020x3d,
  title={X3D: Expanding Architectures for Efficient Video Recognition},
  author={Feichtenhofer, Christoph},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={203--213},
  year={2020}
}

@inproceedings{wu2020multigrid,
  title={A Multigrid Method for Efficiently Training Video Models},
  author={Wu, Chao-Yuan and Girshick, Ross and He, Kaiming and Feichtenhofer, Christoph and Kr{\"a}henb{\"u}hl, Philipp},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={153--162},
  year={2020}
}

Acknowledgements

I would like to thank the original authors for their work. Also, I thank AJ Piergiovanni for sharing his Multigrid implementation.
