
UofT-EcoSystem / hfta

License: MIT License
Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion

Programming Languages

python
139,335 projects - #7 most used programming language
shell
77,523 projects
Dockerfile
14,818 projects

Projects that are alternatives to or similar to hfta

Adanet
Fast and flexible AutoML with learning guarantees.
Stars: ✭ 3,340 (+18455.56%)
Mutual labels:  gpu, tpu
warp
Continuous energy Monte Carlo neutron transport in general geometries on GPUs
Stars: ✭ 27 (+50%)
Mutual labels:  gpu
peakperf
Achieve peak performance on x86 CPUs and NVIDIA GPUs
Stars: ✭ 33 (+83.33%)
Mutual labels:  gpu
cuda memtest
Fork of CUDA GPU memtest 👓
Stars: ✭ 68 (+277.78%)
Mutual labels:  gpu
lambdacube-quake3
Quake 3 map viewer in Haskell using LambdaCube 3D
Stars: ✭ 66 (+266.67%)
Mutual labels:  gpu
Fat-Clouds
GPU Fluid Simulation with Volumetric Rendering
Stars: ✭ 81 (+350%)
Mutual labels:  gpu
BifurcationKit.jl
A Julia package to perform Bifurcation Analysis
Stars: ✭ 185 (+927.78%)
Mutual labels:  gpu
SilentXMRMiner
A Silent (Hidden) Monero (XMR) Miner Builder
Stars: ✭ 417 (+2216.67%)
Mutual labels:  gpu
DistributedDeepLearning
Distributed Deep Learning using AzureML
Stars: ✭ 36 (+100%)
Mutual labels:  gpu
FLAMEGPU2
FLAME GPU 2 is a GPU accelerated agent based modelling framework for C++ and Python
Stars: ✭ 25 (+38.89%)
Mutual labels:  gpu
HybridBackend
Efficient training of deep recommenders on the cloud.
Stars: ✭ 30 (+66.67%)
Mutual labels:  gpu
briefmatch
BriefMatch real-time GPU optical flow
Stars: ✭ 36 (+100%)
Mutual labels:  gpu
XLearning-GPU
qihoo360 xlearning with GPU support; AI on Hadoop
Stars: ✭ 22 (+22.22%)
Mutual labels:  gpu
gpubootcamp
This repository consists of GPU bootcamp material for HPC and AI
Stars: ✭ 227 (+1161.11%)
Mutual labels:  gpu
docker-nvidia-glx-desktop
MATE Desktop container designed for Kubernetes supporting OpenGL GLX and Vulkan for NVIDIA GPUs with WebRTC and HTML5, providing an open source remote cloud graphics or game streaming platform. Spawns its own fully isolated X Server instead of using the host X server, therefore not requiring /tmp/.X11-unix host sockets or host configuration.
Stars: ✭ 47 (+161.11%)
Mutual labels:  gpu
ELM-pytorch
Extreme Learning Machine implemented in PyTorch
Stars: ✭ 68 (+277.78%)
Mutual labels:  gpu
monolish
monolish: MONOlithic LInear equation Solvers for Highly-parallel architecture
Stars: ✭ 166 (+822.22%)
Mutual labels:  gpu
GPUCompiler.jl
Reusable compiler infrastructure for Julia GPU backends.
Stars: ✭ 67 (+272.22%)
Mutual labels:  gpu
MatX
An efficient C++17 GPU numerical computing library with Python-like syntax
Stars: ✭ 418 (+2222.22%)
Mutual labels:  gpu
pyamgx
GPU accelerated multigrid library for Python
Stars: ✭ 29 (+61.11%)
Mutual labels:  gpu

Horizontally Fused Training Array

Logo


Horizontally Fused Training Array (HFTA) is a PyTorch extension library that helps machine learning and deep learning researchers and practitioners develop horizontally fused models. Each fused model is functionally and mathematically equivalent to an array of models with the same (or similar) operators.

Why develop horizontally fused models at all, you ask? Because training a certain class of models can sometimes under-utilize the underlying accelerators, and that hardware under-utilization is greatly amplified when you train this class of models repetitively (e.g., when you tune its hyper-parameters). Fortunately, in such use cases, the models under repetitive training often share operators of the same types and shapes (e.g., think about what happens to the operators when you adjust the learning rate). Therefore, with HFTA, you can improve hardware utilization by training an array of models (as a single fused model) on the same accelerator at the same time.
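
To make the idea concrete, below is a minimal PyTorch sketch of horizontal fusion. It is only an illustration of the concept under simplified assumptions (plain linear models, no training loop), not HFTA's actual API: the weights of B independent linear models are stacked so that a single batched matrix multiply computes all B outputs at once.

    import torch

    B = 3                       # number of models fused horizontally (e.g., 3 learning-rate candidates)
    batch, in_dim, out_dim = 64, 32, 16

    # Unfused: B separate linear models, evaluated one after another.
    weights = [torch.randn(out_dim, in_dim) for _ in range(B)]
    x = torch.randn(batch, in_dim)
    outs_unfused = [x @ w.t() for w in weights]

    # Fused: stack the B weight matrices and compute all B outputs with a single
    # (batched) matmul, which can keep the accelerator busier than B small matmuls.
    fused_w = torch.stack(weights)              # [B, out_dim, in_dim]
    outs_fused = x @ fused_w.transpose(1, 2)    # broadcasts to [B, batch, out_dim]

    for b in range(B):
        assert torch.allclose(outs_unfused[b], outs_fused[b], atol=1e-4)

HFTA applies this same principle to real training workloads so that an array of models trains as a single fused model on one accelerator.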

HFTA is device-agnostic. So far, we have tested HFTA and observed significant training performance and hardware utilization improvements on NVIDIA V100, RTX6000, and A100 GPUs, as well as Google Cloud TPU v3.

Installation

From Source

# NVIDIA GPUs:
$ pip install git+https://github.com/UofT-EcoSystem/hfta

# Google Cloud TPU v3:
$ pip install git+https://github.com/UofT-EcoSystem/hfta#egg=hfta[xla]

From PyPI

TODO

Testing the Installation

  1. Clone HFTA's repo.

    # Clone the repo (mainly for fetching the benchmark code)
    $ git clone https://github.com/UofT-EcoSystem/hfta
  2. Run the MobileNet-V2 example without HFTA.

    # NVIDIA GPUs:
    $ python hfta/examples/mobilenet/main.py --version v2 --epochs 5 --amp --eval --dataset cifar10 --device cuda --lr 0.01
    
    # Google Cloud TPU v3:
    $ python hfta/examples/mobilenet/main.py --version v2 --epochs 5 --amp --eval --dataset cifar10 --device xla --lr 0.01
    
    # The following output is captured on V100:
    Enable cuDNN heuristics!
    Files already downloaded and verified
    Files already downloaded and verified
    Epoch 0 took 7.802547454833984 s!
    Epoch 1 took 5.990707635879517 s!
    Epoch 2 took 6.000213623046875 s!
    Epoch 3 took 6.0167365074157715 s!
    Epoch 4 took 6.071732521057129 s!
    Running validation loop ...
  3. Run the same MobileNet-V2 example with HFTA, testing three learning rates on the same accelerator simultaneously.

    # NVIDIA GPUs:
    $ python hfta/examples/mobilenet/main.py --version v2 --epochs 5 --amp --eval --dataset cifar10 --device cuda --lr 0.01 0.03 0.1 --hfta
    
    # Google Cloud TPU v3:
    $ python hfta/examples/mobilenet/main.py --version v2 --epochs 5 --amp --eval --dataset cifar10 --device xla --lr 0.01 0.03 0.1 --hfta
    
    # The following output is captured on V100:
    Enable cuDNN heuristics!
    Files already downloaded and verified
    Files already downloaded and verified
    Epoch 0 took 13.595093727111816 s!
    Epoch 1 took 7.609431743621826 s!
    Epoch 2 took 7.635211229324341 s!
    Epoch 3 took 7.6383607387542725 s!
    Epoch 4 took 7.7035486698150635 s!

In the above example, the end-to-end training time with HFTA (which trains three models at once) should, ideally, be much less than three times the end-to-end training time without HFTA (which trains a single model).
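
As a rough, back-of-the-envelope reading of the sample V100 outputs above (an illustration based on those captured numbers, not a benchmark claim), training the three learning rates separately would cost roughly 3 × 6.0 s per steady-state epoch, whereas the fused run covers all three in about 7.7 s per epoch:

    # Steady-state epoch times taken from the sample V100 outputs above.
    unfused_epoch_time = 6.0   # seconds per epoch, one model, without HFTA
    fused_epoch_time = 7.7     # seconds per epoch, three fused models, with HFTA
    num_models = 3

    speedup = num_models * unfused_epoch_time / fused_epoch_time
    print(f"Approximate training throughput improvement: {speedup:.1f}x")  # ~2.3x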

Getting Started

Check out the Colab tutorial:

Open In Colab

Publication

Citation

If you use HFTA in your work, please cite our MLSys'21 publication using the following BibTeX:

@inproceedings{MLSYS2021_HFTA,
 author = {Wang, Shang and Yang, Peiming and Zheng, Yuxuan and Li, Xin and Pekhimenko, Gennady},
 booktitle = {Proceedings of Machine Learning and Systems},
 editor = {A. Smola and A. Dimakis and I. Stoica},
 pages = {599--623},
 title = {Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models},
 url = {https://proceedings.mlsys.org/paper/2021/file/a97da629b098b75c294dffdc3e463904-Paper.pdf},
 volume = {3},
 year = {2021}
}

Contributing

We sincerely appreciate contributions! We are currently working on the contributor guidelines. For now, just send us a PR for review!

License

HFTA itself is released under the MIT License. The examples and benchmarks build on other open-source projects, and we include their licenses in their corresponding directories.

Authors

HFTA is developed and maintained by Shang Wang (@wangshangsam), Peiming Yang (@ypm1999), Yuxuan (Eric) Zheng (@eric-zheng), and Xin Li (@nixli).

HFTA is one of the research projects from the EcoSystem group in the Department of Computer Science at the University of Toronto.
