
PKU-DAIR / Hetu

License: Apache-2.0
A high-performance distributed deep learning system targeting large-scale and automated distributed training.

Programming Languages

Python, C++, CUDA, C, CMake, Shell

Projects that are alternatives to or similar to Hetu

Speedtorch
Library for faster pinned CPU <-> GPU transfer in Pytorch
Stars: ✭ 615 (+688.46%)
Mutual labels:  gpu, embeddings
Megengine
MegEngine is a fast, scalable, easy-to-use deep learning framework with automatic differentiation
Stars: ✭ 4,081 (+5132.05%)
Mutual labels:  gpu, autograd
Adanet
Fast and flexible AutoML with learning guarantees.
Stars: ✭ 3,340 (+4182.05%)
Mutual labels:  gpu, distributed-training
Pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Stars: ✭ 52,811 (+67606.41%)
Mutual labels:  gpu, autograd
Awesome Distributed Deep Learning
A curated list of awesome Distributed Deep Learning resources.
Stars: ✭ 277 (+255.13%)
Mutual labels:  distributed-systems, gpu
Recommender-Systems-with-Collaborative-Filtering-and-Deep-Learning-Techniques
Implemented user-based and item-based recommendation systems along with state-of-the-art deep learning techniques
Stars: ✭ 41 (-47.44%)
Mutual labels:  embeddings, state-of-the-art
Kubernetes Gpu Guide
This guide should help fellow researchers and hobbyists to easily automate and accelerate their deep learning training with their own Kubernetes GPU cluster.
Stars: ✭ 740 (+848.72%)
Mutual labels:  distributed-systems, gpu
XLearning-GPU
qihoo360 xlearning with GPU support; AI on Hadoop
Stars: ✭ 22 (-71.79%)
Mutual labels:  distributed-systems, gpu
BeatNet
This repository contains the implementation of the AI-based "BeatNet" joint beat, downbeat, tempo, and meter tracking system using CRNN and particle filtering; 2021's state-of-the-art online model (ISMIR 2021).
Stars: ✭ 56 (-28.21%)
Mutual labels:  state-of-the-art
go2vec
Read and use word2vec vectors in Go
Stars: ✭ 44 (-43.59%)
Mutual labels:  embeddings
hipacc
A domain-specific language and compiler for image processing
Stars: ✭ 72 (-7.69%)
Mutual labels:  gpu
opencv-cuda-docker
Dockerfiles for OpenCV compiled with CUDA, opencv_contrib modules and Python 3 bindings
Stars: ✭ 55 (-29.49%)
Mutual labels:  gpu
keras-word-char-embd
Concatenate word and character embeddings in Keras
Stars: ✭ 43 (-44.87%)
Mutual labels:  embeddings
watset-java
An implementation of the Watset clustering algorithm in Java.
Stars: ✭ 24 (-69.23%)
Mutual labels:  embeddings
ReactiveMachine
Author microservices without thinking about faults or servers. Then compile and deploy anywhere.
Stars: ✭ 43 (-44.87%)
Mutual labels:  distributed-systems
tiny-cuda-nn
Lightning fast & tiny C++/CUDA neural network framework
Stars: ✭ 908 (+1064.1%)
Mutual labels:  gpu
LAVT-pytorch
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Stars: ✭ 16 (-79.49%)
Mutual labels:  state-of-the-art
humainary-signals-services-java
Observability Signaling for Distributed Computation
Stars: ✭ 23 (-70.51%)
Mutual labels:  distributed-systems
Subway
Out-of-GPU-Memory Graph Processing with Minimal Data Transfer
Stars: ✭ 30 (-61.54%)
Mutual labels:  gpu
cade
Compass-aligned Distributional Embeddings. Align embeddings from different corpora
Stars: ✭ 29 (-62.82%)
Mutual labels:  embeddings

HETU

Documentation | Examples

Hetu is a high-performance distributed deep learning system targeting trillion-parameter DL model training, developed by the DAIR Lab at Peking University. It takes into account both high availability for industry and innovation for academia, and it has a number of advanced characteristics:

  • Applicability. DL model definition with a standard dataflow graph; many basic CPU and GPU operators; efficient implementations of plenty of DL models and at least 10 popular ML algorithms.

  • Efficiency. Achieves at least a 30% speedup compared to TensorFlow on DNN, CNN, and RNN benchmarks.

  • Flexibility. Supports various parallel training protocols and distributed communication architectures, such as data/model/pipeline parallelism and Parameter Server & AllReduce.

  • Scalability. Deploys on more than 100 computation nodes and trains giant models with trillions of parameters, e.g., on Criteo Kaggle and the Open Graph Benchmark.

  • Agility. Automated ML pipeline: feature engineering, model selection, hyperparameter search.

We welcome everyone interested in machine learning or graph computing to contribute code, create issues, or open pull requests. Please refer to the Contribution Guide for more details.

Installation

  1. Clone the repository.

  2. Prepare the environment. We use Anaconda to manage packages. The following command creates the conda environment to be used: conda env create -f environment.yml. Please prepare the CUDA toolkit and cuDNN in advance.

  3. We use CMake to compile Hetu. Please copy the example configuration for compilation by cp cmake/config.example.cmake cmake/config.cmake. Users can modify the configuration file to enable or disable the compilation of each module. For advanced users (who are not using the provided conda environment), the prerequisites for the different modules in Hetu are listed in the appendix.

# modify paths and configurations in cmake/config.cmake

# generate Makefile
mkdir build && cd build && cmake ..

# compile
# make all modules
make -j 8
# make hetu, version is specified in cmake/config.cmake
make hetu -j 8
# make allreduce module
make allreduce -j 8
# make ps module
make ps -j 8
# make geometric module
make geometric -j 8
# make hetu-cache module
make hetu_cache -j 8
  4. Prepare the environment for running. Edit the hetu.exp file and set the environment path for python and the path to the mpirun executable if necessary (for advanced users not using the provided conda environment). Then execute the command source hetu.exp.
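
Putting the steps together, a minimal end-to-end installation could look like the following sketch; the repository URL and the conda environment name are assumptions, so check the project page and environment.yml for the actual values.

# clone the repository (URL assumed from the project page)
git clone https://github.com/PKU-DAIR/Hetu.git
cd Hetu

# create and activate the conda environment (CUDA toolkit and cuDNN prepared in advance)
conda env create -f environment.yml
conda activate hetu            # assumed environment name, see environment.yml

# copy and adjust the build configuration, then compile the desired modules
cp cmake/config.example.cmake cmake/config.cmake
mkdir build && cd build && cmake ..
make hetu -j 8                 # add allreduce / ps / geometric / hetu_cache as needed
cd ..

# set the runtime paths (python, mpirun) and load them
source hetu.exp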

Usage

Train logistic regression on a GPU:

bash examples/cnn/scripts/hetu_1gpu.sh logreg MNIST

Train a 3-layer MLP on a GPU:

bash examples/cnn/scripts/hetu_1gpu.sh mlp CIFAR10

Train a 3-layer CNN on a GPU:

bash examples/cnn/scripts/hetu_1gpu.sh cnn_3_layers MNIST

Train a 3-layer MLP with AllReduce on 8 GPUs (using mpirun):

bash examples/cnn/scripts/hetu_8gpu.sh mlp CIFAR10

Train a 3-layer MLP with PS on 1 server and 2 workers:

# in this script we launch the scheduler, the server, and two workers
bash examples/cnn/scripts/hetu_2gpu_ps.sh mlp CIFAR10
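
The launchers above share a common calling convention. The sketch below generalizes the invocations shown here; treating the two arguments as positional model and dataset names is an assumption based on those examples.

# single-GPU launcher: model name and dataset are positional arguments
bash examples/cnn/scripts/hetu_1gpu.sh <model> <dataset>

# the AllReduce and PS launchers follow the same pattern
bash examples/cnn/scripts/hetu_8gpu.sh <model> <dataset>
bash examples/cnn/scripts/hetu_2gpu_ps.sh <model> <dataset>

# see which launch scripts are available
ls examples/cnn/scripts/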

More Examples

Please refer to the examples directory, which contains CNN, NLP, CTR, and GNN training scripts. For distributed training, please refer to the CTR and GNN tasks.

Community

License

The entire codebase is under the Apache-2.0 license.

Papers

  1. Xupeng Miao, Lingxiao Ma, Zhi Yang, Yingxia Shao, Bin Cui, Lele Yu, Jiawei Jiang. CuWide: Towards Efficient Flow-based Training for Sparse Wide Models on GPUs. TKDE 2021, ICDE 2021
  2. Xupeng Miao, Xiaonan Nie, Yingxia Shao, Zhi Yang, Jiawei Jiang, Lingxiao Ma, Bin Cui. Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce. SIGMOD 2021
  3. Xupeng Miao, Hailin Zhang, Yining Shi, Xiaonan Nie, Zhi Yang, Yangyu Tao, Bin Cui. HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework. VLDB 2022, ChinaSys 2021 Winter.
  4. coming soon

Cite

If you use Hetu in a scientific publication, we would appreciate citations to the following paper:

 @article{vldb/het22,
   title   = {HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework},
   author  = {Xupeng Miao and
              Hailin Zhang and
              Yining Shi and
              Xiaonan Nie and
              Zhi Yang and
              Yangyu Tao and
              Bin Cui},
   journal = {Proc. {VLDB} Endow.},
   year    = {2022},
   url     = {https://doi.org/10.14778/3489496.3489511},
   doi     = {10.14778/3489496.3489511},
 }

Acknowledgements

We learned and borrowed insights from a few open source projects including TinyFlow, autodist, tf.distribute and Angel.

Appendix

The prerequisites for the different modules in Hetu are listed as follows:

"*" means you should prepare by yourself, while others support auto-download

Hetu: OpenMP(*), CMake(*)
Hetu (version mkl): MKL 1.6.1
Hetu (version gpu): CUDA 10.1(*), CUDNN 7.5(*)
Hetu (version all): both

Hetu-AllReduce: MPI 3.1, NCCL 2.8(*); this module needs the GPU version

Hetu-PS: Protobuf(*), ZeroMQ 4.3.2

Hetu-Geometric: Pybind11(*), Metis(*)

Hetu-Cache: Pybind11(*); this module needs the PS module

##################################################################
Tips for preparing the prerequisites

Preparing CUDA, cuDNN, NCCL (NCCL is already in the conda environment):
1. download from https://developer.nvidia.com
2. install
3. modify paths in cmake/config.cmake if necessary

Preparing OpenMP:
You just need to ensure that your compiler supports OpenMP.

Preparing CMake, Protobuf, Pybind11, Metis:
Install with Anaconda:
conda install cmake=3.18 libprotobuf pybind11=2.6.0 metis

Preparing OpenMPI (optional):
Install with Anaconda: `conda install -c conda-forge openmpi=4.0.3`
or
1. download from https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.3.tar.gz
2. build OpenMPI with `./configure --prefix=/path/to/build && make -j8 && make install`
3. modify MPI_HOME to /path/to/build in cmake/config.cmake

Preparing MKL (optional):
Install with Anaconda: `conda install -c conda-forge onednn`
or
1. download from https://github.com/intel/mkl-dnn/archive/v1.6.1.tar.gz
2. build mkl by `mkdir /path/to/build && cd /path/to/build && cmake /path/to/root && make -j8` 
3. modify MKL_ROOT to /path/to/root and MKL_BUILD to /path/to/build in cmake/config.cmake 

Preparing ZeroMQ (optional):
Install with Anaconda: `conda install -c anaconda zeromq=4.3.2`
or
1. download from https://github.com/zeromq/libzmq/releases/download/v4.3.2/zeromq-4.3.2.zip
2. build ZeroMQ with `mkdir /path/to/build && cd /path/to/build && cmake /path/to/root && make -j8`
3. modify ZMQ_ROOT to /path/to/build in cmake/config.cmake
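
After preparing the prerequisites and running source hetu.exp, a quick sanity check could look like the sketch below; the assumption that the compiled package is importable as hetu is ours, not stated above.

# verify that the toolchain referenced above is visible
cmake --version
nvcc --version         # CUDA toolkit (GPU version)
mpirun --version       # MPI (AllReduce module)

# verify the compiled python package can be imported (module name assumed)
python -c "import hetu; print(hetu.__file__)"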