
Top 18 distributed-training open source projects

Paddle
PArallel Distributed Deep LEarning: a machine learning framework from industrial practice (the 『飞桨』/PaddlePaddle core framework: high-performance single-machine and distributed training for deep learning & machine learning, plus cross-platform deployment). A minimal distributed-training sketch follows.
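
As orientation (not taken from this listing), here is a hedged sketch of Paddle's collective data-parallel training API via paddle.distributed.fleet. The model, data, and sizes are illustrative placeholders, and the script is assumed to be started with `python -m paddle.distributed.launch train.py`.

    # Hedged sketch of Paddle collective (multi-GPU data-parallel) training.
    # Placeholder model/data; launch with `python -m paddle.distributed.launch`.
    import paddle
    from paddle.distributed import fleet

    fleet.init(is_collective=True)               # collective (NCCL-style) mode

    model = paddle.nn.Linear(784, 10)            # placeholder model
    optimizer = paddle.optimizer.Adam(parameters=model.parameters())

    # Wrap model and optimizer so gradients are synchronized across workers.
    optimizer = fleet.distributed_optimizer(optimizer)
    model = fleet.distributed_model(model)

    x = paddle.randn([32, 784])                  # placeholder batch
    y = paddle.randint(0, 10, [32])
    loss = paddle.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    optimizer.clear_grad()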
BytePS
A high-performance, generic framework for distributed DNN training (see the usage sketch below).
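
The sketch below follows the Horovod-style PyTorch front end shown in the BytePS README. It assumes the byteps package, one GPU per worker, and that each process is started by bpslaunch with the usual DMLC_* scheduler/server environment variables.

    # Minimal BytePS PyTorch sketch (Horovod-style API, per the README).
    # Assumes GPUs and a bpslaunch-managed multi-process setup.
    import torch
    import byteps.torch as bps

    bps.init()                                   # set up BytePS communication
    torch.cuda.set_device(bps.local_rank())      # pin one GPU per worker

    model = torch.nn.Linear(784, 10).cuda()      # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * bps.size())

    # Wrap the optimizer so gradients are aggregated via the BytePS servers.
    optimizer = bps.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters()
    )

    # Start all workers from identical parameters and optimizer state.
    bps.broadcast_parameters(model.state_dict(), root_rank=0)
    bps.broadcast_optimizer_state(optimizer, root_rank=0)

    x = torch.randn(32, 784).cuda()              # placeholder batch
    y = torch.randint(0, 10, (32,)).cuda()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()                             # gradients averaged across workers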
pytorch-model-parallel
A memory-balanced and communication-efficient model-parallel implementation, in PyTorch, of a fully connected layer combined with CrossEntropyLoss (the core idea is sketched below).
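
The core idea can be illustrated with a single-device sketch: the classifier's output classes are partitioned into shards, each shard computes its slice of the logits, and the slices are concatenated before the loss. This is a conceptual sketch, not the repo's API; in the actual technique the shards live on different GPUs and the softmax statistics are exchanged via collective communication.

    # Conceptual sketch of class-dimension model parallelism (single device).
    import torch
    import torch.nn as nn

    num_features, num_classes, num_shards = 128, 1000, 2
    shard_size = num_classes // num_shards

    # One Linear per shard, each producing a slice of the class logits.
    shards = nn.ModuleList(
        nn.Linear(num_features, shard_size) for _ in range(num_shards)
    )

    x = torch.randn(32, num_features)            # a batch of embeddings
    labels = torch.randint(0, num_classes, (32,))

    # Each shard computes its partial logits; concatenation recovers the
    # full [batch, num_classes] logit matrix before CrossEntropyLoss.
    logits = torch.cat([shard(x) for shard in shards], dim=1)
    loss = nn.CrossEntropyLoss()(logits, labels)
    loss.backward()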
dynamic-training-with-apache-mxnet-on-aws
Dynamic Training with Apache MXNet reduces the cost and time of training deep neural networks by leveraging AWS cloud elasticity and scale: the training cluster is resized dynamically during training, with minimal impact on model accuracy.
HandyRL
HandyRL is a handy and simple Python/PyTorch framework for distributed reinforcement learning that can be applied to your own environments.
torchx
TorchX is a universal job launcher for PyTorch applications. It is designed for fast iteration during training and research, with support for end-to-end production ML pipelines when you're ready.
PLSC
Paddle Large Scale Classification Tools: supports ArcFace, CosFace, PartialFC, and Data Parallel + Model Parallel training. Models include ResNet, ViT, DeiT, and FaceViT.
sagemaker-xgboost-container
A Docker container based on the open source XGBoost framework (https://xgboost.readthedocs.io/en/latest/) that lets customers run their own XGBoost scripts in SageMaker (a launch sketch follows).
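
A hedged sketch of how such a script-mode container is typically driven from the SageMaker Python SDK. The entry point, role ARN, S3 URI, hyperparameters, and framework_version below are placeholder assumptions to adapt, not values taken from this repo.

    # Hedged sketch: launching a custom XGBoost script on SageMaker.
    from sagemaker.xgboost import XGBoost

    estimator = XGBoost(
        entry_point="train.py",            # your own XGBoost script (placeholder)
        framework_version="1.7-1",         # must match a shipped container version
        instance_type="ml.m5.xlarge",
        instance_count=1,
        role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
        hyperparameters={"num_round": 100, "max_depth": 5},   # placeholder values
    )

    # Channels map to directories inside the container
    # (e.g. /opt/ml/input/data/train).
    estimator.fit({"train": "s3://my-bucket/train/"})  # placeholder S3 URI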
basecls
A codebase & model zoo of pretrained backbones based on MegEngine.
Fengshenbang-LM
Fengshenbang-LM (封神榜大模型) is an open-source system of large models led by the Cognitive Computing and Natural Language Research Center of IDEA (International Digital Economy Academy), serving as infrastructure for Chinese AIGC and cognitive intelligence.