Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Nimble is a deep learning execution engine that accelerates model inference and training by running GPU tasks (i.e., GPU kernels and memory operations) in parallel with minimal scheduling overhead. Given a PyTorch DL model, Nimble automatically generates a GPU task schedule, which employs an optimal parallelization strategy for the model. The schedule is wrapped in a Nimble object and can be seamlessly applied to PyTorch programs. Nimble improves the speed of inference and training by up to 22.34× and 3.61× compared to PyTorch, respectively. Moreover, Nimble outperforms TensorRT by up to 2.81×.

  • Speedup in Inference (ImageNet models): inference performance comparison on an NVIDIA V100 GPU.

  • Speedup in Training (CIFAR-10 models): training performance comparison on an NVIDIA V100 GPU at batch sizes 32, 64, and 128.

Version

This version of Nimble is built on top of PyTorch v1.7.1 with CUDA 11.0. If you want to see the old version of Nimble used for the experiments in the paper, please check out the main_pytorch_v1.4.1 branch.
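
For example, from a clone of this repository (a minimal sketch; the branch name is the one mentioned above, and the commands assume an ordinary git checkout):

git fetch origin
git checkout main_pytorch_v1.4.1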

Install Nimble

Please refer to the installation instructions to build and install Nimble from source.
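
Since Nimble is built on top of the PyTorch source tree, the build is expected to resemble a standard PyTorch source build. The sketch below is only an illustration under that assumption (the repository URL and command sequence are assumptions, not the authoritative steps; follow the linked instructions):

git clone --recursive https://github.com/snuspl/nimble.git
cd nimble
pip install -r requirements.txt
python setup.py install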

Use Nimble

Nimble supports both inference and training of neural networks.

Model Inference

import torch
import torchvision

# Instantiate a PyTorch Module and move it to a GPU
model = torchvision.models.resnet50()
model = model.cuda()
model.eval()

# Prepare a dummy input
input_shape = [1, 3, 224, 224]
dummy_input = torch.randn(*input_shape).cuda()

# Create a Nimble object
nimble_model = torch.cuda.Nimble(model)
nimble_model.prepare(dummy_input, training=False)

# Execute the object
rand_input = torch.rand(*input_shape).cuda()
output = nimble_model(rand_input)
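
To gauge the speedup on your own GPU, one simple approach is to time the prepared Nimble object against the original module using CUDA events (a sketch built on standard PyTorch timing utilities; time_forward is a hypothetical helper, and model, nimble_model, and rand_input come from the snippet above):

def time_forward(fn, inp, iters=100):
    with torch.no_grad():
        # Warm up so that one-time initialization does not skew the timing
        for _ in range(10):
            fn(inp)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            fn(inp)
        end.record()
        torch.cuda.synchronize()  # wait for all queued GPU work to finish
    return start.elapsed_time(end) / iters  # average milliseconds per forward pass

print("PyTorch:", time_forward(model, rand_input), "ms")
print("Nimble: ", time_forward(nimble_model, rand_input), "ms")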

Model Training

import torch
import torchvision

BATCH = 32

# Instantiate a PyTorch Module and move it to a GPU
model = torchvision.models.resnet50(num_classes=10)
model = model.cuda()
model.train()

# Define a loss function and an optimizer
loss_fn = torch.nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Prepare a dummy input
input_shape = [BATCH, 3, 32, 32]
dummy_input = torch.randn(*input_shape).cuda()

# Create a Nimble object
nimble_model = torch.cuda.Nimble(model)
nimble_model.prepare(dummy_input, training=True)

# Execute the forward pass
rand_input = torch.rand(*input_shape).cuda()
output = nimble_model(rand_input)

# Compute loss
label = torch.zeros(BATCH, dtype=torch.long).cuda()
loss = loss_fn(output, label)

# Execute the backward pass
loss.backward()

# Perform an optimization step
optimizer.step()
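
In a full training loop, the same Nimble object is reused every iteration and gradients are cleared between steps, just as in ordinary PyTorch. A sketch is shown below; train_loader is a hypothetical DataLoader whose batches match the shape the object was prepared with (BATCH x 3 x 32 x 32):

for inputs, labels in train_loader:
    inputs, labels = inputs.cuda(), labels.cuda()
    optimizer.zero_grad()          # clear gradients from the previous step
    output = nimble_model(inputs)  # forward pass through the prepared schedule
    loss = loss_fn(output, labels)
    loss.backward()                # backward pass
    optimizer.step()               # parameter update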

Reproduce Evaluation Results

Please refer to the evaluation instructions to reproduce the results reported in the paper.

Publication

Woosuk Kwon*, Gyeong-In Yu*, Eunji Jeong, and Byung-Gon Chun (* equal contribution), Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning, 34th Conference on Neural Information Processing Systems (NeurIPS), Spotlight, December 2020.

Citation

@inproceedings{kwon2020nimble,
  title={Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning},
  author={Kwon, Woosuk and Yu, Gyeong-In and Jeong, Eunji and Chun, Byung-Gon},
  booktitle={NeurIPS},
  year={2020}
}

Troubleshooting

Create an issue for questions and bug reports.

Contribution

We welcome your contributions to Nimble! We aim to build an open-source project driven by contributions from the community. For general discussions about development, please subscribe to [email protected].

License

BSD 3-clause license
