awslabs / aws-virtual-gpu-device-plugin

License: Apache-2.0
The AWS virtual GPU device plugin provides the capability to use smaller virtual GPUs for your machine learning inference workloads.

Programming Languages

Jupyter Notebook
Go
Dockerfile

Projects that are alternatives of or similar to aws-virtual-gpu-device-plugin

ffmpegtoolkit
CentOS 8.x 64bit ffmpeg auto installer scripts
Stars: ✭ 62 (-46.09%)
Mutual labels:  nvidia
qe-gpu
GPU-accelerated Quantum ESPRESSO using CUDA FORTRAN
Stars: ✭ 50 (-56.52%)
Mutual labels:  nvidia
darknet
Darknet on OpenCL Convolutional Neural Networks on OpenCL on Intel & NVidia & AMD & Mali GPUs for macOS & GNU/Linux
Stars: ✭ 160 (+39.13%)
Mutual labels:  nvidia
cloudgamestream
A Powershell one-click solution to enable NVIDIA GeForce Experience GameStream on a cloud machine with a GRID supporting GPU.
Stars: ✭ 99 (-13.91%)
Mutual labels:  nvidia
clara-dicom-adapter
DICOM Adapter is a component of the Clara Deploy SDK which facilitates integration with DICOM compliant systems, enables ingestion of imaging data, helps triggering of jobs with configurable rules and offers pushing the output of jobs to PACS systems.
Stars: ✭ 31 (-73.04%)
Mutual labels:  nvidia
noncerpro-nimiq-cuda
Nimiq CUDA miner
Stars: ✭ 23 (-80%)
Mutual labels:  nvidia
tensorflow-builds
Tensorflow binaries and Docker images compiled with GPU support and CPU optimizations.
Stars: ✭ 15 (-86.96%)
Mutual labels:  nvidia
vdpau-va-driver-vp9
Experimental VP9 codec support for vdpau-va-driver (NVIDIA VDPAU-VAAPI wrapper) and chromium-vaapi
Stars: ✭ 68 (-40.87%)
Mutual labels:  nvidia
ros jetson stats
🐢 The ROS jetson-stats wrapper. The status of your NVIDIA jetson in diagnostic messages
Stars: ✭ 55 (-52.17%)
Mutual labels:  nvidia
GapFlyt
GapFlyt: Active Vision Based Minimalist Structure-less Gap Detection For Quadrotor Flight
Stars: ✭ 30 (-73.91%)
Mutual labels:  nvidia
SOLIDWORKS-for-Linux
This is a project, where I give you a way to use SOLIDWORKS on Linux!
Stars: ✭ 122 (+6.09%)
Mutual labels:  nvidia
gblastn
G-BLASTN is a GPU-accelerated nucleotide alignment tool based on the widely used NCBI-BLAST.
Stars: ✭ 52 (-54.78%)
Mutual labels:  nvidia
Nvidia-Intel
Setup Nvidia & Intel services
Stars: ✭ 21 (-81.74%)
Mutual labels:  nvidia
dxvk-nvapi
Alternative NVAPI implementation on top of DXVK.
Stars: ✭ 133 (+15.65%)
Mutual labels:  nvidia
play with tensorrt
Sample projects for TensorRT in C++
Stars: ✭ 39 (-66.09%)
Mutual labels:  nvidia
monkey-master
A deno tool for buying hot GPUs in JD, such as RTX3080 rx6800, a thick-skinned orange!
Stars: ✭ 180 (+56.52%)
Mutual labels:  nvidia
Autodesk-Fusion-360-for-Linux
This is a project, where I give you a way to use Autodesk Fusion 360 on Linux!
Stars: ✭ 810 (+604.35%)
Mutual labels:  nvidia
linux nvidia jetson
Allied Vision CSI-2 camera driver for NVIDIA Jetson Systems. Currently supporting Nano, TX2, AGX Xavier, and Xavier NX. Support for TX2 NX coming soon.
Stars: ✭ 68 (-40.87%)
Mutual labels:  nvidia
gpufetch
Simple yet fancy GPU architecture fetching tool
Stars: ✭ 66 (-42.61%)
Mutual labels:  nvidia
BatchAIHorovodBenchmark
Benchmarking Horovod and TF on Batch AI
Stars: ✭ 25 (-78.26%)
Mutual labels:  nvidia

Virtual GPU device plugin for Kubernetes

The virtual GPU device plugin for Kubernetes is a DaemonSet that allows you to automatically:

  • Expose an arbitrary number of virtual GPUs on the GPU nodes of your cluster.
  • Run ML serving containers backed by accelerators with low latency and low cost in your Kubernetes cluster.

This repository contains the AWS virtual GPU implementation of the Kubernetes device plugin.

Prerequisites

The list of prerequisites for running the virtual device plugin is described below:

  • NVIDIA drivers ~= 361.93
  • nvidia-docker version > 2.0 (see how to install it and its prerequisites)
  • Docker configured with nvidia as the default runtime (a quick sanity check is shown after this list)
  • Kubernetes version >= 1.10
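
A quick sanity check for the driver and Docker runtime configuration (the CUDA base image tag below is only an example): with nvidia set as the default runtime, nvidia-smi should work inside a plain container without any extra flags.

$ nvidia-smi
$ docker run --rm nvidia/cuda:10.0-base nvidia-smi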

Limitations

  • This solution is built on top of Volta Multi-Process Service (MPS). You can only use it on instance types with a Tesla V100 GPU or newer (currently only Amazon EC2 P3 and G4 instances).
  • The virtual GPU device plugin by default sets the GPU compute mode to EXCLUSIVE_PROCESS, which means the GPU is assigned to the MPS process; individual process threads submit work to the GPU concurrently via the MPS server. The GPU cannot be used for any other purpose (you can check the compute mode as shown after this list).
  • The virtual GPU device plugin only works on single-physical-GPU instances like p3.2xlarge if you request more than 1 k8s.amazonaws.com/vgpu in your workloads.
  • The virtual GPU device plugin cannot run together with the NVIDIA device plugin. You can label nodes and use a node selector to install the virtual GPU device plugin only on the nodes you want.
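
To verify the compute mode on a GPU node (a quick check using nvidia-smi; the output below is illustrative), query it directly. With the plugin running it should report Exclusive_Process:

$ nvidia-smi --query-gpu=compute_mode --format=csv
compute_mode
Exclusive_Process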

High Level Design

[Figure: high level design of the virtual GPU device plugin]

Quick Start

Label GPU node groups

kubectl label node <your_k8s_node_name> k8s.amazonaws.com/accelerator=vgpu
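
You can confirm the label was applied by listing the nodes that carry it:

kubectl get nodes -l k8s.amazonaws.com/accelerator=vgpu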

Enabling virtual GPU Support in Kubernetes

Update the node selector label in the manifest file to match the labels of your GPU node group, then apply it to Kubernetes.

$ kubectl create -f https://raw.githubusercontent.com/awslabs/aws-virtual-gpu-device-plugin/v0.1.1/manifests/device-plugin.yml
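
Once the DaemonSet is running, each labeled GPU node should advertise the k8s.amazonaws.com/vgpu resource in its capacity. The count shown below is only illustrative; it depends on how many virtual GPUs the plugin is configured to expose per physical GPU:

$ kubectl describe node <your_k8s_node_name> | grep k8s.amazonaws.com/vgpu
  k8s.amazonaws.com/vgpu:  10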

Running GPU Jobs

Virtual NVIDIA GPUs can now be consumed via container-level resource requirements using the resource name k8s.amazonaws.com/vgpu:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: resnet-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: resnet-server
  template:
    metadata:
      labels:
        app: resnet-server
    spec:
      # hostIPC is required for MPS communication
      hostIPC: true
      containers:
      - name: resnet-container
        image: seedjeffwan/tensorflow-serving-gpu:resnet
        args:
        # Make sure you set the limit based on the vGPU count so the tf-serving process does not occupy all the GPU memory
        - --per_process_gpu_memory_fraction=0.2
        env:
        - name: MODEL_NAME
          value: resnet
        ports:
        - containerPort: 8501
        # Use virtual gpu resource here
        resources:
          limits:
            k8s.amazonaws.com/vgpu: 1
        volumeMounts:
        - name: nvidia-mps
          mountPath: /tmp/nvidia-mps
      volumes:
      - name: nvidia-mps
        hostPath:
          path: /tmp/nvidia-mps
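
As a rule of thumb, per_process_gpu_memory_fraction ≈ requested vGPUs / vGPUs exposed per physical GPU. For example, with the plugin configured to expose 10 virtual GPUs per physical GPU (that count is set in the device-plugin manifest and is used here only as an illustration), a container requesting 2 vGPUs would set the fraction to 2 / 10 = 0.2.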

WARNING: if you don't request GPUs when using the device plugin, all the GPUs on the machine will be exposed inside your container.

Check the full example here
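
A minimal way to smoke-test the example deployment above (the model name and port come from the manifest; the request path is the standard TensorFlow Serving REST endpoint):

$ kubectl port-forward deployment/resnet-deployment 8501:8501
$ curl http://localhost:8501/v1/models/resnet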

Development

Please check Development for more details.

Credits

The project idea comes from @RenaudWasTaken's comment in kubernetes/kubernetes#52757 and Alibaba's solution from @cheyang, GPU Sharing Scheduler Extender Now Supports Fine-Grained Kubernetes Clusters.

Reference

AWS:

Community:

License

This project is licensed under the Apache-2.0 License.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].