
RadeonOpenCompute / k8s-device-plugin

Licence: Apache-2.0 license
Kubernetes (k8s) device plugin to enable registration of AMD GPUs in a container cluster

Programming Languages

go
Dockerfile
Smarty

Projects that are alternatives of or similar to k8s-device-plugin

Nnvm
No description or website provided.
Stars: ✭ 1,639 (+905.52%)
Mutual labels:  rocm
Tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Stars: ✭ 7,494 (+4497.55%)
Mutual labels:  rocm
Numba
NumPy aware dynamic Python compiler using LLVM
Stars: ✭ 7,090 (+4249.69%)
Mutual labels:  rocm
Cupy
NumPy & SciPy for GPU
Stars: ✭ 5,625 (+3350.92%)
Mutual labels:  rocm
ROCArrays.jl
Parallel on the ROCks
Stars: ✭ 17 (-89.57%)
Mutual labels:  rocm
Hetero-Mark
A Benchmark Suite for Heterogeneous System Computation
Stars: ✭ 41 (-74.85%)
Mutual labels:  rocm
rocPRIM
ROCm Parallel Primitives
Stars: ✭ 95 (-41.72%)
Mutual labels:  rocm
rocRAND
RAND library for HIP programming language
Stars: ✭ 68 (-58.28%)
Mutual labels:  rocm
RET
ROCm Machine Learning and HPC Stack installer
Stars: ✭ 28 (-82.82%)
Mutual labels:  rocm
SIRIUS
Domain specific library for electronic structure calculations
Stars: ✭ 87 (-46.63%)
Mutual labels:  rocm
amdovx-modules
AMD OpenVX modules: such as, neural network inference, 360 video stitching, etc.
Stars: ✭ 106 (-34.97%)
Mutual labels:  rocm
realcaffe2
The repo is obsolete. Use at your own risk.
Stars: ✭ 12 (-92.64%)
Mutual labels:  rocm
k8s-dt-node-labeller
Kubernetes controller for labelling a node with devicetree properties
Stars: ✭ 17 (-89.57%)
Mutual labels:  kubernetes-device-plugins

AMD GPU device plugin for Kubernetes


Introduction

This is a Kubernetes device plugin implementation that enables the registration of AMD GPUs in a container cluster for compute workloads. With the appropriate hardware and this plugin deployed in your Kubernetes cluster, you will be able to run jobs that require an AMD GPU.

More information about RadeonOpenCompute (ROCm)

Prerequisites

Limitations

  • This plugin targets Kubernetes v1.18+.

Deployment

The device plugin needs to run on every node that is equipped with an AMD GPU. The simplest way to do this is to create a Kubernetes DaemonSet, which runs a copy of a pod on all (or some) nodes in the cluster. We have a pre-built Docker image on DockerHub that you can use with your DaemonSet. This repository also has a pre-defined YAML file named k8s-ds-amdgpu-dp.yaml. You can create a DaemonSet in your Kubernetes cluster by running this command:

$ kubectl create -f k8s-ds-amdgpu-dp.yaml

or pull the file directly from the web:

$ kubectl create -f https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/master/k8s-ds-amdgpu-dp.yaml

If you want to enable the experimental device health check, use k8s-ds-amdgpu-dp-health.yaml instead, after setting --allow-privileged=true for both kube-apiserver and kubelet.
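Once the DaemonSet is running, one way to check that registration worked is to look at what each node reports as allocatable. The sketch below, written in Go (the plugin's own language), reads the output of `kubectl get nodes -o json` on stdin and lists the nodes that advertise AMD GPUs. It assumes the plugin registers the extended resource name `amd.com/gpu`; the program and function names are hypothetical and not part of this repository.

```go
// checkgpus (hypothetical tool): reads `kubectl get nodes -o json` from stdin
// and prints the allocatable amd.com/gpu quantity for each node that has one.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// nodeList models only the NodeList fields this sketch needs;
// all other fields in the JSON are ignored by encoding/json.
type nodeList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Status struct {
			Allocatable map[string]string `json:"allocatable"`
		} `json:"status"`
	} `json:"items"`
}

// gpuResource is the assumed extended resource name advertised by the plugin.
const gpuResource = "amd.com/gpu"

// amdGPUNodes maps node name -> allocatable amd.com/gpu quantity,
// skipping nodes that do not advertise the resource at all.
func amdGPUNodes(data []byte) (map[string]string, error) {
	var list nodeList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, err
	}
	out := make(map[string]string)
	for _, n := range list.Items {
		if q, ok := n.Status.Allocatable[gpuResource]; ok {
			out[n.Metadata.Name] = q
		}
	}
	return out, nil
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read stdin:", err)
		os.Exit(1)
	}
	nodes, err := amdGPUNodes(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse node list:", err)
		os.Exit(1)
	}
	if len(nodes) == 0 {
		fmt.Println("no node advertises", gpuResource)
		return
	}
	for name, q := range nodes {
		fmt.Printf("%s: %s=%s\n", name, gpuResource, q)
	}
}
```

Usage: `kubectl get nodes -o json | go run checkgpus.go`. Reading the allocatable map rather than capacity reflects what the scheduler can actually hand out to pods.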

Example workload

You can restrict work to a node with a GPU by adding resources.limits to the pod definition. An example pod definition is provided in example/pod/alexnet-gpu.yaml. This pod runs the timing benchmark for AlexNet on an AMD GPU and then goes to sleep. You can create the pod by running:

$ kubectl create -f alexnet-gpu.yaml

or

$ kubectl create -f https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/master/example/pod/alexnet-gpu.yaml

and then check the pod status by running:

$ kubectl describe pods

After the pod is created and running, you can see the benchmark result by running:

$ kubectl logs alexnet-tf-gpu-pod alexnet-tf-gpu-container

For comparison, an example pod definition of running the same benchmark with CPU is provided in example/pod/alexnet-cpu.yaml.

Labelling nodes with additional GPU properties

Please see AMD GPU Kubernetes Node Labeller for details. An example configuration is in k8s-ds-amdgpu-labeller.yaml:

$ kubectl create -f k8s-ds-amdgpu-labeller.yaml

or

$ kubectl create -f https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/master/k8s-ds-amdgpu-labeller.yaml

Notes

  • This plugin uses Go modules for dependency management
  • Please consult the Dockerfile for how to build and use this plugin independently of a Docker image

TODOs

  • Add a proper GPU health check (a health check without /dev/kfd access).