Projects that are alternatives of or similar to GPU-Kubernetes-Guide

Rak8s
Stand up a Raspberry Pi based Kubernetes cluster with Ansible
Stars: ✭ 354 (+941.18%)
Mutual labels:  kubernetes-cluster, kubernetes-setup, kubectl, kubeadm
kainstall-offline
kainstall tools offline file
Stars: ✭ 31 (-8.82%)
Mutual labels:  kubernetes-cluster, kubernetes-setup, kubeadm
Kubernix
Single dependency Kubernetes clusters for local testing, experimenting and development
Stars: ✭ 545 (+1502.94%)
Mutual labels:  kubernetes-cluster, kubernetes-setup, kubernetes-deployment
kubernetes-cluster
Vagrant As Automation Script
Stars: ✭ 34 (+0%)
Mutual labels:  kubernetes-cluster, kubernetes-setup, kubeadm
Eksctl
The official CLI for Amazon EKS
Stars: ✭ 3,550 (+10341.18%)
Mutual labels:  kubernetes-cluster, kubernetes-setup, kubernetes-deployment
Kubekey
Provides a flexible, rapid, and convenient way to install Kubernetes alone or together with KubeSphere and related cloud-native add-ons. It is also an efficient tool to scale and upgrade your cluster.
Stars: ✭ 288 (+747.06%)
Mutual labels:  kubernetes-cluster, kubeadm, kubernetes-deployment
Sonobuoy
Sonobuoy is a diagnostic tool that makes it easier to understand the state of a Kubernetes cluster by running a set of Kubernetes conformance tests and other plugins in an accessible and non-destructive manner.
Stars: ✭ 2,442 (+7082.35%)
Mutual labels:  kubernetes-cluster, kubernetes-setup, kubernetes-deployment
Libvirt K8s Provisioner
Automate your k8s installation
Stars: ✭ 106 (+211.76%)
Mutual labels:  kubernetes-setup, kubectl, kubeadm
kubernetes the easy way
Automating Kubernetes the hard way with Vagrant and scripts
Stars: ✭ 22 (-35.29%)
Mutual labels:  kubernetes-cluster, kubernetes-setup, k8s-cluster
Metalk8s
An opinionated Kubernetes distribution with a focus on long-term on-prem deployments
Stars: ✭ 217 (+538.24%)
Mutual labels:  kubernetes-cluster, kubernetes-setup, kubernetes-deployment
aws-kubernetes
Kubernetes cluster setup in AWS using Terraform and kubeadm
Stars: ✭ 32 (-5.88%)
Mutual labels:  kubernetes-cluster, kubernetes-setup, kubeadm
terraform-vultr-condor
Kubernetes Deployment Tool for Vultr
Stars: ✭ 60 (+76.47%)
Mutual labels:  kubernetes-cluster, k8s-cluster, kubernetes-deployment
kubeadm-vagrant
Setup Kubernetes Cluster with Kubeadm and Vagrant
Stars: ✭ 49 (+44.12%)
Mutual labels:  kubernetes-cluster, kubectl, kubeadm
Blackbelt Aks Hackfest
Microsoft Intelligent Cloud Blackbelt Team :: Hackfest Repo
Stars: ✭ 209 (+514.71%)
Mutual labels:  kubernetes-setup, kubectl, kubernetes-deployment
Terraform Aws Kubernetes
Terraform module for Kubernetes setup on AWS
Stars: ✭ 159 (+367.65%)
Mutual labels:  kubernetes-cluster, kubernetes-setup, kubeadm
Kainstall
Use shell scripts to install Kubernetes (k8s) high-availability clusters and addon components based on kubeadm with one click.
Stars: ✭ 198 (+482.35%)
Mutual labels:  kubernetes-cluster, kubernetes-setup, kubeadm
kubernetes-starterkit
A launchpad for developers to learn Kubernetes from scratch and deployment of microservices on a kubernetes cluster.
Stars: ✭ 39 (+14.71%)
Mutual labels:  kubernetes-cluster, kubectl, kubernetes-deployment
aksctl
An easy to use CLI for AKS cluster
Stars: ✭ 46 (+35.29%)
Mutual labels:  kubernetes-cluster, kubernetes-setup, kubectl
Kontainerd
Creating a kubernetes kubeadm cluster using Vagrant machines as nodes and Containerd as a container runtime
Stars: ✭ 16 (-52.94%)
Mutual labels:  kubernetes-cluster, kubeadm

[WIP] How to set up a production-grade Kubernetes GPU cluster on Paperspace in 10 minutes for $10 🌈

Note: this guide accompanies an upcoming blog post here and is heavily derived from the fantastic guide here.

Table of Contents

Why Kubernetes
Step 1: Create a Paperspace Account
Step 2: Create a private network
Step 3: Prepare master node (CPU)
Step 4: Install Kubernetes on the master node (CPU)
Step 5: Prepare GPU worker node
Step 6: Deploy a Jupyter notebook with GPU support
Step 7: Assign a public IP to the worker node
Step 8: All done! Woooo!
Next Steps (Coming Soon)
Acknowledgements
License

Why Kubernetes

If you are a startup or an individual interested in getting a production-grade ML/data science pipeline going, Kubernetes can be extremely valuable. It is without a doubt one of the best tools for orchestrating complex deployments and managing specific hardware interdependencies.

Unlike the web development world, the ML/AI/data science community does not yet have fully established patterns and best practices. We at Paperspace believe that Kubernetes can play a big part in helping companies get up and running quickly and with the best performance possible.

Step 1: Create a Paperspace Account

Head over to Paperspace to create your account (it only takes two seconds). You will need to confirm your email and then log back in. Once logged in, you will need to add a credit card to your account. This tutorial should only cost about $5 to get going (plus any additional usage after).

Step 2: Create a private network

You will need a private network for this tutorial which is currently only available to our "teams" accounts. Shoot an email to support [@] paperspace [dot] com to request a team account (there is no charge).

Once confirmed, head over to the network page to create a private network in your region (note: private networks are region-specific, so you will need to keep everything in the same region).

You might need to refresh the page if the network doesn't show up after about 20 seconds.

Step 3: Prepare master node (CPU)

OK, so now you have a Paperspace team account and a private network. Next, create your first machine on this private network. On the machine create page, create a Paperspace C3 instance running Ubuntu 18.04, making sure it is on your private network.

Once this machine is created (i.e. it is no longer "provisioning" and has transitioned to the "ready" state) you will be emailed a temporary password.

Go to your Paperspace Console and open the machine. It will ask you for your password; press CTRL+SHIFT+V (on Windows) to paste it. You can change the password if you would like by typing passwd and then confirming a new password.

You are now in the web terminal for your master node. First, disable the existing UFW firewall by typing the following:

sudo ufw disable

(Note: we do this for testing only; you will want to re-enable it later for security. That said, on Paperspace your machines are fully isolated until you add a public IP.)

Step 4: Install Kubernetes on the master node (CPU)

Now, download and execute the initialization script, which will set up the Kubernetes master. It is very helpful to read through this short (<40 LOC) script to see what it does. At a high level, it downloads and installs Kubernetes, Docker, and a few required packages, then initializes the cluster using the kubeadm tool.

Note: Because we are building our cluster on an isolated private network we can safely assume that all nodes can talk to one another, but are not yet publicly addressable

wget https://raw.githubusercontent.com/Paperspace/GPU-Kubernetes-Guide/master/scripts/init-master.sh
chmod +x init-master.sh
sudo ./init-master.sh

This will return a join command in the format kubeadm join xxxx:6443 --token u328wq.xxxxx --discovery-token-ca-cert-hash sha256:xxxxx. Copy those parameter values for joining new nodes. You can regenerate a new join command with kubeadm token create --print-join-command.
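If you want to keep those values handy, they can be pulled out of the join command with standard shell tools. A minimal sketch (the JOIN_CMD value below is a made-up placeholder, not a real token):

```shell
# Paste the join command printed by init-master.sh (these values are illustrative)
JOIN_CMD='kubeadm join 10.0.0.2:6443 --token u328wq.abcdef --discovery-token-ca-cert-hash sha256:deadbeef'

# Extract the three arguments that the worker-join step expects
ENDPOINT=$(echo "$JOIN_CMD" | awk '{print $3}')
TOKEN=$(echo "$JOIN_CMD" | sed -n 's/.*--token \([^ ]*\).*/\1/p')
CA_HASH=$(echo "$JOIN_CMD" | sed -n 's/.*--discovery-token-ca-cert-hash \([^ ]*\).*/\1/p')

echo "$ENDPOINT $TOKEN $CA_HASH"
```

On the worker node these map directly onto the three parameters the worker init script takes, i.e. sudo ./init-worker.sh $ENDPOINT $TOKEN $CA_HASH.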

Step 5: Prepare GPU worker node

Yes! We have a Kubernetes master node up and running. The next step is to add a GPU-backed worker node and join it to the cluster. Luckily we have a script for this too (but again, it is really good practice to read through the script to see what it is doing).

First, create a Paperspace GPU+ instance running Ubuntu 16.04 and make sure it is on your private network. We could use the ML-in-a-box template for this, but really we only need the NVIDIA driver and CUDA installed, which our script will download and install for us. This worker node will run the GPU-backed Docker containers that the Kubernetes master assigns to it.

Execute the initialization script with the parameter values that you copied above. (ProTip: for this kind of work it is best to open two browser tabs with two Paperspace terminals running: one for the master and one for the worker.)

wget https://raw.githubusercontent.com/Paperspace/GPU-Kubernetes-Guide/master/scripts/init-worker.sh
chmod +x init-worker.sh
sudo ./init-worker.sh <ip:port> <token> <ca cert hash>

Once it has joined, you will need to reboot the machine. The NVIDIA driver is installed, but it requires a reboot to take effect. If you skip this step, the node will not appear GPU-enabled to the Kubernetes cluster.

Step 6: Deploy a Jupyter notebook with GPU support

Awesome! We now have a worker and master node joined together to form a bare-bones Kubernetes cluster. You can confirm this on the master node with the following command:

kubectl get nodes

You should see the hostname of your GPU+ Paperspace machine (the worker node) in this list.

In Kubernetes terms, a deployment is a YAML file that defines an application you would like to run on your cluster. The deployment specifies which Docker container is used and what its features/specs are. Additionally, the YAML file contains a service definition, which is responsible for assigning an addressable port to the deployment. We are using the Kubernetes NodePort service type, which picks a port for the container and makes it available on all worker nodes. For now, all you need to know is that Kubernetes will find our GPU-backed worker node and schedule the Jupyter notebook onto it.
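The shape of such a file is roughly the following sketch. The field names follow the standard Kubernetes Deployment and Service APIs, but the image, labels, port numbers, and GPU resource name are illustrative (and vary with Kubernetes version); this is not the repo's actual tf-jupyter.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-jupyter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tf-jupyter
  template:
    metadata:
      labels:
        app: tf-jupyter
    spec:
      containers:
      - name: tf-jupyter
        image: tensorflow/tensorflow:latest-gpu-jupyter  # illustrative image
        ports:
        - containerPort: 8888
        resources:
          limits:
            nvidia.com/gpu: 1   # ask the scheduler for one GPU
---
apiVersion: v1
kind: Service
metadata:
  name: tf-jupyter
spec:
  type: NodePort        # expose the port on every node
  selector:
    app: tf-jupyter
  ports:
  - port: 8888
    targetPort: 8888
```

The GPU resource limit is what tells the scheduler this pod must land on a GPU-enabled node.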

Download the YAML file from this GitHub repo (you could also copy/paste it using Vim, Nano, Emacs, etc.):

wget https://raw.githubusercontent.com/Paperspace/GPU-Kubernetes-Guide/master/deployments/tf-jupyter.yaml

Have Kubernetes deploy it:

kubectl apply -f tf-jupyter.yaml
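You can see which port NodePort picked by running kubectl get svc tf-jupyter on the master. As a sketch of how to read its output (the service name and numbers below are illustrative, not from a live cluster):

```shell
# Illustrative line from `kubectl get svc tf-jupyter` output:
# NAME        TYPE      CLUSTER-IP    EXTERNAL-IP  PORT(S)          AGE
SVC_LINE='tf-jupyter  NodePort  10.102.33.7  <none>  8888:30061/TCP  1m'

# The NodePort is the number after the colon in the PORT(S) column
NODE_PORT=$(echo "$SVC_LINE" | awk '{print $5}' | cut -d: -f2 | cut -d/ -f1)
echo "Jupyter will be reachable at http://<worker-ip>:$NODE_PORT"
```

On a live cluster, kubectl get svc tf-jupyter -o jsonpath='{.spec.ports[0].nodePort}' prints the port number directly, assuming the service is indeed named tf-jupyter.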

Step 7: Assign a public IP to the worker node

OK, so this is not a best practice, but it will quickly let us see whether everything is working: we will assign a public IP to our worker node. Because everything so far has happened on our private network, nothing has been accessible from the outside world.

Step 8: All done! Woooo!

That's it! You have done what very few people have accomplished: stood up a GPU-backed Kubernetes cluster in just a few minutes. Go to your new public IP address and port, and you should see a Jupyter notebook running!

Now, in Part 2 (coming soon) we will cover adding storage and building out a real ML pipeline.

Next Steps (Coming Soon)

Acknowledgements

Most of the heavy lifting here came from this guide: https://github.com/Langhalsdino/Kubernetes-GPU-Guide, which was enormously helpful. A huge shoutout to Langhalsdino!

Additional improvements were suggested by azlyth, specifically around getting some of the networking to work on Paperspace.

License

This project is licensed under the MIT License; see the LICENSE.md file for details.
