spotify / Terraform Gke Kubeflow Cluster

Licence: apache-2.0
Terraform module for creating GKE clusters to run Kubeflow

Projects that are alternatives of or similar to Terraform Gke Kubeflow Cluster

Terraform Aws Kubernetes
Terraform module for Kubernetes setup on AWS
Stars: ✭ 159 (-10.17%)
Mutual labels:  hcl
Aws Incident Response
Stars: ✭ 167 (-5.65%)
Mutual labels:  hcl
Tfk8s
A tool for converting Kubernetes YAML manifests to Terraform HCL
Stars: ✭ 167 (-5.65%)
Mutual labels:  hcl
Dcos Kubernetes Quickstart
Quickstart guide for Kubernetes on DC/OS
Stars: ✭ 161 (-9.04%)
Mutual labels:  hcl
Terraform Aws Openshift
Create infrastructure with Terraform and AWS, install OpenShift. Party!
Stars: ✭ 165 (-6.78%)
Mutual labels:  hcl
Terraform Aws Cloudtrail Cloudwatch Alarms
Terraform module for creating alarms for tracking important changes and occurrences from cloudtrail.
Stars: ✭ 170 (-3.95%)
Mutual labels:  hcl
Terraform Google Nat Gateway
Modular NAT Gateway on Google Compute Engine for Terraform.
Stars: ✭ 155 (-12.43%)
Mutual labels:  hcl
K8s Scw Baremetal
Kubernetes installer for Scaleway bare-metal AMD64 and ARMv7
Stars: ✭ 176 (-0.56%)
Mutual labels:  hcl
Terraform Aws Autoscaling
Terraform module which creates Auto Scaling resources on AWS
Stars: ✭ 166 (-6.21%)
Mutual labels:  hcl
Terraform Amazon Ecs
Terraform files for deploying and running Amazon ECS (+ Private Docker Registry)
Stars: ✭ 171 (-3.39%)
Mutual labels:  hcl
Terraform Kubernetes Installer
Terraform Installer for Kubernetes on Oracle Cloud Infrastructure
Stars: ✭ 162 (-8.47%)
Mutual labels:  hcl
Terraform Aws Rds Aurora
Terraform module which creates RDS Aurora resources on AWS
Stars: ✭ 165 (-6.78%)
Mutual labels:  hcl
Terraform Aws Components
Opinionated, self-contained Terraform root modules that each solve one, specific problem
Stars: ✭ 168 (-5.08%)
Mutual labels:  hcl
Zeit Now
GitHub Action for interacting with Zeit Now
Stars: ✭ 160 (-9.6%)
Mutual labels:  hcl
Heroku
GitHub Action for interacting with Heroku
Stars: ✭ 172 (-2.82%)
Mutual labels:  hcl
Apn Blog
APN Blog article code and configurations.
Stars: ✭ 156 (-11.86%)
Mutual labels:  hcl
C1m
Nomad, Terraform, and Packer configurations for the Million Container Challenge (C1M)
Stars: ✭ 167 (-5.65%)
Mutual labels:  hcl
Stack
A set of Terraform modules for configuring production infrastructure with AWS
Stars: ✭ 2,080 (+1075.14%)
Mutual labels:  hcl
Terraform Aws Foundation
Establish a solid Foundation on AWS with these modules for Terraform
Stars: ✭ 173 (-2.26%)
Mutual labels:  hcl
Getting Started Terraform
Stars: ✭ 171 (-3.39%)
Mutual labels:  hcl

terraform-gke-kubeflow-cluster

A Terraform module for creating a GKE cluster to run Kubeflow on.

This module creates a GKE cluster similar to the one the kfctl tool creates, with a few changes:

  • adds a Cloud SQL instance to use for the metadata store/databases
  • creates a GCE Persistent Disk to use for the artifact store

This module was originally created by the ML Infrastructure team at Spotify to create and manage long-lived GKE clusters shared by many Kubeflow-using teams at Spotify. By contrast, the kfctl tool and the documentation around creating a cluster for Kubeflow tend to assume that individual clusters are quickly spun up and torn down by the engineers using Kubeflow. For more details on Spotify's centralized Kubeflow platform, see this talk from KubeCon North America 2019.

Usage

To use this within Terraform, add a module block like:

module "kubeflow-cluster" {
  source  = "spotify/kubeflow-cluster/gke"
  version = "0.0.1"
}

For more details, see https://registry.terraform.io/modules/spotify/kubeflow-cluster/gke/0.0.1
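
As a rough sketch of how the block might sit in a root configuration, the example below pairs it with a Google provider block. The project ID and region are placeholders, and the module's own input variables are deliberately left out because they are not listed in this README; the registry page above documents them.

```hcl
# Hypothetical root configuration. The project ID and region are placeholders.
provider "google" {
  project = "my-gcp-project"
  region  = "europe-west1"
}

module "kubeflow-cluster" {
  source  = "spotify/kubeflow-cluster/gke"
  version = "0.0.1"

  # Module inputs go here. See the registry page for the variables
  # that this version of the module accepts.
}
```

Pinning an exact version, as above, keeps a later release of the module from changing long-lived clusters unexpectedly.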

Module details

The terraform-gke-kubeflow-cluster module creates the following resources:

  • a GKE cluster (attached to a Shared VPC if the relevant parameters for networks/subnetworks are set)
  • a Cloud SQL instance to use for the metadata store/databases
  • a GCE Persistent Disk to use for Argo's artifact store
  • GCP service accounts for Kubeflow to use (distinct accounts per cluster):
    • an "admin" service account (used for IAP - which is not included in this module)
    • the "user" service account for Kubeflow pipelines to use
    • the VM service account used by the GKE cluster and its nodes
  • IAM bindings for the above service accounts
  • Kubernetes secrets for:
    • cloudsql-instance-credentials for the cloudsql-proxy connected to the metadata SQL instance
    • admin-gcp-sa containing the "admin" GCP service account for Kubeflow
    • user-gcp-sa containing the "user" GCP service account for Kubeflow

Each "instantiation" of the module creates a new set of all of these resources. The intent of the module is to automate the entire setup of all of the GCP resources needed to run a Kubeflow cluster.
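
To illustrate the per-instantiation behavior, two module blocks in one configuration would be expected to produce two independent sets of resources (cluster, Cloud SQL instance, disk, service accounts, and secrets). This is only a sketch: the module names are hypothetical, and the inputs that would distinguish the two clusters are omitted because they are not listed in this README.

```hcl
# Hypothetical example: one module block per cluster.
module "kubeflow-cluster-team-a" {
  source  = "spotify/kubeflow-cluster/gke"
  version = "0.0.1"

  # Inputs for the first cluster go here.
}

module "kubeflow-cluster-team-b" {
  source  = "spotify/kubeflow-cluster/gke"
  version = "0.0.1"

  # Inputs for the second cluster go here. They must differ from the
  # first block wherever resource names would otherwise collide.
}
```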

This repo does not currently install the Kubeflow system components on the cluster; use kfctl or another tool for that.

Local development

Run the following commands from the root of the project:

  1. brew install tfenv -- install tfenv
  2. tfenv install -- install the version of Terraform specified in .terraform-version in source control
  3. terraform init -- set up the Terraform providers

Note on master and node version values

The expected behavior of fuzzy versions for min_master_version and node_version is undocumented (GitHub issue). Empirically, the most recent version that matches the fuzzy version is used. For example, node_version = "1.11" results in GKE nodes running 1.11.7-gke.6 if that is the most recent matching version.
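
A minimal sketch of what this looks like in a module block, assuming min_master_version and node_version are passed as module inputs as the note above implies. All other inputs are omitted.

```hcl
module "kubeflow-cluster" {
  source  = "spotify/kubeflow-cluster/gke"
  version = "0.0.1"

  # Fuzzy versions: in practice these have resolved to the newest
  # matching GKE release, but that behavior is undocumented.
  min_master_version = "1.11"
  node_version       = "1.11"

  # Other inputs omitted.
}
```

If the exact version matters, a full version string such as "1.11.7-gke.6" avoids relying on the undocumented matching behavior.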

Releasing new versions of the module

See https://www.terraform.io/docs/registry/modules/publish.html#releasing-new-versions

A webhook has been automatically added to the repo, and a new "release" will be visible in the Terraform Registry whenever a new tag is pushed that looks like a semantic version (e.g. "v1.2.3"). To cut a release, tag a commit and push the tag to GitHub with git push --tags.

Code of Conduct

This project adheres to the Open Code of Conduct. By participating, you are expected to honor this code.
