ilya-lesikov / gke-demo

Licence: MIT license
Demonstration of complete, fully-featured CI/CD and cloud automation for microservices, done with GCP/GKE

Programming languages: HCL, shell, Dockerfile

Features

  • Multistage deployments (staging, prod)
  • Canary deployments
  • Horizontal pod/instance autoscaling
  • Rollbacks, self-healing
  • Distributed tracing, monitoring, logging, profiling, debugging

Setup and deployment are heavily automated, so it should be easy to deploy it yourself using a GCP account with the Free Trial.

Contents

  1. Features
  2. Software
  3. How it works
  4. Quick start
  5. Looking around
  6. Cleanup
  7. Implementing this in the real world
  8. Known issues
  9. Halp?

Software

What                           For
Terraform, Terragrunt          Cloud automation
Kubernetes (GKE), Kustomize    Container orchestration
Google Cloud Build             CI
ArgoCD, Argo Rollouts          CD
Google Stackdriver             Monitoring, logging, tracing, profiling, debugging
Cloud KMS, Container Registry, Storage and other GCP goodies

We also use 10 microservices from Google with built-in instrumentation for Stackdriver.

How it works (simplified)

[Image: diagram]

Quick start

  1. You need a GCP account with the Free Trial activated

  2. You need a GitHub account

  3. Fork this repo (we can't set up GCB triggers for repositories you don't own)

  4. You need Docker installed (any OS)

  5. Run the Docker container and attach to it:

    # Change this to the owner of the forked "gke-demo" repo; don't leave it as is
    GITHUB_USERNAME=ilya-lesikov
    
    # Run a container with all the tooling we need:
    # NOTE: you can change "TF_VAR_project_id" in this command to point to an existing GCP project
    docker run -d --name gke-demo \
      -e TF_VAR_project_id=gke-demo-$GITHUB_USERNAME \
      -e TF_VAR_github_demo_owner=$GITHUB_USERNAME \
      ilyalesikov/gke-demo
    
    # Attach to the container
    docker exec -it gke-demo bash

  6. Prepare for cloud provisioning (run this from inside the container):

    # Clone the repo you forked
    git clone --recursive https://github.com/${TF_VAR_github_demo_owner}/gke-demo
    
    # Run this and follow the instructions on your screen.
    # This will authorize us to access your GCP account and the "gke-demo" repo you forked.
    ./gke-demo/scripts/prepare.sh && source /root/.bashrc

  7. Provision our cloud infrastructure with Terraform/Terragrunt:

    On any transient errors (e.g. SSL/TLS errors or the remote server closing the connection), just rerun the terragrunt command. Terragrunt handles most of these automatically, but Terraform sucks so much it'll need 10 wrappers to be truly reliable.

    cd gke-demo/terraform/environments
    terragrunt apply-all --terragrunt-non-interactive

  8. Build and deploy all of our applications:

    git tag -d release_all
    git push --delete origin release_all
    git tag release_all
    git push origin release_all   # This will trigger our CI/CD

  9. Open this page to start the creation of the Monitoring workspace and activate Stackdriver: https://console.cloud.google.com/monitoring/dashboards

  10. Now just wait for the build to complete: https://console.cloud.google.com/cloud-build/builds

  11. Works now!
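
A note on step 5: the default TF_VAR_project_id is built from your GitHub username, while GCP project IDs must be 6 to 30 characters long, use only lowercase letters, digits and hyphens, start with a letter, and not end with a hyphen. A mixed-case or long username can therefore produce an ID that GCP rejects. Below is a minimal sketch of a pre-flight check; the function names are hypothetical and not part of this repo.

```shell
# Hypothetical helpers, not part of gke-demo.

# Succeeds only if "$1" satisfies the GCP project ID format rules.
# The body runs in a subshell so LC_ALL and the temporaries do not leak.
valid_project_id() (
  LC_ALL=C   # make [a-z] match ASCII lowercase only
  id="$1"
  len=${#id}
  { [ "$len" -ge 6 ] && [ "$len" -le 30 ]; } || return 1
  case "$id" in
    [!a-z]*)      return 1 ;;  # must start with a lowercase letter
    *-)           return 1 ;;  # must not end with a hyphen
    *[!a-z0-9-]*) return 1 ;;  # lowercase letters, digits, hyphens only
  esac
  return 0
)

# Builds a candidate ID the same way the docker command does, lowercased.
project_id_for() {
  printf 'gke-demo-%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Example use before "docker run":
#   candidate="$(project_id_for "$GITHUB_USERNAME")"
#   valid_project_id "$candidate" || echo "choose another TF_VAR_project_id" >&2
```

The check only covers the format; it cannot tell whether the ID is already taken by another project.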

Looking around

First, switch to our production cluster:

kubectl config use-context "gke_${TF_VAR_project_id}_europe-west2-a_cluster-demo-prod"
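
The context name above follows the gke_<project>_<location>_<cluster> pattern that gcloud uses when it writes cluster credentials to your kubeconfig. As a rough sketch, the name can be assembled and checked before switching; both helper names are hypothetical.

```shell
# Hypothetical helpers, not part of gke-demo.

# Prints the kubeconfig context name for a GKE cluster.
gke_context() {
  printf 'gke_%s_%s_%s\n' "$1" "$2" "$3"
}

# Reads context names on stdin, one per line, and succeeds if "$1" is among
# them (exact, whole-line match).
context_exists() {
  grep -Fxq -- "$1"
}

# Example use inside the tooling container:
#   ctx="$(gke_context "$TF_VAR_project_id" europe-west2-a cluster-demo-prod)"
#   kubectl config get-contexts -o name | context_exists "$ctx" \
#     && kubectl config use-context "$ctx"
```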

Check if our app is synced and healthy:

argocd app get hipstershop-prod

[Image: app-get]


List our canary rollouts:

kubectl argo rollouts list rollouts


Check out details for some particular rollout/microservice:

kubectl argo rollouts get rollout adservice


We even have a neat web interface to manage our applications' lifecycle, do rollbacks, etc.:

IP="$(argocd context | awk 'NR==2 {print $3}')"
PASS="$(kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-server -o name | cut -d'/' -f2)"
printf '\nThe web-interface is here: https://%s, username is "admin", password is "%s"\n\n' "$IP" "$PASS"

[Image: argocd-webui]


Now buy something in our "Hipstershop" application to produce some data for Stackdriver:

IP="$(kubectl get service frontend-external | awk 'NR==2 {print $4}')"
printf '\nApplication is here: http://%s\n\n' "$IP"
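
The awk call above takes the fourth column (EXTERNAL-IP) of the second output line. While the load balancer is still being provisioned, kubectl prints <pending> in that column, so the URL may come out unusable. Here is a small sketch that treats such values as "not ready yet"; the function name is hypothetical.

```shell
# Hypothetical helper, not part of gke-demo.

# Reads the table printed by "kubectl get service <name>" on stdin and prints
# the EXTERNAL-IP column of the first service row. Fails if the address has
# not been assigned yet.
external_ip() {
  ip="$(awk 'NR==2 {print $4}')"
  case "$ip" in
    ''|'<pending>'|'<none>') return 1 ;;
  esac
  printf '%s\n' "$ip"
}

# Example use: poll until the address shows up.
#   until IP="$(kubectl get service frontend-external | external_ip)"; do
#     sleep 5
#   done
#   printf '\nApplication is here: http://%s\n\n' "$IP"
```

An alternative that avoids parsing the table is kubectl's JSONPath output, e.g. -o jsonpath='{.status.loadBalancer.ingress[0].ip}', which prints nothing until an address is assigned.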

[Image: hipstershop]

Stackdriver

Monitoring

Simple k8s monitoring dashboard:

https://console.cloud.google.com/monitoring/dashboards/resourceList/kubernetes

[Image: dashboard]

There are lots of metrics out of the box, thanks to GCP, GKE, Kubernetes, Istio and instrumentation on the application side:

https://console.cloud.google.com/monitoring/metrics-explorer

[Image: metrics]

Distributed tracing

https://console.cloud.google.com/traces/list

[Image: trace]

Profiling

https://console.cloud.google.com/profiler

[Image: profiler]

Debugging

https://console.cloud.google.com/debug

[Image: debugger]

Cleanup

This should destroy everything except the Terraform remote state bucket and the enabled services/APIs:

cd /git/gke-demo/terraform/environments
terragrunt destroy-all --terragrunt-non-interactive

As an alternative, this will completely delete the project, cleaning up everything we've created:

gcloud projects delete $TF_VAR_project_id

Stop and remove the container with the tooling from your system (run this outside the container):

docker rm -f gke-demo

You might want to remove the GCB application from your GitHub account too.

Implementing this in the real world

This project has some nice things implemented that are useful in production systems, but it is nevertheless a demonstration. What I would do differently if this were making me $$$:

  1. Dump Google Cloud AbominationBuild. It's the worst thing I've worked with in this demo so far. In the end it feels like one big ugly shell script split into chunks, each of them executed in a separate container. When it works, it works, but... it has no dependencies between builds and no sane way to handle concurrency. There are almost no builders, and the existing ones are as sophisticated as RUN apt install terraform, ENTRYPOINT terraform.

    Check out Concourse, Drone CI or Spinnaker, or, if you are going 100% Kubernetes, try something like the Argo stack; ArgoCD/Rollouts was a breeze to work with.

  2. The repo should be split into at least two: one for the shared infrastructure automation code (e.g. Terraform), the other for microservices. I would say you'd better split your microservices into separate repos too, since this allows for cleaner CI, though I've heard of people using monorepos. Some glue is still needed to avoid a versioning mess and race conditions in your CI/CD when multiple microservices are developed, tested and deployed simultaneously.

  3. You'll need to streamline the developers' workflow on their local machines with something like Minikube and Skaffold. Developers should be able to deploy, on their local machine, the microservices and accompanying software (DBs) needed to properly develop and test their own microservice, to minimize testing in the staging environment (which is much slower and more expensive). It will be a sort of replacement for the docker-compose.yml files in the root of an application repo that help you deploy DBs and such, and maybe even the microservices you heavily depend on.

  4. Versions are pinned as precisely as possible everywhere, just to keep this demo working without much maintenance. Updating these pinned versions programmatically is always a big PITA, but it is the right way and you need to figure it out. As a simplified workflow, you can pin to the minor (not patch) version, so that patch updates are applied automatically. This way you'll need to go through your code and repin minor versions manually from time to time. And of course uncontrolled patch updates can break things sometimes, but that will happen rarely, so it's a trade-off between reliability and simplicity.

  5. For all of this to actually be reliable and resilient you need comprehensive testing on many levels, including E2E and load testing.

  6. You might not want to deploy everything that passed the staging environment to production instantly and without any confirmation.

  7. Industry-standard Prometheus might be a better choice than proprietary Stackdriver.

  8. I didn't use a more traditional configuration management system (CMS) like Ansible, since all my needs were covered by Terraform and Kubernetes. It might still be useful when working with VMs, but with hosted K8s I didn't really need it.

  9. There might be something else that I didn't know that I forgot that I didn't know.

  10. $ grep -RE 'TODO|FIXME'
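
For item 4, pinning to the minor version can be expressed in Terraform with the ~> constraint operator: "~> 1.2.0" accepts any 1.2.x release but not 1.3.0. The sketch below turns an exact version into such a constraint. The function name and the version numbers in the examples are made up for illustration and are not taken from this repo.

```shell
# Hypothetical helper, not part of gke-demo.

# Turns an exact version (MAJOR.MINOR or MAJOR.MINOR.PATCH) into a Terraform
# constraint that allows patch updates only, e.g. 0.12.24 -> "~> 0.12.0".
minor_constraint() {
  version="$1"
  case "$version" in
    ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;  # digits and single dots only
  esac
  major="${version%%.*}"
  rest="${version#*.}"
  [ "$rest" != "$version" ] || return 1   # need at least MAJOR.MINOR
  minor="${rest%%.*}"
  printf '~> %s.%s.0\n' "$major" "$minor"
}

# Example use:
#   minor_constraint 0.12.24   # prints: ~> 0.12.0
```

Container image tags have no constraint syntax, so the equivalent there would be a MAJOR.MINOR tag, provided the image publisher maintains one.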

Known issues

  1. Missing required GCS remote state configuration project
    Reason: sometimes Terragrunt can't parse a few keys (e.g. project) in remote_state.config.
    Workaround: /git/gke-demo/scripts/terragrunt-cleanup.sh

  2. Terragrunt/Terraform fails during init phase
    Workaround: /git/gke-demo/scripts/terragrunt-reinit.sh
    If that didn't help: /git/gke-demo/scripts/terragrunt-cleanup.sh

  3. connection reset by peer, connection closed, SSL/TLS errors
    Reason: Terraform sucks
    Workaround: rerun failed command
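
The "rerun failed command" workaround for issue 3 can be automated with a small retry wrapper. This is only a sketch: the function name, the attempt count and the delay are arbitrary choices, and it retries on any failure, not just on transient network errors.

```shell
# Hypothetical helper, not part of gke-demo.

# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
# Runs the command until it succeeds or MAX_ATTEMPTS is reached, and returns
# the exit status of the last attempt.
retry() {
  retry_max="$1"
  retry_delay="$2"
  shift 2
  retry_attempt=1
  while :; do
    "$@" && return 0
    retry_status=$?
    [ "$retry_attempt" -ge "$retry_max" ] && return "$retry_status"
    echo "attempt $retry_attempt/$retry_max failed, retrying in ${retry_delay}s: $*" >&2
    retry_attempt=$((retry_attempt + 1))
    sleep "$retry_delay"
  done
}

# Example use:
#   retry 3 10 terragrunt apply-all --terragrunt-non-interactive
```

Because it cannot tell a transient error from a real one, keep the attempt count low so that genuine configuration errors still surface quickly.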

Halp?

I tested it many times, but I could have missed something.
If you experience any problems, let me know by opening an issue, thanks.
