
runyontr / k8s-canary

License: Apache-2.0
Walkthrough of Canary deployment on Kubernetes

Programming Languages: Go, Makefile

Projects that are alternatives of or similar to k8s-canary

graphml-tutorials
Tutorials for Machine Learning on Graphs
Stars: ✭ 125 (+594.44%)
Mutual labels:  tutorials
ecsview
Browse your AWS ECS Clusters in the Terminal
Stars: ✭ 100 (+455.56%)
Mutual labels:  golang-application
tutorials-kr
🇰🇷 A repository for the Korean translation of the tutorials provided by PyTorch. (Translate PyTorch tutorials into Korean 🇰🇷)
Stars: ✭ 271 (+1405.56%)
Mutual labels:  tutorials
sinatra-api-server-toolbox
Sinatra API Server Toolbox (Ruby, Sinatra, ActiveRecord, postgreSQL, JSON, jQuery, AJAX)
Stars: ✭ 21 (+16.67%)
Mutual labels:  tutorials
Golang-Files-Preview
Golang file preview with support for office, pdf, cad, archive, txt, image and video files
Stars: ✭ 53 (+194.44%)
Mutual labels:  golang-application
openCVtutorials
OpenCV Tutorials
Stars: ✭ 12 (-33.33%)
Mutual labels:  tutorials
Deep-Learning-with-Caffe
My tests and experiments on Caffe, the deep learning framework by Berkeley Vision and Learning Center (BVLC) and its contributors.
Stars: ✭ 30 (+66.67%)
Mutual labels:  tutorials
flowwie-freecad
Flowwie's FreeCAD resources for everybody to learn computer aided design with the Open Source CAD software FreeCAD.
Stars: ✭ 214 (+1088.89%)
Mutual labels:  tutorials
start-machine-learning
A complete guide to start and improve in machine learning (ML), artificial intelligence (AI) in 2022 without ANY background in the field and stay up-to-date with the latest news and state-of-the-art techniques!
Stars: ✭ 3,066 (+16933.33%)
Mutual labels:  tutorials
FortranTip
Short instructional Fortran codes associated with Twitter @FortranTip
Stars: ✭ 39 (+116.67%)
Mutual labels:  tutorials
Awesome-Tensorflow2
Excellent extension packages and projects built on TensorFlow 2
Stars: ✭ 45 (+150%)
Mutual labels:  tutorials
whatsapp-jpeg-repair
A handy tool to fix jpeg files downloaded from WhatsApp and prevent errors upon opening these files in Adobe Photoshop.
Stars: ✭ 30 (+66.67%)
Mutual labels:  golang-application
mikrotik-fwban
Use your Mikrotik firewall to do fail2ban like blocking of unwanted IPs. Written in Go
Stars: ✭ 22 (+22.22%)
Mutual labels:  golang-application
tricks
Tips and tricks for REDAXO 5
Stars: ✭ 96 (+433.33%)
Mutual labels:  tutorials
Spring-Kotlin-iThome-2021
iThome ironman 2021 💪
Stars: ✭ 27 (+50%)
Mutual labels:  tutorials
purebasic-archives
A collection of PureBASIC resources.
Stars: ✭ 23 (+27.78%)
Mutual labels:  tutorials
angularjs-es6-starter-kit
Basic AngularJS, ES6, Webpack Starter Kit Project which includes Bootstrap 4 also. This is a boilerplate for AngularJS SPA with Bootstrap 4.
Stars: ✭ 28 (+55.56%)
Mutual labels:  tutorials
KJOverlayTutorial
A Tutorial for iOS
Stars: ✭ 99 (+450%)
Mutual labels:  tutorials
kx
A more streamlined way to move around k8s contexts and namespaces
Stars: ✭ 21 (+16.67%)
Mutual labels:  golang-application
n3dr
Nexus3 Disaster Recovery (N3DR) is a tool that is capable of downloading all artifacts from a Nexus3 server and migrating them to another Nexus3 server. Note that some repository formats are not supported at the moment.
Stars: ✭ 110 (+511.11%)
Mutual labels:  golang-application

K8s Canary

This tutorial walks through how to perform a canary deployment on Kubernetes. When updating an application, a slow rollout allows traffic to the new deployment to be monitored and the deployment to be rolled back with minimal impact to operations.

Kubernetes Services

A Service is an abstraction over a set of pods. Since pods can be created and destroyed at any time, a Service provides a single, stable host:port for talking to the pods configured for it. Services are created with a set of label selectors, and all requests made to the Service are load balanced across the pods running with the configured labels.

There is no restriction that the pods behind a Service come from the same Deployment, and this is what allows us to create a canary deployment to test an update before rolling it out fully.

The App

The application we'll use to demonstrate the canary deployment is located in the app folder. It uses an interface to abstract out the implementation of the AppInfo service.

The application provides information about how it has been deployed on Kubernetes: it reads environment variables containing metadata about the pod, along with the labels exposed to the container, and returns them as a JSON message.

package models 

type AppInfo struct {
	PodName string //Extracts the MY_POD_NAME environment variable
	AppName string //Extracts the value for the app label 
	Namespace string //Extracts the MY_POD_NAMESPACE environment variable
	Release string //Extracts the value for the release label
	Labels  map[string]string //Contains other labels for app
}
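As a rough illustration of where those values come from, an implementation might read the Downward API environment variables and parse the /etc/labels file described later in the Deployment section. The sketch below is hypothetical (the function name and import path are assumptions), not the repository's exact code.

package service

import (
	"bufio"
	"os"
	"strings"

	"github.com/runyontr/k8s-canary/app/models" // import path assumed
)

// appInfoFromRuntime is a hypothetical helper showing how the AppInfo fields
// could be populated from the pod's environment and its Downward API label file.
func appInfoFromRuntime() (models.AppInfo, error) {
	info := models.AppInfo{
		PodName:   os.Getenv("MY_POD_NAME"),      // set via fieldRef metadata.name
		Namespace: os.Getenv("MY_POD_NAMESPACE"), // set via fieldRef metadata.namespace
		Labels:    map[string]string{},
	}

	// /etc/labels is written by the Downward API; each line looks like key="value".
	f, err := os.Open("/etc/labels")
	if err != nil {
		return info, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		parts := strings.SplitN(scanner.Text(), "=", 2)
		if len(parts) != 2 {
			continue
		}
		key, value := parts[0], strings.Trim(parts[1], `"`)
		switch key {
		case "app":
			info.AppName = value
		case "release":
			info.Release = value
		default:
			info.Labels[key] = value
		}
	}
	return info, scanner.Err()
}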

App Structure

Models

The app/models folder contains the data transfer objects for the application.

Service

The app/service folder contains the interface definition and three implementations of the interface, each extracting the AppInfo from the runtime.
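The interface itself is small. A minimal sketch of its likely shape follows; the name InfoService and the exact signature are assumptions, while the GetAppInfo method is referenced in the Transport section below.

package service

import (
	"context"

	"github.com/runyontr/k8s-canary/app/models" // import path assumed
)

// InfoService is assumed to be the interface that appInfoBaseline, appInfoBroken
// and appInfoWithNamespace each implement; the real definition may differ.
type InfoService interface {
	GetAppInfo(ctx context.Context) (models.AppInfo, error)
}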

Transport

The app/transport folder contains the go-kit code that hosts an interface implementation as a service. The http.Handler returned from MakeInfoServiceHandler translates a request to /v1/appinfo into a call to the GetAppInfo function of the interface implementation and serializes the response back.
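A rough sketch of how such a handler can be wired with go-kit's HTTP transport is shown below. The wiring is illustrative rather than the repository's exact code, and infoService mirrors the interface sketched in the Service section.

package transport

import (
	"context"
	"net/http"

	kithttp "github.com/go-kit/kit/transport/http"

	"github.com/runyontr/k8s-canary/app/models" // import path assumed
)

// infoService is the minimal surface the transport needs from the service layer.
type infoService interface {
	GetAppInfo(ctx context.Context) (models.AppInfo, error)
}

// MakeInfoServiceHandler maps GET /v1/appinfo to GetAppInfo and serializes the
// result as JSON. Sketch only: the real handler may use different wiring.
func MakeInfoServiceHandler(svc infoService) http.Handler {
	getAppInfo := kithttp.NewServer(
		// endpoint: call the business logic
		func(ctx context.Context, _ interface{}) (interface{}, error) {
			return svc.GetAppInfo(ctx)
		},
		// decode: /v1/appinfo takes no request body
		func(_ context.Context, _ *http.Request) (interface{}, error) { return nil, nil },
		// encode: write the AppInfo back as application/json
		kithttp.EncodeJSONResponse,
	)

	mux := http.NewServeMux()
	mux.Handle("/v1/appinfo", getAppInfo)
	return mux
}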

Dockerfiles

There are three Dockerfiles in the app folder, each corresponding to an image built with a different implementation of the interface. The following table shows which struct is used in which Dockerfile, the tag each image is pushed with, and a brief description of the behavior to expect when running it.

Struct                 Dockerfile          Docker Image                Description
appInfoBaseline        app/Dockerfile.v1   runyonsolutions/appinfo:1   Does not return the Namespace value
appInfoBroken          app/Dockerfile.v2   runyonsolutions/appinfo:2   Returns an error
appInfoWithNamespace   app/Dockerfile.v3   runyonsolutions/appinfo:3   Populates the Namespace value correctly

In practice, the application being deployed would probably not contain three separate implementations of the same interface; the three structs stand in for successive iterations of a single implementation, and using distinct structs simply makes it easier to see which code is running in each image.

The application layout does show how easy it would be to change how a service functions, for example by swapping in different storage systems for a basic CRUD application or different implementations of analytics.

Deployment

As a baseline, we assume there is a Kubernetes deployment named appinfo that was deployed from the deployment/appinfo.yaml file. This deployment can be created via

kubectl create -f https://raw.githubusercontent.com/runyontr/k8s-canary/master/deployment/appinfo.yaml

The deployment is very basic except for two additional configurations. The first is exposing pod information through environment variables, created from the following YAML in the deployment spec:

        env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace

This creates an environment variable MY_POD_NAME containing the pod name and an environment variable MY_POD_NAMESPACE containing the namespace where the pod is running.

The second customization is exposing the pod's labels into the container, which creates a file at /etc/labels containing the pod's labels.

Now get the name of the pod that was deployed:

kubectl get pods -l app=appinfo

output:

NAME                       READY     STATUS    RESTARTS   AGE
appinfo-567b989978-vvzlg   1/1       Running   0          19s

Talking to the deployment

Now we can port forward localhost:8080 to the pod's container port 8080 via

POD_NAME=$(kubectl get pods -l app=appinfo -o jsonpath="{.items[0].metadata.name}")
kubectl port-forward $POD_NAME 8080:8080

output

Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080

Open a new terminal and we can send an HTTP request:

curl -s localhost:8080/v1/appinfo | jq .

and get a response like

{
  "PodName": "appinfo-567b989978-vvzlg",
  "AppName": "appinfo",
  "Namespace": "",
  "Release": "stable",
  "Labels": {
    "pod-template-hash": "1236545534"
  }
}

In a new terminal, run the following command to constantly refresh the AppInfo; this will be useful for watching real-time updates in the following sections.

 while true; do clear; curl -s localhost:8080/v1/appinfo | jq . ; sleep 1; done;

Changing labels

To see how the /etc/labels file is updated dynamically, add a new label to the pod and watch the output of the while loop adjust in real time:

kubectl label pods $POD_NAME newlabel=realtime

Switch back to the first terminal to stop the port forwarding with CTRL+C.

Create Service

A service provides load balancing to a selection of pods based on a particular label. The label we're going to filter by is app=appinfo. To see the pods that satisfy this, run

kubectl get pods -l app=appinfo

The service defined in deployment/appinfo-service.yaml has the label selector defined as app=appinfo. This will create a load balancer (service) that routes requests to pods with the label app=appinfo.

kubectl create -f https://raw.githubusercontent.com/runyontr/k8s-canary/master/deployment/appinfo-service.yaml

and see it

kubectl get svc appinfo

output:

NAME      CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
appinfo   10.32.0.232   <nodes>       8080:30753/TCP   14s

Containers that start after the creation of this service will have environment variables like the following:

APPINFO_SERVICE_PORT=8080
APPINFO_PORT_8080_TCP_PORT=8080
APPINFO_PORT=tcp://10.32.0.232:8080
APPINFO_PORT_8080_TCP_ADDR=10.32.0.232

which gives every pod running in the cluster ready access to the connection information for this service.
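For example, a consumer running in another pod could build the service URL straight from those variables. The helper below is a hypothetical illustration and is not part of the repository.

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// Hypothetical consumer: reads the APPINFO_* variables Kubernetes injects and
// calls the appinfo service through the cluster IP.
func main() {
	addr := os.Getenv("APPINFO_PORT_8080_TCP_ADDR") // e.g. 10.32.0.232
	port := os.Getenv("APPINFO_SERVICE_PORT")       // e.g. 8080

	resp, err := http.Get(fmt.Sprintf("http://%s:%s/v1/appinfo", addr, port))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	io.Copy(os.Stdout, resp.Body) // print the AppInfo JSON
}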

Accessing the Service

The type of service we created was NodePort.

When this type of service is created, each node proxies requests arriving at a specific port on the node (the same port on every node) to the configured port on the pod (8080 for our application).

To obtain the port on the node, run

NODE_PORT=$(kubectl get svc appinfo \
  --output=jsonpath='{range .spec.ports[0]}{.nodePort}')

Cloud Firewall

When running on a cloud provider, a firewall rule might need to be added to allow TCP traffic into the nodes at the service port. To enable traffic in Google Cloud, the following command will open traffic to the node port.

gcloud compute firewall-rules create appinfo-service \
  --allow=tcp:${NODE_PORT} \
  --network ${KUBERNETES_NETWORK_NAME}

where ${KUBERNETES_NETWORK_NAME} is the name of the network Kubernetes is deployed on.

Now if ${EXTERNAL_IP} is the public address of one of the nodes in the cluster, the service is available at http://${EXTERNAL_IP}:${NODE_PORT}.

Connecting

Similar to the while loop monitoring the port forwarded traffic of a particular pod in Talking to the Deployment, the loop

 while true; do clear; curl -s ${EXTERNAL_IP}:${NODE_PORT}/v1/appinfo | jq . ; sleep 1; done;

will show the output of the service. In order to monitor the deployments in the following section, this command should be run in a new terminal.

Deploy an Update (Canary)

Looking at the output of the loop, we see that the Namespace field is not being populated correctly by the deployment. One proposed (failed) solution is captured in the implementation appInfoBroken, which simulates a developer's failed attempt at fixing the issue. Since new software can contain bugs, a slow rollout of the new version keeps the impact minimal if there is an issue.

To simulate a real deployment, the currently deployed application should be scaled to handle the current traffic. When deciding how large to scale the current deployment (n), it's helpful to understand the system's SLOs: the new canary deployment will receive roughly 1/(n+1) of the traffic going to the service, so increasing n reduces the canary's share of traffic and lowers the impact to the error budget when things go wrong. With n = 3 stable replicas, for example, the single canary pod receives about a quarter of the requests.

For this tutorial, we scale to 3 replicas.

kubectl scale --replicas=3 deployment appinfo
deployment "appinfo" scaled
kubectl get pods -l app=appinfo
NAME                       READY     STATUS    RESTARTS   AGE
appinfo-567b989978-6hbm8   1/1       Running   0          15s
appinfo-567b989978-q58nt   1/1       Running   0          15s
appinfo-567b989978-vvzlg   1/1       Running   0          14m

The output of the while loop should now show responses coming from pods with different names. Additionally, if any labels were applied to the first pod in Changing Labels, the newly created pods will not have them, so responses will show different sets of labels.

Now we are ready to deploy our broken canary deployment:

kubectl create -f https://raw.githubusercontent.com/runyontr/k8s-canary/master/deployment/canary-broken.yaml 
deployment "appinfo-canary-broken" created

Looking at the running pods, we can now see 4 pods with the app=appinfo label:

kubectl get pods -l app=appinfo
NAME                                    READY     STATUS    RESTARTS   AGE
appinfo-567b989978-6hbm8                1/1       Running   0          10m
appinfo-567b989978-q58nt                1/1       Running   0          10m
appinfo-567b989978-vvzlg                1/1       Running   0          25m
appinfo-canary-broken-c66665c44-7cfk6   1/1       Running   0          19s

The loop from Connecting should now show about 1/4 of the responses coming back with a message of

{"error": "something went wrong"}

while the other 3/4 of the responses look the same as before.

Monitoring

This is when application monitoring (e.g. Prometheus/Grafana) would be able to split the metrics between canary and stable pods and show any difference in performance. A future iteration of this tutorial will demonstrate the performance differences with a monitoring solution.
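Until then, a rough way to quantify the split is to poll the service and tally responses by release label. The following helper is a hypothetical sketch (not part of the repository) and assumes the service URL is passed as its only argument.

// Hypothetical canary check (not part of the repository): polls the appinfo
// endpoint 100 times and reports how many responses came from each release
// label and how many failed.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"time"
)

type appInfo struct {
	PodName string
	Release string
}

func main() {
	url := os.Args[1] // e.g. http://${EXTERNAL_IP}:${NODE_PORT}/v1/appinfo
	counts := map[string]int{}
	failures := 0

	for i := 0; i < 100; i++ {
		resp, err := http.Get(url)
		if err != nil {
			failures++
			continue
		}
		var info appInfo
		// Treat non-200 responses and bodies without a PodName (e.g. the broken
		// canary's error message) as failures.
		if resp.StatusCode != http.StatusOK || json.NewDecoder(resp.Body).Decode(&info) != nil || info.PodName == "" {
			failures++
		} else {
			counts[info.Release]++
		}
		resp.Body.Close()
		time.Sleep(100 * time.Millisecond)
	}

	fmt.Printf("failures: %d/100\n", failures)
	for release, n := range counts {
		fmt.Printf("release=%s: %d/100\n", release, n)
	}
}

With the broken canary running as 1 of 4 pods, failures should hover around a quarter of the requests; after the rollback described next they should drop back to zero.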

Rollback

For this demo, the while loop will be used to demonstrate the health of the system, and since there are errors being returned in some responses, a rollback of the deployment is required:

kubectl delete -f https://raw.githubusercontent.com/runyontr/k8s-canary/master/deployment/canary-broken.yaml

At this point, the broken pod should be terminating:

kubectl get pods -l app=appinfo
NAME                                    READY     STATUS        RESTARTS   AGE
appinfo-567b989978-6hbm8                1/1       Running       0          12m
appinfo-567b989978-q58nt                1/1       Running       0          12m
appinfo-567b989978-vvzlg                1/1       Running       0          26m
appinfo-canary-broken-c66665c44-7cfk6   1/1       Terminating   0          2m

and responses return to being valid 100% of the time. In most shops, this would be enough to justify a post mortem and improvements to the testing process before the change is considered for release again.

Fix

Once the issue has been figured out, a fix is created and ready to be rolled out. Following a similar process, a new canary deployment is created:

kubectl create -f https://raw.githubusercontent.com/runyontr/k8s-canary/master/deployment/canary-fixed.yaml 
deployment "appinfo-canary-fixed" created

Looking at the output of the while loop now shows that about 1/4 of the responses have the correct Namespace value (default) and report the release=canary label.

Acceptance

Once the team is ready to formally accept the new version, the stable deployment's configuration needs to be updated to use the canary deployment's Docker image:

kubectl set image deployment/appinfo appinfo-containers=runyonsolutions/appinfo:3

Since the image is being updated, this triggers a rolling update of the deployment and new pods are created:

kubectl get pods -l app=appinfo
NAME                                    READY     STATUS              RESTARTS   AGE
appinfo-567b989978-6hbm8                1/1       Terminating         0          16m
appinfo-567b989978-q58nt                1/1       Terminating         0          16m
appinfo-567b989978-vvzlg                1/1       Running             0          30m
appinfo-84d5cf794d-9dpfz                1/1       Running             0          5s
appinfo-84d5cf794d-dd67v                1/1       Running             0          10s
appinfo-84d5cf794d-lq45c                0/1       ContainerCreating   0          2s
appinfo-95bccb844-jnfzn                 0/1       Terminating         3          1m
appinfo-canary-fixed-744f96dc75-zbxr9   1/1       Running             0          2m

Describing any of the newly created pods should show the updated image. The monitoring loop from Connecting should now show responses with the release=stable label having the Namespace value correctly set.

Finally, we need to clean up the canary app.

kubectl delete -f https://raw.githubusercontent.com/runyontr/k8s-canary/master/deployment/canary-fixed.yaml

Now all pods running behind the service are updated.
