
keikoproj / lifecycle-manager

License: Apache-2.0
Graceful AWS scaling events on Kubernetes using lifecycle hooks

Programming Languages

Go: 31211 projects (#10 most used programming language)
Makefile: 30231 projects
Dockerfile: 14818 projects

Projects that are alternatives to, or similar to, lifecycle-manager

AutoSpotting
Saves up to 90% of AWS EC2 costs by automating the use of spot instances on existing AutoScaling groups. Installs in minutes using CloudFormation or Terraform. Convenient to deploy at scale using StackSets. Uses tagging to avoid launch configuration changes. Automated spot termination handling. Reliable fallback to on-demand instances.
Stars: ✭ 2,058 (+2212.36%)
Mutual labels:  autoscaling-groups, aws-autoscaling
terraform-aws-eks-node-group
Terraform module to provision EKS Managed Node Group
Stars: ✭ 14 (-84.27%)
Mutual labels:  eks
http-graceful-shutdown
Gracefully terminates HTTP servers in Node.js
Stars: ✭ 79 (-11.24%)
Mutual labels:  graceful-shutdown
k8s-istio-observe-frontend
Angular 12-based front-end UI for k8s Golang observability project: https://github.com/garystafford/k8s-istio-observe-backend/tree/2021-istio
Stars: ✭ 20 (-77.53%)
Mutual labels:  eks
cdk-examples
AWS CDK Examples Repository
Stars: ✭ 49 (-44.94%)
Mutual labels:  eks
breaker
🚧 Flexible mechanism to make execution flow interruptible.
Stars: ✭ 100 (+12.36%)
Mutual labels:  graceful-shutdown
event-exporter
This tool is used to export Kubernetes events to CloudWatch logs
Stars: ✭ 24 (-73.03%)
Mutual labels:  eks
my-cluster
My Kubernetes cluster
Stars: ✭ 27 (-69.66%)
Mutual labels:  eks
sigctx
Go contexts for graceful shutdown
Stars: ✭ 55 (-38.2%)
Mutual labels:  graceful-shutdown
ekz
An EKS-D Kubernetes distribution for desktop
Stars: ✭ 87 (-2.25%)
Mutual labels:  eks
node-graceful-shutdown
Gracefully shutdown your modular NodeJS application.
Stars: ✭ 20 (-77.53%)
Mutual labels:  graceful-shutdown
hiatus-spring-boot
No description or website provided.
Stars: ✭ 23 (-74.16%)
Mutual labels:  graceful-shutdown
amazon-ec2-auto-scaling-group-examples
This repository contains code samples, learning activities, and best-practices for scaling and elasticity with Amazon EC2 Auto Scaling groups.
Stars: ✭ 27 (-69.66%)
Mutual labels:  autoscaling-groups
eks-anywhere
Run Amazon EKS on your own infrastructure 🚀
Stars: ✭ 1,633 (+1734.83%)
Mutual labels:  eks
iskan
Kubernetes Native, Runtime Container Image Scanning
Stars: ✭ 35 (-60.67%)
Mutual labels:  eks
kms
🔪 Is a library that aids in graceful shutdown of a process/application
Stars: ✭ 44 (-50.56%)
Mutual labels:  graceful-shutdown
eks-anywhere-prow-jobs
This repository contains Prowjob configurations for Amazon EKS Anywhere. You can view the jobs at https://prow.eks.amazonaws.com.
Stars: ✭ 14 (-84.27%)
Mutual labels:  eks
laravel-php-k8s
Just a simple port of renoki-co/php-k8s for easier access in Laravel
Stars: ✭ 71 (-20.22%)
Mutual labels:  eks
ssm-agent-daemonset-installer
A DaemonSet to apply configuration to Kubernetes worker nodes after they've been bootstrapped.
Stars: ✭ 19 (-78.65%)
Mutual labels:  eks

lifecycle-manager

Graceful AWS scaling events on Kubernetes using lifecycle hooks

lifecycle-manager is a service that can be deployed to a Kubernetes cluster to make AWS autoscaling events more graceful by draining nodes before they are terminated.

Certain termination activities, such as AZRebalance or TerminateInstanceInAutoScalingGroup API calls, can cause autoscaling groups to terminate instances without draining them first.

This can cause applications to experience errors when their pods are abruptly terminated.

lifecycle-manager uses lifecycle hooks from the autoscaling group (via SQS) to pre-drain the instances for you.

In addition to node draining, lifecycle-manager also tries to deregister the instance from any discovered ALB target group. This drains ALB traffic away from the instance prior to shutdown and helps avoid in-flight 5xx errors on your ALB. This feature is currently supported for aws-alb-ingress-controller.
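
When a lifecycle hook fires, the autoscaling group publishes a notification to the SQS queue, which lifecycle-manager consumes. A representative termination message looks roughly like the following (all values here are illustrative; see the AWS EC2 Auto Scaling lifecycle hook documentation for the authoritative schema):

{
    "LifecycleHookName": "lifecycle-manager-hook",
    "AccountId": "000000000000",
    "RequestId": "12345678-1234-1234-1234-123456789012",
    "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
    "AutoScalingGroupName": "scaling-group-1",
    "Service": "AWS Auto Scaling",
    "Time": "2019-10-02T02:44:11.394Z",
    "EC2InstanceId": "i-0868736e381bf942a",
    "LifecycleActionToken": "87654321-4321-4321-4321-210987654321"
}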

Usage

  1. Configure your scaling groups to notify lifecycle-manager of terminations. You can use the provided enrollment CLI by running:
$ make build
...

$ ./bin/lifecycle-manager enroll --region us-west-2 --queue-name lifecycle-manager-queue --notification-role-name my-notification-role --target-scaling-groups scaling-group-1,scaling-group-2 --overwrite

INFO[0000] starting enrollment for scaling groups [scaling-group-1 scaling-group-2]
INFO[0000] creating notification role 'my-notification-role'
INFO[0000] notification role 'my-notification-role' already exist, updating...
INFO[0000] attaching notification policy 'arn:aws:iam::aws:policy/service-role/AutoScalingNotificationAccessRole'
INFO[0001] created notification role 'arn:aws:iam::000000000000:role/my-notification-role'
INFO[0001] creating SQS queue 'lifecycle-manager-queue'
INFO[0001] created queue 'arn:aws:sqs:us-west-2:000000000000:lifecycle-manager-queue'
INFO[0001] creating lifecycle hook for 'scaling-group-1'
INFO[0002] creating lifecycle hook for 'scaling-group-2'
INFO[0002] successfully enrolled 2 scaling groups
INFO[0002] Queue Name: lifecycle-manager-queue
INFO[0002] Queue URL: https://sqs.us-west-2.amazonaws.com/000000000000/lifecycle-manager-queue

Alternatively, you can simply follow the AWS docs to create an SQS queue named lifecycle-manager-queue, a notification role, and a lifecycle-hook on your autoscaling group pointing to the created queue.
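
For example, a manual setup with the AWS CLI could look roughly like the following (the hook name, ARNs, and heartbeat timeout are placeholders; substitute your own values):

$ aws sqs create-queue --queue-name lifecycle-manager-queue --region us-west-2

# placeholder names/ARNs below - substitute your own
$ aws autoscaling put-lifecycle-hook \
    --region us-west-2 \
    --auto-scaling-group-name scaling-group-1 \
    --lifecycle-hook-name lifecycle-manager-hook \
    --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
    --notification-target-arn arn:aws:sqs:us-west-2:000000000000:lifecycle-manager-queue \
    --role-arn arn:aws:iam::000000000000:role/my-notification-role \
    --heartbeat-timeout 300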

Configured scaling groups will now publish termination hooks to the SQS queue you created.
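
You can verify that the hook is in place with the AWS CLI, for example:

$ aws autoscaling describe-lifecycle-hooks --auto-scaling-group-name scaling-group-1 --region us-west-2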

  2. Deploy lifecycle-manager to your cluster:
kubectl create namespace lifecycle-manager

kubectl apply -f https://raw.githubusercontent.com/keikoproj/lifecycle-manager/master/examples/lifecycle-manager.yaml

Modifications may be needed if you used a queue name different from the one above.
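
One approach is to download the manifest, adjust the container arguments to match your setup, and apply the edited copy (this assumes the example manifest passes the flags listed under Flags below as container args):

$ curl -sLO https://raw.githubusercontent.com/keikoproj/lifecycle-manager/master/examples/lifecycle-manager.yaml
# adjust the --queue-name and --region arguments in the container spec to match your setup
$ kubectl apply -f lifecycle-manager.yaml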

  3. Kill an instance in your scaling group and watch it get drained:
$ aws autoscaling terminate-instance-in-auto-scaling-group --instance-id i-0868736e381bf942a --region us-west-2 --no-should-decrement-desired-capacity
{
    "Activity": {
        "ActivityId": "5285b629-6a18-0a43-7c3c-f76bac8205f0",
        "AutoScalingGroupName": "scaling-group-1",
        "Description": "Terminating EC2 instance: i-0868736e381bf942a",
        "Cause": "At 2019-10-02T02:44:11Z instance i-0868736e381bf942a was taken out of service in response to a user request.",
        "StartTime": "2019-10-02T02:44:11.394Z",
        "StatusCode": "InProgress",
        "Progress": 0,
        "Details": "{\"Subnet ID\":\"subnet-0bf9bc85fEXAMPLE\",\"Availability Zone\":\"us-west-2c\"}"
    }
}

$ kubectl logs lifecycle-manager
time="2020-03-10T23:44:20Z" level=info msg="starting lifecycle-manager service v0.3.4"
time="2020-03-10T23:44:20Z" level=info msg="region = us-west-2"
time="2020-03-10T23:44:20Z" level=info msg="queue = lifecycle-manager-queue"
time="2020-03-10T23:44:20Z" level=info msg="polling interval seconds = 10"
time="2020-03-10T23:44:20Z" level=info msg="node drain timeout seconds = 300"
time="2020-03-10T23:44:20Z" level=info msg="node drain retry interval seconds = 30"
time="2020-03-10T23:44:20Z" level=info msg="with alb deregister = true"
time="2020-03-10T23:44:20Z" level=info msg="starting metrics server on /metrics:8080"
time="2020-03-11T07:24:37Z" level=info msg="i-0868736e381bf942a> received termination event"
time="2020-03-11T07:24:37Z" level=info msg="i-0868736e381bf942a> sending heartbeat (1/24)"
time="2020-03-11T07:24:37Z" level=info msg="i-0868736e381bf942a> draining node/ip-10-105-232-73.us-west-2.compute.internal"
time="2020-03-11T07:24:37Z" level=info msg="i-0868736e381bf942a> completed drain for node/ip-10-105-232-73.us-west-2.compute.internal"
time="2020-03-11T07:24:45Z" level=info msg="i-0868736e381bf942a> starting load balancer drain worker"
...
time="2020-03-11T07:24:49Z" level=info msg="event ce25c321-ec67-3f0b-c156-a7c1f75caf1a completed processing"
time="2020-03-11T07:24:49Z" level=info msg="i-0868736e381bf942a> setting lifecycle event as completed with result: CONTINUE"
time="2020-03-11T07:24:49Z" level=info msg="event ce25c321-ec67-3f0b-c156-a7c1f75caf1a for instance i-0868736e381bf942a completed after 12.054675203s"

Required AWS Auth

{
    "Effect": "Allow",
    "Action": [
        "autoscaling:DescribeLifecycleHooks",
        "autoscaling:CompleteLifecycleAction",
        "autoscaling:RecordLifecycleActionHeartbeat",
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:GetQueueUrl",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeClassicLinkInstances",
        "ec2:DescribeInstances",
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
        "elasticloadbalancing:DescribeInstanceHealth",
        "elasticloadbalancing:DescribeLoadBalancers",
        "elasticloadbalancing:DeregisterTargets",
        "elasticloadbalancing:DescribeTargetHealth",
        "elasticloadbalancing:DescribeTargetGroups"
    ],
    "Resource": "*"
}
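
The statement above is only the permission block. When creating an IAM policy for the role that lifecycle-manager runs as, wrap it in a complete policy document, for example (actions abbreviated here; use the full list above):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeLifecycleHooks",
                "..."
            ],
            "Resource": "*"
        }
    ]
}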

Flags

Name                          Default                     Type    Description
local-mode                    ""                          String  absolute path to kubeconfig
region                        ""                          String  the name of the AWS region to operate in
queue-name                    ""                          String  the name of the SQS queue to consume lifecycle hooks from
kubectl-path                  "/usr/local/bin/kubectl"    String  the path to the kubectl binary
log-level                     "info"                      String  the logging level (info, warning, debug)
max-drain-concurrency         32                          Int     maximum number of node drains to process in parallel
max-time-to-process           3600                        Int     maximum time in seconds to spend processing an event
drain-timeout                 300                         Int     hard time limit in seconds for draining healthy nodes
drain-timeout-unknown         30                          Int     hard time limit in seconds for draining nodes in an unknown state
drain-interval                30                          Int     interval in seconds at which to retry draining
polling-interval              10                          Int     interval in seconds at which to poll SQS
with-deregister               true                        Bool    try to deregister the terminating instance from target groups
refresh-expired-credentials   false                       Bool    refresh expired credentials (requires a shared credentials file)
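
To illustrate how these flags fit together, a local run against an existing kubeconfig might look like the following. The serve subcommand name is an assumption here (only the enroll subcommand appears above); check ./bin/lifecycle-manager --help for the exact command:

# 'serve' is assumed - confirm the subcommand with ./bin/lifecycle-manager --help
$ ./bin/lifecycle-manager serve \
    --local-mode $HOME/.kube/config \
    --region us-west-2 \
    --queue-name lifecycle-manager-queue \
    --drain-timeout 300 \
    --polling-interval 10 \
    --with-deregister=true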

Release History

Please see CHANGELOG.md.

Contributing

Please see CONTRIBUTING.md.

Developer Guide

Please see DEVELOPER.md.
