adhorn / aws-fis-templates-cdk

Licence: MIT license
Collection of AWS Fault Injection Simulator (FIS) experiment templates deploy-able via the AWS CDK

Templates for AWS Fault Injection Simulator (FIS)

These templates let you perform fault injection experiments on resources (applications, network, and infrastructure) in the AWS Cloud.

What is AWS FIS anyway?

AWS Fault Injection Simulator (AWS FIS) is a managed service that enables you to perform fault injection experiments on your AWS workloads. Fault injection is based on the principles of chaos engineering. These experiments stress an application by creating disruptive events so that you can observe how your application responds. You can then use this information to improve the performance and resiliency of your applications so that they behave as expected.

To use AWS FIS, you set up and run experiments that help you create the real-world conditions needed to uncover application issues that can be difficult to find otherwise. AWS FIS provides templates that generate disruptions, and the controls and guardrails that you need to run experiments in production, such as automatically rolling back or stopping the experiment if specific conditions are met.

What is included in this package?

This CDK package deploys several stacks: (1) the parent stack FISPa, (2) a stack for the IAM roles FisRole, (3) a stack for the stop-condition StopCond (a CloudWatch alarm), (4) a stack for each FIS experiment group (EC2API, AsgExp, EksExp, NaclExp, Ec2InstExp), and (5) a stack dedicated to uploading SSM documents FisSsmDocs.

You can pick and choose which experiment group you want to deploy by simply commenting out the respective stacks here

1 - The IAM roles required to run the experiments:

  • The AWS FIS role with all necessary policies as described here
  • The SSM Automation document role, for faults that use SSM Automation (SSMA).

2 - A set of AWS FIS experiments to get you started:

EC2 Instance faults

  • Including:
    • Stopping and restarting (after a set duration) all EC2 instances in a VPC, an AZ, and with particular tags.
    • Injecting CPU stress on random EC2 instances in a VPC.
    • Injecting latency on requests to a particular domain (e.g. www.amazon.com) on all EC2 instances in a VPC, an AZ, and with particular tags.
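
As a rough illustration of the stop-and-restart fault, the sketch below builds the action entry of an FIS experiment template as a plain object. The action ID aws:ec2:stop-instances, its startInstancesAfterDuration parameter, and the Instances target key come from the FIS action reference; the FisAction interface and the stopInstancesAction helper are hypothetical names used only for this example and are not taken from this repository.

```typescript
// Hypothetical shape for one action entry; property names follow the
// camelCase form used by the CDK CfnExperimentTemplate construct.
interface FisAction {
  actionId: string;
  description?: string;
  parameters?: Record<string, string>;
  targets?: Record<string, string>;
}

// Build a stop-instances action that restarts the instances after a delay.
// targetName must match a key in the template's "targets" map.
function stopInstancesAction(targetName: string, restartAfterMinutes: number): FisAction {
  if (!Number.isInteger(restartAfterMinutes) || restartAfterMinutes < 1) {
    throw new Error('restartAfterMinutes must be a positive integer');
  }
  return {
    actionId: 'aws:ec2:stop-instances',
    description: `Stop instances, restart after ${restartAfterMinutes} minute(s)`,
    // FIS expects an ISO 8601 duration such as PT5M.
    parameters: { startInstancesAfterDuration: `PT${restartAfterMinutes}M` },
    targets: { Instances: targetName },
  };
}

const stopAction = stopInstancesAction('taggedInstances', 5);
console.log(JSON.stringify(stopAction, null, 2));
```

Omitting startInstancesAfterDuration leaves the instances stopped, so an explicit duration is a sensible default for a first experiment.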

EC2 Control Plane faults

  • Including:
    • Injecting EC2 API Internal Error on a target IAM role
    • Injecting EC2 API Throttle Error on a target IAM role
    • Injecting EC2 API Unavailable Error on a target IAM role
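
The three control plane faults differ only in the action ID, so they can be described with one helper. The sketch below is a minimal example under that assumption: the action IDs, the service/operations/percentage/duration parameters, and the Roles target key are from the FIS action reference, while ApiFault, ec2ApiFaultAction, and the example operation names are hypothetical choices for illustration, not this repository's code.

```typescript
type ApiFault = 'internal' | 'throttle' | 'unavailable';

// FIS action IDs for the three EC2 API fault types.
const API_FAULT_ACTION_IDS: Record<ApiFault, string> = {
  internal: 'aws:fis:inject-api-internal-error',
  throttle: 'aws:fis:inject-api-throttle-error',
  unavailable: 'aws:fis:inject-api-unavailable-error',
};

interface ApiFaultAction {
  actionId: string;
  parameters: Record<string, string>;
  targets: Record<string, string>;
}

// Build an action that injects the chosen fault into EC2 API calls made
// by the IAM role referenced through targetName.
function ec2ApiFaultAction(
  fault: ApiFault,
  targetName: string,
  operations: string[],
  percentage: number,
  duration = 'PT5M',
): ApiFaultAction {
  if (operations.length === 0) throw new Error('at least one operation is required');
  if (!Number.isInteger(percentage) || percentage < 1 || percentage > 100) {
    throw new Error('percentage must be an integer between 1 and 100');
  }
  return {
    actionId: API_FAULT_ACTION_IDS[fault],
    parameters: {
      service: 'ec2',
      operations: operations.join(','), // comma-separated list of API operations
      percentage: String(percentage),   // share of matching calls to fail
      duration,
    },
    targets: { Roles: targetName },
  };
}

const throttle = ec2ApiFaultAction('throttle', 'targetRole', ['DescribeInstances', 'DescribeVolumes'], 100);
console.log(JSON.stringify(throttle, null, 2));
```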

Auto Scaling Group faults

  • Including:
    • Terminating all EC2 instances of a random AZ in a particular Auto Scaling group.
    • Injecting CPU stress on all EC2 instances of a particular Auto Scaling group.

Network Access Control List faults

  • Including:
    • Modifying the NACLs associated with subnets that belong to a particular AZ to deny traffic in that AZ.

EKS faults

  • Including:
    • Running the EC2 API action TerminateInstances on the EKS target node group.

Security Group faults

  • Including:
    • Changing a particular security group ingress rule (opening SSH to 0.0.0.0/0) to verify remediation automation or monitoring. (Courtesy of Jonathan Rudge.) A possible remediation automation is available at https://github.com/adhorn/ssh-restricted

IAM access faults

  • Including:
    • Denying access to an S3 resource from any application or service by targeting its IAM role. (Courtesy of Rudolph Wagner)
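
To make the idea concrete, here is a minimal sketch of an IAM policy document that denies S3 access to a resource such as the "mybucket/*" value used in cdk.json. It assumes the fault works by attaching an explicit Deny statement to the target role; how this repository actually applies the denial is not shown here, and the denyS3Policy helper and the Sid value are hypothetical.

```typescript
interface PolicyStatement {
  Sid: string;
  Effect: 'Allow' | 'Deny';
  Action: string | string[];
  Resource: string | string[];
}

interface PolicyDocument {
  Version: '2012-10-17';
  Statement: PolicyStatement[];
}

// Build a policy that explicitly denies every S3 action on the given
// resource. bucketResource is the part after "arn:aws:s3:::", for
// example "mybucket/*" for all objects in mybucket.
function denyS3Policy(bucketResource: string): PolicyDocument {
  if (bucketResource.trim() === '') throw new Error('bucketResource must not be empty');
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Sid: 'FisDenyS3Access', // hypothetical statement ID
        Effect: 'Deny',
        Action: 's3:*',
        Resource: `arn:aws:s3:::${bucketResource}`,
      },
    ],
  };
}

console.log(JSON.stringify(denyS3Policy('mybucket/*'), null, 2));
```

An explicit Deny overrides any Allow on the same role. Note that "mybucket/*" matches objects only; bucket-level actions such as listing would need the bucket ARN without the trailing "/*".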

Lambda faults

  • Including:
    • Support for the Lambda Python runtime via the chaos-lambda library. chaos_lambda is a small library for injecting chaos into AWS Lambda. It offers simple Python decorators to inject latency, throw exceptions, and modify the status code of Lambda functions.

Configuring experiments:

These sample FIS experiments use default values for some parameters, such as vpc_id, asg_name, and eks_cluster_name. Modify these in the file cdk.json before deploying so that they match your own AWS environment.

  "context": {
    "vpc_id": "vpc-01316e63b948d889d",
    "asg_name": "Test-FIS-ASG",
    "eks_cluster_name": "test-cluster-chaos",
    "security_group_id": "sg-022eb488dbd1655b3",
    "target_role_name": "target-role",
    "s3-bucket-to-deny": "mybucket/*",
    "ssm_parameter_name": "chaoslambda.config"
  }
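
Because a wrong or missing context value only surfaces at deploy time, it can help to check the values first. The sketch below is one possible pre-flight check over the keys listed above; findContextProblems and the prefix rules are hypothetical and not part of this package. Inside a CDK app the same values would normally be read with app.node.tryGetContext('vpc_id').

```typescript
// Context keys used by the sample cdk.json.
const REQUIRED_CONTEXT_KEYS = [
  'vpc_id',
  'asg_name',
  'eks_cluster_name',
  'security_group_id',
  'target_role_name',
  's3-bucket-to-deny',
  'ssm_parameter_name',
] as const;

// Keys whose values are AWS resource IDs with a well-known prefix.
const ID_PREFIXES: Record<string, string> = {
  vpc_id: 'vpc-',
  security_group_id: 'sg-',
};

// Return a list of human-readable problems; an empty list means the
// context looks usable.
function findContextProblems(context: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of REQUIRED_CONTEXT_KEYS) {
    const value = context[key];
    if (typeof value !== 'string' || value.trim() === '') {
      problems.push(`${key}: missing or empty`);
      continue;
    }
    const prefix = ID_PREFIXES[key];
    if (prefix !== undefined && !value.startsWith(prefix)) {
      problems.push(`${key}: expected an ID starting with "${prefix}"`);
    }
  }
  return problems;
}

const sampleContext = {
  vpc_id: 'vpc-01316e63b948d889d',
  asg_name: 'Test-FIS-ASG',
  eks_cluster_name: 'test-cluster-chaos',
  security_group_id: 'sg-022eb488dbd1655b3',
  target_role_name: 'target-role',
  's3-bucket-to-deny': 'mybucket/*',
  ssm_parameter_name: 'chaoslambda.config',
};
console.log(findContextProblems(sampleContext));
```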

You can also specify your own tags for filtering EC2 instances. The currently used ones are defined as:

resourceTags: {
  'FIS-Ready': 'true'
}
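
For context, the tag filter is one part of an FIS target definition. The following sketch shows how a complete target for tagged, running instances in a VPC might look as a plain object. The resource type, the VpcId and State.Name filter paths, and the selection modes are from the FIS documentation; FisTarget and taggedInstancesInVpc are hypothetical names, and this is not necessarily how the templates in this package define their targets.

```typescript
interface FisTargetFilter {
  path: string;      // attribute path in the EC2 DescribeInstances output
  values: string[];
}

interface FisTarget {
  resourceType: string;
  selectionMode: string; // 'ALL', 'COUNT(n)' or 'PERCENT(n)'
  resourceTags?: Record<string, string>;
  filters?: FisTargetFilter[];
}

// Select running EC2 instances in one VPC that carry the given tags.
function taggedInstancesInVpc(
  vpcId: string,
  tags: Record<string, string> = { 'FIS-Ready': 'true' },
  selectionMode = 'ALL',
): FisTarget {
  return {
    resourceType: 'aws:ec2:instance',
    selectionMode,
    resourceTags: tags,
    filters: [
      { path: 'VpcId', values: [vpcId] },
      { path: 'State.Name', values: ['running'] },
    ],
  };
}

// One random tagged instance instead of all of them.
const oneInstance = taggedInstancesInVpc('vpc-01316e63b948d889d', { 'FIS-Ready': 'true' }, 'COUNT(1)');
console.log(JSON.stringify(oneInstance, null, 2));
```

Requiring an opt-in tag such as FIS-Ready keeps experiments away from instances that were never meant to be targeted.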

3 - An example stop-condition using CloudWatch alarm

All templates use the same CloudWatch alarm to get you started with the stop-condition. You can use this alarm to get familiar with canceling experiments. For example, you can trigger that alarm, for one minute, using the following command:

aws cloudwatch set-alarm-state --alarm-name "NetworkInAbnormal" --state-value "ALARM" --state-reason "testing FIS"

Once you are familiar with the stop-condition, you should of course update the CloudWatch alarms with ones specific to your application and architecture.
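
When you swap in your own alarm, the stop-condition entry of the experiment template needs the alarm's ARN. The sketch below shows the shape of that entry under the assumption that you know the region, account ID, and alarm name; alarmStopCondition is a hypothetical helper, and the account ID in the usage line is a placeholder.

```typescript
interface FisStopCondition {
  source: string; // 'aws:cloudwatch:alarm' or 'none'
  value?: string; // alarm ARN when source is 'aws:cloudwatch:alarm'
}

// Build a stop-condition that halts the experiment when the named
// CloudWatch alarm goes into the ALARM state.
function alarmStopCondition(region: string, accountId: string, alarmName: string): FisStopCondition {
  if (!/^\d{12}$/.test(accountId)) throw new Error('accountId must be a 12-digit AWS account ID');
  return {
    source: 'aws:cloudwatch:alarm',
    value: `arn:aws:cloudwatch:${region}:${accountId}:alarm:${alarmName}`,
  };
}

// Placeholder account ID; replace with your own.
const stopCondition = alarmStopCondition('us-east-1', '123456789012', 'NetworkInAbnormal');
console.log(stopCondition);
```

A template can list several stop-conditions, so an application-level alarm can sit next to an infrastructure-level one.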

4 - A stack dedicated to uploading SSM docs (Automation or Run-Command)

Deploy this package via CDK:

You first need to install the AWS CDK as described here - typically using:

npm install -g aws-cdk

You must then configure your workstation with your credentials and an AWS region, if you have not already done so. If you have the AWS CLI installed, the easiest way to satisfy this requirement is to issue the following command:

aws configure

Finally, you can deploy these FIS experiments using the CDK as follows:

npm install
cdk bootstrap
cdk deploy --all

During the creation of the different stacks, some will generate a security warning as follows:

(NOTE: There may be security-related changes not in this list. See https://github.com/aws/aws-cdk/issues/1299)

Do you wish to deploy these changes (y/n)?

Select y (yes).

Other useful CDK commands:

  • npm run build: compile TypeScript to JavaScript
  • npm run watch: watch for changes and compile
  • npm run test: perform the jest unit tests
  • cdk deploy: deploy this stack to your default AWS account/region
  • cdk diff: compare deployed stack with current state
  • cdk synth: emit the synthesized CloudFormation template

The cdk.json file tells the CDK Toolkit how to execute your app.
