dastergon / Wheel Of Misfortune

Licence: MIT

A role-playing game for incident management training

Labels

html devops incident-response sre chaos-engineering


Wheel of Misfortune

Wheel of Misfortune is a game that aims to build the confidence of oncall engineers through simulated outage scenarios. With the game, you practice debugging problems under stress, following the incident management protocol, and communicating effectively with other engineers in your team and organization. It is a great way to train new hires, interns, and seasoned engineers to become well-rounded oncall engineers.

The game is inspired by the Site Reliability Engineering book.

The demo website is available at: https://dastergon.gr/wheel-of-misfortune

Instructions

Terminology

Scenario: A past or fictional incident case.
Game Master: The host-coordinator of the session.
Volunteer: The trainee oncall engineer.

Feel free to fork the repository or download the stable release. Insert your incident scenarios into the general_incidents.json file inside the incidents/ folder.

The file has the following format:

ID: the unique ID of the outage (you can just auto-increment).
title: the title of the incident.
scenario: the description of the incident. It is useful to include URLs from monitoring systems, dashboards, time-series databases and playbooks.
inkstory: the path to an Ink story file in JSON format.
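The format above can be sketched as a small Python check that you might run before committing new scenarios. This is only an illustration: it assumes the file holds a top-level list of objects, it spells the keys exactly as they appear in the list above (check the repository's own file for the exact casing), and it treats inkstory as optional. The function name, the example incidents, and the URL are hypothetical.

```python
import json

# Key names copied from the list above; treating "inkstory" as optional
# is an assumption of this sketch.
REQUIRED_KEYS = ("ID", "title", "scenario")
OPTIONAL_KEYS = ("inkstory",)


def validate_incidents(incidents):
    """Return a list of human-readable problems found in the incident entries."""
    problems = []
    seen_ids = set()
    allowed = set(REQUIRED_KEYS) | set(OPTIONAL_KEYS)
    for index, incident in enumerate(incidents):
        for key in REQUIRED_KEYS:
            if key not in incident:
                problems.append(f"entry {index}: missing key {key!r}")
        for key in sorted(set(incident) - allowed):
            problems.append(f"entry {index}: unknown key {key!r}")
        if "ID" in incident:
            if incident["ID"] in seen_ids:
                problems.append(f"entry {index}: duplicate ID {incident['ID']!r}")
            seen_ids.add(incident["ID"])
    return problems


# Hypothetical scenarios, for illustration only.
EXAMPLE_INCIDENTS = [
    {
        "ID": 1,
        "title": "Checkout latency spike",
        "scenario": "p99 latency tripled after a deploy. "
                    "Dashboard: https://grafana.example.com/d/checkout",
        "inkstory": "incidents/checkout_latency.json",
    },
    {
        "ID": 2,
        "title": "Primary database disk full",
        "scenario": "Writes are failing and replicas are lagging.",
    },
]

if __name__ == "__main__":
    for problem in validate_incidents(EXAMPLE_INCIDENTS):
        print(problem)
    # Emit the entries in a form that could be pasted into the incidents file.
    print(json.dumps(EXAMPLE_INCIDENTS, indent=2))
```

Checking for duplicate IDs reflects the requirement that each ID be unique; the unknown-key check is a convenience for catching typos and can be dropped if you add your own fields.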

You can also use general_incidents.jsonnet as an example, in case you want to generate your incident scenarios using Jsonnet.

Ink

Ink is a scripting language for writing interactive narrative stories. It enables us to write interactive incident response narratives for team or individual trainings. You can use Inky to write an interactive narrative for an incident and then export the story as JSON. Then, you can store the story file inside the incidents/ folder and associate the Ink story file with an Incident scenario using the inkstory key. You can read an example incident narrative here.
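A minimal sketch of how you might confirm that every inkstory reference points to a story file that exists and parses as JSON. It assumes each path is resolved relative to a directory you pass in (how the game itself resolves the path is not stated here), and the function name is hypothetical.

```python
import json
from pathlib import Path


def check_ink_stories(incidents, root):
    """Return (ID, reason) pairs for inkstory paths that cannot be loaded."""
    failures = []
    for incident in incidents:
        story = incident.get("inkstory")
        if not story:
            continue  # no interactive narrative attached to this scenario
        path = Path(root) / story
        if not path.is_file():
            failures.append((incident.get("ID"), f"file not found: {story}"))
            continue
        try:
            # "utf-8-sig" also accepts files that start with a byte-order mark.
            json.loads(path.read_text(encoding="utf-8-sig"))
        except json.JSONDecodeError as exc:
            failures.append((incident.get("ID"), f"invalid JSON: {exc.msg}"))
    return failures
```

Calling check_ink_stories(incidents, ".") from the repository root returns an empty list when every referenced story loads cleanly. The check only confirms that the file is valid JSON, not that it is a well-formed Ink story.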

Role Playing

Game Master

Choose a volunteer to be the primary oncall engineer in front of the group.
Find a balance between the volunteer's experience and the incident's difficulty.
Assist the volunteer by answering questions that may arise from each theoretical action or dashboard observation.

Engage with the rest of the team and ask for different ways to debug the problem following the volunteer's explanation.
Team members may be made available over time for assistance in various topics.

At the end, hold a debrief on what the group learned during the session.

Volunteer

Spin the wheel and attempt to fix the theoretical outage scenario.
Explain to the Game Master and the rest of the group what actions you would take (lookup queries, checks in dashboards, etc.) to find the root causes and eventually resolve the incident.
Always keep an eye on the time, since this is a simulated incident response scenario and not a routine troubleshooting process. During a real incident you might face an SLA or SLO breach, so you should take timing into account.
Engage with the rest of the group. Keep them in the loop. Ask questions to different members depending on their expertise.

Most importantly, have fun!

You can read a comprehensive example of how to conduct the exercise in the Google SRE book.

Resources
