fhivemind / sre-playground

Licence: other
🎯 A set of Site Reliability Engineering notes & challenges

🎯 SRE Playground

The goal of this project is to introduce you to basic SRE topics. It is designed to give an overview of what SRE covers during the whole application life cycle.

📖 Walkthrough

We divide this challenge into multiple stages so that we can explain more closely what happens in each one.

💭 Design

Start by thinking about how the whole infrastructure can be deployed, maintained, monitored, discarded, extended, or automated. The point of this step is to have a clearer picture of how to proceed, to rule out all the improbable scenarios and major blockers. In this step, try to answer some basic questions.

"Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth." - Sherlock Holmes


❔ What is the tech-stack?

Answering this gives you an idea of which technologies you, as an SRE, should build on. Remember, you are defining how the whole process is going to work. Hence, you need to decide on the tools you are going to work with.

Example
You bought some IKEA furniture. What tools are you going to use for its assembly?

❔ How does it work?

Get a nice overview of how the solution works. Usually, here you get a lot of graphs, or you need to create them. Graphs help you get a grasp of what is happening. Some services talk to each other, some are independent, some require more memory, some require a lot of computing power. Here you need information - well-defined and concise information. Without understanding what is happening, you cannot decide what you want to happen.

Example
Once the IKEA furniture is there, which piece connects to which? Do I need help to lift the pieces, turn them, isolate them, protect them somehow?

❔ When, and how often?

Here, you need to think about how the whole solution is going to be updated, deleted, recreated, or moved. Most likely, you will have to implement some kind of automated mechanism to handle these operations. Typically, this is the moment when Continuous Integration (CI), Continuous Deployment (CD), and Anything-as-a-Code (XaaC) come into the picture. They are not your enemy!

Example
Okay, you have assembled your IKEA furniture, but there's a piece you forgot to add. CI lets you define how to put this piece, or any other piece you might want to add, in the right place. CD, on the other hand, lets you define how to put your IKEA furniture where you want it: in the kitchen, in the living room, hanging from the ceiling, or spying on your neighbors (don't do this). However, you don't want to disturb the whole thing, only pieces of it. XaaC helps you with exactly that: isolate a component and do something with it, and only it.
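
To make the idea of an automated pipeline more concrete, here is a minimal sketch of stages that run in order and stop at the first failure. The stage names and the `run_pipeline` helper are hypothetical and only for illustration; a real CI tool provides this behavior for you.

```python
def run_pipeline(stages):
    """Run (name, step) pairs in order and stop at the first failing step."""
    results = []
    for name, step in stages:
        ok = bool(step())
        results.append((name, ok))
        if not ok:
            break  # later stages never run on a broken build
    return results


# Hypothetical stages; real ones would call Docker, a test runner, and so on.
stages = [
    ("build", lambda: True),
    ("test", lambda: False),   # simulate a failing test suite
    ("deploy", lambda: True),
]
outcome = run_pipeline(stages)
print(outcome)  # [('build', True), ('test', False)]
```

Because the test stage fails, the deploy stage is never reached, which is the main safety property you want from CI before CD takes over.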

❔ Should you know this?

One of the big challenges of automation is knowing how to secure things. While developing, you want to isolate the parts that require authorization from the public parts and, once they are built, plug them into the larger system without revealing how they work internally. To do this, you integrate secret-management mechanisms that strip all private data out of the service logic.

Example
Accidentally, along with your IKEA furniture, you also bought a large Samfrodo TV. After some time, one of the electronic pieces inside went bad, and you are wondering how to fix it. You call the Samfrodo company for more information, and they tell you that they cannot share any details about that electronic piece. However, they offer to send you a brand new piece so you can replace it yourself. See Wiki:Intellectual property.
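
One possible way to keep private data out of the service logic is sketched below: the secret is read from the environment instead of being hard-coded, and it is wrapped so that it cannot be printed by accident. The `Secret` class, the `load_secret` helper, and the `REGISTRY_TOKEN` variable name are assumptions made for this example, not part of the project.

```python
import os


class Secret:
    """Wraps a sensitive value so it is not printed or logged by accident."""

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        # Call this only at the point where the raw value is really needed.
        return self._value

    def __repr__(self) -> str:
        return "Secret('***')"

    __str__ = __repr__


def load_secret(name: str, environ=os.environ) -> Secret:
    """Read a secret from the environment and fail loudly if it is missing."""
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"missing required secret: {name}")
    return Secret(value)


# Example with a fake environment (a real run would use os.environ).
# REGISTRY_TOKEN is a hypothetical variable name.
token = load_secret("REGISTRY_TOKEN", {"REGISTRY_TOKEN": "s3cr3t"})
print(f"loaded {token}")  # prints: loaded Secret('***')
```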

❔ What if?

If you want the solution to be bullet-proof, you have to take into account how everything will behave once it is up and running. Such situations are referred to as edge cases, and a proper solution has to cover them. Your infrastructure is built to serve the US West Coast in under 100ms, but what do you do for connections from India? It is Black Friday, and suddenly you have 100x higher traffic than usual. Typically, this has something to do with auto-scaling, load balancing, or networking. Your task now is to think of every scenario, however improbable, in which your infrastructure would fail, and to extend it to handle that scenario.

Example #1
Your IKEA furniture is a chair. However, you have 10 guests, but only 4 chairs. It is illogical to buy 10 chairs if you have 10 friends coming over only a couple of times a month. So, you ring your neighbor and ask to borrow some chairs, which you will return (I mean, come on, who steals chairs?). You have a nice evening, and the next day you return the chairs.

Example #2
Strangely enough, one of your friends asked NOT to sit on a chair, but on the floor instead. How can he see the top of the table, eat, drink, play, and socialize?
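
The Black Friday case can be sketched as a small capacity calculation, similar in spirit to what an auto-scaler does. The per-replica capacity and the replica limits below are made-up numbers chosen for illustration only.

```python
import math


def desired_replicas(current_rps: float, rps_per_replica: float,
                     min_replicas: int = 1, max_replicas: int = 50) -> int:
    """Return how many replicas are needed to serve the current load."""
    if rps_per_replica <= 0:
        raise ValueError("rps_per_replica must be positive")
    needed = math.ceil(current_rps / rps_per_replica)
    # Clamp to the allowed range so costs and quotas stay under control.
    return max(min_replicas, min(max_replicas, needed))


# Hypothetical numbers: 200 requests/s on a normal day, 100x on Black Friday.
normal = desired_replicas(current_rps=200, rps_per_replica=100)
black_friday = desired_replicas(current_rps=200 * 100, rps_per_replica=100)
print(normal, black_friday)  # 2 50
```

Note that the 100x spike would need 200 replicas but is capped at 50, so the sketch also exposes the follow-up question: what happens to the traffic you cannot serve?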

❔ What is happening?

This is a question you will often ask throughout the whole process. Everything is now operational, and you have come this far. But one of the services is down (perhaps a like button on Facebook is not counting likes properly), and you don't know what to do. Here, you need to set up Monitoring, Logging, and Observability services. They help you troubleshoot and see what is happening in real time. By default they produce a lot of noise, so configuring them correctly makes everything easier to manage.

Example
Remember the IKEA furniture you got? Well, you've been using it for quite some time, and you see that some of the pieces are malfunctioning. Logically, you try to determine the cause. After hours of searching, you figure out that the glue was not properly applied, so you reapply it.
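
As a rough illustration of configuring logging to cut down noise, the sketch below uses only the Python standard library to emit one JSON object per log line and to drop everything below a chosen level. The logger name `info` and the messages are assumptions for the example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
        })


def make_logger(service: str, level: int = logging.WARNING) -> logging.Logger:
    logger = logging.getLogger(service)
    logger.setLevel(level)  # drop noise below this level
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.propagate = False
    return logger


log = make_logger("info")
log.debug("cache warm-up finished")      # filtered out at WARNING level
log.error("like counter update failed")  # emitted as a JSON line
```

Structured lines like these are easier for a log aggregation service to filter and search than free-form text.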

❔ How good is all this?

Whenever you are building something, you need to know how well it behaves. This part focuses on how your infrastructure is doing in terms of health and functionality. This is commonly captured in a Service Level Agreement (SLA), which is based on Service Level Indicators (SLIs), the measurements of how the service actually performs. An SLA describes the legal requirements and arrangements between a buyer and the service seller.

Example
Before you bought your IKEA furniture, they gave you a pamphlet. In it, you read that the maximum weight your furniture can hold is 200kg, that it can last for 10 years under specific conditions, and that it is designed for a distinct purpose. They also stated that if something is wrong or misleading, you can return it.
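
A latency-based indicator can be computed in a few lines. The sketch below assumes a response-time threshold of 1000ms and a hypothetical 99% target; the sample measurements are invented for the example.

```python
def latency_sli(response_times_ms, threshold_ms=1000.0):
    """Fraction of requests answered faster than the threshold."""
    if not response_times_ms:
        raise ValueError("no requests observed")
    good = sum(1 for t in response_times_ms if t < threshold_ms)
    return good / len(response_times_ms)


def meets_target(sli: float, target: float) -> bool:
    """True when the measured indicator satisfies the agreed target."""
    return sli >= target


# Invented measurements in milliseconds; two of the ten exceed the threshold.
samples = [120, 340, 980, 1500, 90, 450, 2100, 300, 610, 75]
sli = latency_sli(samples)
print(f"SLI = {sli:.0%}, meets 99% target: {meets_target(sli, 0.99)}")
# SLI = 80%, meets 99% target: False
```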


💻 Implementation

This part will define the overall architecture of the solution. Your task is to complete the challenges provided below based on the available information.

Architecture overview

Architecture Diagram

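| Service | Model | Networking Type | Port | Paths | Response time | Dependencies | Extras |
|---|---|---|---|---|---|---|---|
| data | container | Internal | 9876 | /api/* | < 1000ms | | |
| info | container | Internal | 5555 | /* | < 1000ms | data | |
| load balancer | platform supported | External | 80, 443 | / | | info | |
| monitoring | | | | | | | optional |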
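
The dependencies in this overview imply an order in which the services have to come up. A minimal sketch, assuming Python 3.9+ for `graphlib`, derives that order from the Dependencies and Port columns; the dictionary names are hypothetical.

```python
from graphlib import TopologicalSorter

# Service -> services it depends on, copied from the architecture overview.
DEPENDENCIES = {
    "data": set(),
    "info": {"data"},
    "load balancer": {"info"},
}

# Listening ports from the same overview.
PORTS = {"data": [9876], "info": [5555], "load balancer": [80, 443]}


def startup_order(dependencies):
    """Return an order in which every service starts after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = startup_order(DEPENDENCIES)
for name in order:
    print(name, PORTS[name])
# data [9876]
# info [5555]
# load balancer [80, 443]
```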

Infrastructure specifications

Following the top-level definitions of the architecture, we also have to define how the infrastructure is going to be managed.

| Service | Rate of change (weekly) | Versioning (optional) | Details | Rollout strategy (optional) | Type |
|---|---|---|---|---|---|
| infrastructure | 5 | Yes | | A/B Deployment | IaaC |
| load balancer | 1-2 | No | Should be part of infrastructure code, but supported separately. | Big Bang Deployment | IaaC |
| data | > 50 | Yes | Storage must not be discarded. Keep logs. Enhance security rules. Reconfigure path rules before swapping with the old version. | Rolling Deployment | SaaC |
| info | > 50 | Yes | Must support high availability. | Rolling Deployment | SaaC |
| monitoring | 1-2 | Yes | Ensure firewall rules, authentication, and authorization. Support only internal networks. | Rolling Deployment | SaaC |
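
To illustrate what a rolling deployment means, the sketch below updates a fleet of instances one batch at a time, so the instances outside the current batch keep serving. It is a simulation on a plain dictionary with hypothetical instance names, not a real deployment tool.

```python
def rolling_batches(instances, batch_size=1):
    """Split instances into the batches a rolling deployment updates in turn."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [instances[i:i + batch_size]
            for i in range(0, len(instances), batch_size)]


def rolling_deploy(instances, new_version, batch_size=1):
    """Update a {name: version} mapping batch by batch and record each step."""
    state = dict(instances)  # work on a copy, leave the input untouched
    history = []
    for batch in rolling_batches(sorted(state), batch_size):
        for name in batch:
            state[name] = new_version  # instances outside the batch keep serving
        history.append(dict(state))
    return state, history


# Hypothetical fleet of three instances of the info service.
fleet = {"info-1": "v1", "info-2": "v1", "info-3": "v1"}
final, steps = rolling_deploy(fleet, "v2", batch_size=1)
print(len(steps), final)
# 3 {'info-1': 'v2', 'info-2': 'v2', 'info-3': 'v2'}
```

A Big Bang deployment corresponds to a single batch containing every instance, which is why it is simpler but riskier.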

We would also like to have a way of knowing when the deployments fail or succeed. Find a way to notify the users working on the project about the deployment statuses.
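
One simple way to notify users is to post a short message to a chat webhook at the end of each deployment. The sketch below uses only the standard library; the payload shape, the version string, and `WEBHOOK_URL` are assumptions, so adapt them to whatever your chat or alerting tool expects.

```python
import json
import urllib.request


def build_notification(service: str, version: str, succeeded: bool) -> dict:
    """Build a small JSON-serialisable payload describing a deployment."""
    status = "succeeded" if succeeded else "failed"
    return {"text": f"Deployment of {service} {version} {status}."}


def send_notification(webhook_url: str, payload: dict) -> int:
    """POST the payload to a webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


message = build_notification("data", "v1.4.2", succeeded=False)
print(message["text"])  # Deployment of data v1.4.2 failed.
# send_notification(WEBHOOK_URL, message)  # WEBHOOK_URL is a placeholder; not called here
```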

🏁 Challenges

Before you start implementing, fork this repository.

  1. Dockerization - Write Dockerfiles for data and info services.
  2. Environment preparation - Register a Google Cloud Platform free tier account, and create a GCP project.
  3. Docker configs - Create a GCP service account that has read and write permissions to the Google Container Registry. Make the registry private. Authenticate Docker with your service account.
  4. Basic CI - Select an appropriate Continuous Integration tool for your project (refer to Wiki: Comparison of continuous integration software for more details). Your pipeline should be either manually or push triggered, and should only consist of a step that builds docker images for data and info services. Leave some room for extending the pipeline.
  5. (optional) Pipeline as Code - Configure your pipeline so that it is managed as code. This step includes configuring the appropriate repo, which will serve as a base for extending the pipeline. Refer to Pipeline as Code with Jenkins for details on how you can use Jenkins for this task.
  6. Tests - Write some basic tests to check the functionality of the services. Add the testing stage to your CI pipeline.
  7. Extending CI pipeline - Extend your CI pipeline to push the built Docker images to the Google Container Registry. In this step, find a way to use secrets when pushing the images to the registry.
  8. Docker-compose - Write a docker-compose file that deploys the two services into a service mesh. Be sure to map the appropriate ports to the appropriate services.
  9. Environment-all - Update the Python services to use environment variables instead of hard-coded values. This step is important: we will use environment variables when configuring the rollout strategies, and rely on them heavily for infrastructure configuration.
  10. GCP Instance Creation - Create an instance within your Google Cloud project. Create a service account within GCP that has read/write access to your instance, and save its credentials. We will use this instance to deploy the Docker Compose services.
  11. (optional) Infrastructurization - Familiarize yourself with IaaS tools. Write an IaaC solution that will automatically create a GKE Cluster for you. Refer to the-ultimate-devops-tool-chest to select appropriate tools (Author's suggestion: Terraform).
  12. (optional) Kubernetization - Create Kubernetes deployment scripts for your services.
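
For the environment-variable challenge above, a service might load its settings as sketched here. The variable names (`SERVICE_HOST`, `SERVICE_PORT`, `DATA_URL`) and the default values are assumptions chosen to match the architecture overview; the actual services may use different ones.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    host: str
    port: int
    data_url: str


def load_config(environ=os.environ) -> ServiceConfig:
    """Build the configuration from environment variables with fallbacks."""
    return ServiceConfig(
        # All variable names and defaults below are hypothetical.
        host=environ.get("SERVICE_HOST", "0.0.0.0"),
        port=int(environ.get("SERVICE_PORT", "5555")),
        data_url=environ.get("DATA_URL", "http://data:9876/api/"),
    )


# Example with a fake environment (a real run would use os.environ).
config = load_config({"SERVICE_PORT": "8080"})
print(config)
```

Passing the environment in as a parameter keeps the function easy to test, since a plain dictionary can stand in for `os.environ`.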

More challenges will be added... - Author

⚠️ Code submission

To submit your solution, please make a merge request and populate the table with the implementation details.

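| Service | Tooling | Implementation details | Overview | Links | Extra Notes |
|---|---|---|---|---|---|
| SaaC | | | | | |
| IaaC | | | | | |
| CI | | | | | |
| CD | | | | | |
| Monitoring | | | | | |
| Alerting | | | | | |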

πŸ“ Scoring

Every implementation will be scored based on the criteria below.

  • Automation
  • Security
  • Monitoring
  • Logging
  • Scalability
  • Extendability

⭐ Further references

Visit Awesome Site Reliability Engineering to find more information about most SRE-related topics.


Author

👤 Ramiz Polic
Site Reliability Engineer @ SAP
