dastergon / Awesome Sre

Licence: cc0-1.0
A curated list of Site Reliability and Production Engineering resources.

Projects that are alternatives to or similar to Awesome Sre

Howtheysre
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
Stars: ✭ 6,962 (-9.43%)
Mutual labels:  monitoring, devops, incident-response, alerting, sre, site-reliability-engineering, post-mortem, reliability, on-call
Awesome Sre Tools
A curated list of Site Reliability and Production Engineering Tools
Stars: ✭ 186 (-97.58%)
Mutual labels:  list, monitoring, devops, sre, production
availability-calculator
Calculate how much downtime should be permitted in your Service Level Agreement or Objective
Stars: ✭ 60 (-99.22%)
Mutual labels:  availability, site-reliability-engineering, service-level-agreement, postmortem, site-reliability
Kubernetes Failure Stories
Compilation of public failure/horror stories related to Kubernetes
Stars: ✭ 6,217 (-19.12%)
Mutual labels:  sre, post-mortem, reliability, postmortem
Devops Readme.md
What to Read to Learn More About DevOps
Stars: ✭ 398 (-94.82%)
Mutual labels:  monitoring, devops, sre
Wheel Of Misfortune
A role-playing game for incident management training
Stars: ✭ 57 (-99.26%)
Mutual labels:  devops, incident-response, sre
Kapo
Wrap any command in a status socket
Stars: ✭ 45 (-99.41%)
Mutual labels:  monitoring, devops, sre
Cabot
Self-hosted, easily-deployable monitoring and alerts service - like a lightweight PagerDuty
Stars: ✭ 5,209 (-32.24%)
Mutual labels:  monitoring, devops, alerting
Prom2teams
prom2teams is an HTTP server built with Python that receives alert notifications from a previously configured Prometheus Alertmanager instance and forwards them to Microsoft Teams using defined connectors
Stars: ✭ 122 (-98.41%)
Mutual labels:  monitoring, devops, alerting
Cloudprober
An active monitoring software to detect failures before your customers do.
Stars: ✭ 1,269 (-83.49%)
Mutual labels:  monitoring, devops, sre
Gatus
⛑ Gatus - Automated service health dashboard
Stars: ✭ 1,203 (-84.35%)
Mutual labels:  monitoring, devops, alerting
Netdata
Real-time performance monitoring, done right! https://www.netdata.cloud
Stars: ✭ 57,056 (+642.24%)
Mutual labels:  monitoring, devops, alerting
Ansible For Kubernetes
Ansible and Kubernetes examples from Ansible for Kubernetes Book
Stars: ✭ 389 (-94.94%)
Mutual labels:  devops, scalability
Automatron
Infrastructure monitoring framework turning DevOps runbooks into automated actions
Stars: ✭ 381 (-95.04%)
Mutual labels:  monitoring, devops
Howtheyaws
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world use Amazon Web Services (AWS)
Stars: ✭ 389 (-94.94%)
Mutual labels:  devops, sre
Microsoft365dsc
Manages, configures, extracts and monitors Microsoft 365 tenant configurations
Stars: ✭ 374 (-95.13%)
Mutual labels:  monitoring, devops
Cachet Monitor
Distributed monitoring plugin for CachetHQ
Stars: ✭ 427 (-94.45%)
Mutual labels:  monitoring, devops
Awesome Incident Response
A curated list of tools for incident response
Stars: ✭ 4,753 (-38.17%)
Mutual labels:  list, incident-response
Urlooker
Enterprise-level website monitoring system
Stars: ✭ 469 (-93.9%)
Mutual labels:  monitoring, devops
Healthchecks
A cron monitoring tool written in Python & Django
Stars: ✭ 4,297 (-44.1%)
Mutual labels:  monitoring, devops

Awesome Site Reliability Engineering

A curated list of awesome Site Reliability and Production Engineering resources.

What is Site Reliability Engineering?

"Fundamentally, it's what happens when you ask a software engineer to design an operations function." - Ben Treynor Sloss, VP Google Engineering, founder of Google SRE

Contributing

Please take a look at the contribution guidelines first. Contributions are always welcome!

Contents

Culture

Education

Books

Hiring

Reliability

Monitoring & Observability & Alerting

On-Call

Post-Mortem

Capacity Planning

Service Level Agreement

Performance

Programming

Misc Articles

Real-time Messaging

Blogs

  • Brendan Gregg's Blog - Highly Technical Blog Posts About Systems Internals, Performance and SRE.
  • Everything Sysadmin - Blog Posts About SysAdmin/DevOps/SRE by Tom Limoncelli.
  • High Scalability - Technical Blog Posts About Systems Architecture.
  • rachelbythebay - Technical Blog Posts.
  • Susan J. Fowler - Various blog posts about SRE, Software Engineering and Microservices.
  • SysAdvent - One article for each day of December, ending on the 25th.
  • Stephen Thorne's Blog - Blog Posts About SRE.
  • Increment - A digital magazine about how teams build and operate software systems at scale.
  • GopherSRE - Blog Posts about Go and SRE.
  • Cindy Sridharan - Blog posts about distributed systems and their management.
  • Blameless Blog - Blog posts about SRE culture and practices.
  • Resilience Roundup - Weekly analysis of Resilience Engineering and Human Factors research designed for software systems.
  • Squadcast Blog - Blog posts about SRE best practices, reliability, on-call and incident management.
  • FireHydrant Blog - Posts about complex systems, incident response, and SRE best practices.
  • Rootly Blog - Incident management best practices and guides.

Newsletters

  • DevOpsLinks - A weekly newsletter about SRE, SysAdmin and DevOps news, tools, tutorials and opinions.
  • KubeWeekly - A weekly newsletter for all things Kubernetes, curated by Bob Killen, Chris Short, Craig Box, Kim McMahon and Michael Hausenblas.
  • SRE Weekly - Weekly Site Reliability Newsletter.
  • O’Reilly Systems Engineering and Operations Newsletter - Weekly systems engineering and operations news and insights from industry insiders.
  • ChaosEngineering.news - Chaos Engineering newsletter. All things Chaos Engineering, directly to your inbox!

Conferences & Meetups

Twitter

SRE Tools
