
Comcast / Kuberhealthy

Licence: apache-2.0
A Kubernetes operator for running synthetic checks as pods. Works great with Prometheus!

Programming Languages

go
31211 projects - #10 most used programming language

Projects that are alternatives of or similar to Kuberhealthy

Spidermon
Scrapy Extension for monitoring spiders execution.
Stars: ✭ 309 (-66.41%)
Mutual labels:  hacktoberfest, monitoring
Cortex
A horizontally scalable, highly available, multi-tenant, long term Prometheus.
Stars: ✭ 4,491 (+388.15%)
Mutual labels:  hacktoberfest, monitoring
Osquery
SQL powered operating system instrumentation, monitoring, and analytics.
Stars: ✭ 18,475 (+1908.15%)
Mutual labels:  hacktoberfest, monitoring
Health Checks Api
Standardize the way services and applications expose their status in a distributed application
Stars: ✭ 78 (-91.52%)
Mutual labels:  health, monitoring
Cluster Monitoring
Cluster monitoring stack for clusters based on Prometheus Operator
Stars: ✭ 453 (-50.76%)
Mutual labels:  hacktoberfest, monitoring
Health
Laravel Health Panel
Stars: ✭ 1,774 (+92.83%)
Mutual labels:  health, hacktoberfest
Cht Core
The CHT Core Framework makes it faster to build responsive, offline-first digital health apps that equip health workers to provide better care in their communities. It is a central resource of the Community Health Toolkit.
Stars: ✭ 354 (-61.52%)
Mutual labels:  health, hacktoberfest
Monitor Adgroupmembership
PowerShell script to monitor Active Directory groups and send an email when someone is changing the membership
Stars: ✭ 190 (-79.35%)
Mutual labels:  hacktoberfest, monitoring
Check postgres
Nagios check_postgres plugin for checking status of PostgreSQL databases
Stars: ✭ 438 (-52.39%)
Mutual labels:  hacktoberfest, monitoring
Alertmanager
Prometheus Alertmanager
Stars: ✭ 4,574 (+397.17%)
Mutual labels:  hacktoberfest, monitoring
Gatus
⛑ Gatus - Automated service health dashboard
Stars: ✭ 1,203 (+30.76%)
Mutual labels:  health, monitoring
Prometheus Operator
Prometheus Operator creates/configures/manages Prometheus clusters atop Kubernetes
Stars: ✭ 6,451 (+601.2%)
Mutual labels:  hacktoberfest, monitoring
Graphite exporter
Server that accepts metrics via the Graphite protocol and exports them as Prometheus metrics
Stars: ✭ 217 (-76.41%)
Mutual labels:  hacktoberfest, monitoring
Augur
Python library and web service for Open Source Software Health and Sustainability metrics & data collection.
Stars: ✭ 304 (-66.96%)
Mutual labels:  health, hacktoberfest
Librenms
Community-based GPL-licensed network monitoring system
Stars: ✭ 2,567 (+179.02%)
Mutual labels:  hacktoberfest, monitoring
Chronos
📊 📊 📊 Monitors the health and web traffic of servers, microservices, and containers with real-time data monitoring and receive automated notifications over Slack or email.
Stars: ✭ 347 (-62.28%)
Mutual labels:  health, monitoring
Promster
⏰A Prometheus exporter for Hapi, express and Marble.js servers to automatically measure request timings 📊
Stars: ✭ 146 (-84.13%)
Mutual labels:  hacktoberfest, monitoring
Exceptionless
Exceptionless server and jobs
Stars: ✭ 2,107 (+129.02%)
Mutual labels:  hacktoberfest, monitoring
Guider
Performance Analyzer
Stars: ✭ 393 (-57.28%)
Mutual labels:  hacktoberfest, monitoring
Opennms
Enterprise-Grade Open-Source Network Management Platform
Stars: ✭ 568 (-38.26%)
Mutual labels:  hacktoberfest, monitoring

An operator for synthetic monitoring on Kubernetes. Write your own tests in your own container and Kuberhealthy will manage everything else. Automatically creates and sends metrics to Prometheus and InfluxDB. Includes a simple JSON status page. Supplements other solutions like Prometheus very nicely!


What is Kuberhealthy?

Kuberhealthy is an operator for running synthetic checks. By creating a custom resource (a khcheck) in your cluster, you can easily enable various synthetic test containers. Kuberhealthy does all the work of scheduling your checks on an interval you specify (like a CronJob), ensuring they run properly within an allotted timeout, maintaining the current up/down state with durability, and producing metrics. There are lots of useful checks already available to ensure the core functionality of Kubernetes, but checks can be used to test anything you like. We encourage you to write your own check container in any language to test your own applications!

Kuberhealthy serves a simple JSON status page, a Prometheus metrics endpoint, and supports InfluxDB metric forwarding for integration into your choice of alerting solution.

Kuberhealthy provisions and operates checker pods as follows. In the daemonset check, for example, the checker pod both deploys a daemonset and tears it down while carefully watching for errors. The result of the check is then sent back to Kuberhealthy and channeled into upstream metrics and status pages to indicate basic Kubernetes cluster functionality across all nodes in a cluster.

Create Synthetic Checks for Your App

With Kuberhealthy, you can easily create synthetic tests to check your applications with real world use cases. Read more about how external checks are configured in the documentation here and learn how to create your own check container in any language here. Clients for external checks outside of Go can be found in the clients directory.

Installation

Requires Kubernetes 1.11 or above and Helm 3

  1. Create namespace "kuberhealthy" in the desired Kubernetes cluster/context:
    kubectl create namespace kuberhealthy
  2. Set your current namespace to "kuberhealthy":
    kubectl config set-context --current --namespace=kuberhealthy
  3. Add the kuberhealthy repo to Helm:
    helm repo add kuberhealthy https://comcast.github.io/kuberhealthy/helm-repos
  4. Install kuberhealthy:
    helm install kuberhealthy kuberhealthy/kuberhealthy

After installation, Kuberhealthy is only available from within the cluster (Type: ClusterIP) at the service URL kuberhealthy.kuberhealthy. This default is a security measure. To expose Kuberhealthy to an external checking service, edit the kuberhealthy service and set Type: LoadBalancer. Options are available in the Helm chart to deploy with Type: LoadBalancer directly.

Kuberhealthy is currently tested on Kubernetes 1.9.x to 1.18.x.

To configure Kuberhealthy after installation, see the configuration documentation.

The Helm installation of Kuberhealthy is automatically updated to use the latest Kuberhealthy release.

More installation options, including static YAML files, are available in the /deploy directory. These flat spec files track the master branch and so contain the most recent changes to Kuberhealthy. Use them if you would like to test master branch updates.

Why Are Synthetic Tests Important?

Instead of trying to identify all the things that could potentially go wrong in your application or cluster with never-ending metrics and alert configurations, synthetic tests replicate real workflow and carefully check for the expected behavior to occur. By default, Kuberhealthy monitors all basic Kubernetes cluster functionality including deployments, daemonsets, services, nodes, kube-system health and more.

Some examples of problems Kuberhealthy has detected in production with just the default checks enabled:

  • Nodes where new pods get stuck in Terminating due to CNI communication failures
  • Nodes where new pods get stuck in ContainerCreating due to disk provisioning errors
  • Nodes where new pods get stuck in Pending due to container runtime errors
  • Nodes where Docker or Kubelet is in a bad state but passing health checks
  • Nodes that are unable to properly communicate with the API server due to kube-api request limiting
  • Nodes that cannot provision or terminate pods quickly enough (15m) due to high I/O wait
  • A pod in the kube-system namespace that has begun restarting too quickly
  • An unexpected admission controller failure causing pod creation failure
  • Intermittent failures to access or create custom resources
  • kube-dns/CoreDNS DNS lookup failures (internal and external)
  • ... more!

Status Page

You can read the current test statuses directly from the kuberhealthy.kuberhealthy HTTP service on port 80. The status page displays server status in the format shown below. The boolean OK field indicates global up/down status, while the Errors array contains a list of all check error descriptions. Granular, per-check information, including how long the check took to run (RunDuration), the last time the check ran (LastRun), and which Kuberhealthy pod ran that specific check (AuthoritativePod), is available under the CheckDetails object.

{
    "OK": true,
    "Errors": [],
    "CheckDetails": {
        "kuberhealthy/daemonset": {
            "OK": true,
            "Errors": [],
            "RunDuration": "22.512278967s",
            "Namespace": "kuberhealthy",
            "LastRun": "2019-11-14T23:24:16.7718171Z",
            "AuthoritativePod": "kuberhealthy-67bf8c4686-mbl2j",
            "uuid": "9abd3ec0-b82f-44f0-b8a7-fa6709f759cd"
        },
        "kuberhealthy/deployment": {
            "OK": true,
            "Errors": [],
            "RunDuration": "29.142295647s",
            "Namespace": "kuberhealthy",
            "LastRun": "2019-11-14T23:26:40.7444659Z",
            "AuthoritativePod": "kuberhealthy-67bf8c4686-mbl2j",
            "uuid": "5f0d2765-60c9-47e8-b2c9-8bc6e61727b2"
        },
        "kuberhealthy/dns-status-internal": {
            "OK": true,
            "Errors": [],
            "RunDuration": "2.43940936s",
            "Namespace": "kuberhealthy",
            "LastRun": "2019-11-14T23:34:04.8927434Z",
            "AuthoritativePod": "kuberhealthy-67bf8c4686-mbl2j",
            "uuid": "c85f95cb-87e2-4ff5-b513-e02b3d25973a"
        },
        "kuberhealthy/pod-restarts": {
            "OK": true,
            "Errors": [],
            "RunDuration": "2.979083775s",
            "Namespace": "kuberhealthy",
            "LastRun": "2019-11-14T23:34:06.1938491Z",
            "AuthoritativePod": "kuberhealthy-67bf8c4686-mbl2j",
            "uuid": "a718b969-421c-47a8-a379-106d234ad9d8"
        }
    },
    "CurrentMaster": "kuberhealthy-7cf79bdc86-m78qr"
}
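
A client that consumes this document might decode it as in the minimal sketch below. The field names and JSON keys are taken from the sample above. The Go type names (Status, CheckDetail), the helpers parseStatus and failingChecks, and the sample body in main are hypothetical and not part of Kuberhealthy.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// CheckDetail mirrors one entry of the CheckDetails object.
type CheckDetail struct {
	OK               bool     `json:"OK"`
	Errors           []string `json:"Errors"`
	RunDuration      string   `json:"RunDuration"`
	Namespace        string   `json:"Namespace"`
	LastRun          string   `json:"LastRun"`
	AuthoritativePod string   `json:"AuthoritativePod"`
	UUID             string   `json:"uuid"`
}

// Status mirrors the top-level status page document.
type Status struct {
	OK            bool                   `json:"OK"`
	Errors        []string               `json:"Errors"`
	CheckDetails  map[string]CheckDetail `json:"CheckDetails"`
	CurrentMaster string                 `json:"CurrentMaster"`
}

// parseStatus decodes the body of a status page response.
func parseStatus(body []byte) (Status, error) {
	var s Status
	err := json.Unmarshal(body, &s)
	return s, err
}

// failingChecks returns the sorted names of all checks whose OK field is false.
func failingChecks(s Status) []string {
	var names []string
	for name, detail := range s.CheckDetails {
		if !detail.OK {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// Hypothetical response body with one failing check.
	body := []byte(`{
	  "OK": false,
	  "Errors": ["too many restarts"],
	  "CheckDetails": {
	    "kuberhealthy/deployment": {"OK": true, "Errors": []},
	    "kuberhealthy/pod-restarts": {"OK": false, "Errors": ["too many restarts"]}
	  },
	  "CurrentMaster": "kuberhealthy-7cf79bdc86-m78qr"
	}`)

	status, err := parseStatus(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("global OK:", status.OK)
	fmt.Println("failing checks:", failingChecks(status))
}
```

RunDuration and LastRun are kept as strings here. If needed, values in the format shown above can be converted with time.ParseDuration and time.Parse using time.RFC3339Nano.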

High Availability

Kuberhealthy scales horizontally in order to be fault tolerant. By default, two instances are used with a pod disruption budget and RollingUpdate strategy to ensure high availability.

Centralized Check State

The state of checks is centralized as custom resource records. This allows Kuberhealthy to always serve the same result, no matter which node in the pool you hit. The current master running checks is calculated by all nodes in the deployment by simply querying the Kubernetes API for 'Ready' Kuberhealthy pods of the correct label, and sorting them alphabetically by name. The node that comes first is master. These two strategies together enable Kuberhealthy to maintain state and scale horizontally without deploying an additional backing database.
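
The master calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than Kuberhealthy's own code: the Pod type and electMaster function are hypothetical, the pod list stands in for the result of the Kubernetes API query, and "alphabetically" is interpreted as Go's byte-wise string ordering.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a hypothetical, minimal stand-in for the pod data that would
// come back from a label-filtered Kubernetes API query.
type Pod struct {
	Name  string
	Ready bool
}

// electMaster returns the name of the Ready pod that sorts first.
// The second return value is false when no pod is Ready.
func electMaster(pods []Pod) (string, bool) {
	var ready []string
	for _, p := range pods {
		if p.Ready {
			ready = append(ready, p.Name)
		}
	}
	if len(ready) == 0 {
		return "", false
	}
	sort.Strings(ready)
	return ready[0], true
}

func main() {
	pods := []Pod{
		{Name: "kuberhealthy-7cf79bdc86-m78qr", Ready: true},
		{Name: "kuberhealthy-67bf8c4686-mbl2j", Ready: true},
		{Name: "kuberhealthy-11aa22bb33-aaaaa", Ready: false}, // not Ready, so ignored
	}
	master, ok := electMaster(pods)
	fmt.Println(master, ok) // prints "kuberhealthy-67bf8c4686-mbl2j true"
}
```

Because every instance applies the same deterministic rule to the same API data, the instances agree on the master without exchanging messages with each other.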

Synthetic KPIs with Kuberhealthy

Using Kuberhealthy with Prometheus can help capture useful synthetic KPIs. Check out the K8s KPIs with Kuberhealthy doc to learn more about how to install Kuberhealthy and collect cluster KPIs.

Security Considerations

By default, Kuberhealthy exposes an insecure (non-HTTPS) JSON status endpoint without authentication. You should never expose this endpoint to the public internet. Exposing Kuberhealthy's status page to the public internet could result in private cluster information being exposed to the public internet when errors occur and are displayed on the page.

Vulnerabilities or other security-related issues should be logged as GitHub issues in this project. All new issues are reviewed regularly. Please be careful not to post any sensitive information in your report!

Contributing

If you're interested in contributing to this project:

  • Check out the Contributing Guide.
  • If you use Kuberhealthy in a production environment, add yourself to the list of Kuberhealthy adopters!
  • Check out open issues. If you're new to the project, look for the good first issue tag.
  • We're always looking for external check contributions (either in suggestions or in PRs) as well as feedback from folks implementing Kuberhealthy locally or in a test environment.