
andreas-schroeder / Kafka Health Check

Licence: MIT
Health Check for Kafka Brokers.

Programming Languages

go

Projects that are alternatives to or similar to Kafka Health Check

Liiklus
Reactive (RSocket/gRPC) Gateway for the event-based systems
Stars: ✭ 192 (-10.28%)
Mutual labels:  kafka
Synch
Sync data from the other DB to ClickHouse(cluster)
Stars: ✭ 200 (-6.54%)
Mutual labels:  kafka
Pos
Sample Application DDD, Reactive Microservices, CQRS Event Sourcing Powered by DERMAYON LIBRARY
Stars: ✭ 207 (-3.27%)
Mutual labels:  kafka
Firecamp
Serverless Platform for the stateful services
Stars: ✭ 194 (-9.35%)
Mutual labels:  kafka
Voik
♒︎ [WIP] An experimental ~distributed~ commit-log
Stars: ✭ 200 (-6.54%)
Mutual labels:  kafka
Qmq
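QMQ is a message middleware widely used inside Qunar (去哪儿网). Since its creation in 2012 it has been used extensively across all of Qunar's business scenarios, including transaction-critical order processing as well as high-throughput scenarios such as quote search.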
Stars: ✭ 2,420 (+1030.84%)
Mutual labels:  kafka
Kafka Streams Scala
Thin Scala wrapper around Kafka Streams Java API
Stars: ✭ 192 (-10.28%)
Mutual labels:  kafka
Kafka
Go driver for Kafka
Stars: ✭ 212 (-0.93%)
Mutual labels:  kafka
Strimzi Kafka Operator
Apache Kafka running on Kubernetes
Stars: ✭ 2,833 (+1223.83%)
Mutual labels:  kafka
Thunder
⚡️ Nepxion Thunder is a distributed RPC framework based on Netty + Hessian + Kafka + ActiveMQ + Tibco + Zookeeper + Redis + Spring Web MVC + Spring Boot + Docker: a multi-protocol, multi-component, multi-serialization distributed RPC invocation framework
Stars: ✭ 204 (-4.67%)
Mutual labels:  kafka
Amazonriver
amazonriver is a service that synchronizes real-time data from PostgreSQL to ES (Elasticsearch) or Kafka
Stars: ✭ 198 (-7.48%)
Mutual labels:  kafka
Franz Go
franz-go contains a high performance, pure Go library for interacting with Kafka from 0.8.0 through 2.7.0+. Producing, consuming, transacting, administrating, etc.
Stars: ✭ 199 (-7.01%)
Mutual labels:  kafka
Hivemq Mqtt Tensorflow Kafka Realtime Iot Machine Learning Training Inference
Real Time Big Data / IoT Machine Learning (Model Training and Inference) with HiveMQ (MQTT), TensorFlow IO and Apache Kafka - no additional data store like S3, HDFS or Spark required
Stars: ✭ 204 (-4.67%)
Mutual labels:  kafka
Istio Micro
Istio microservice sample code: grpc+protobuf+echo+websocket+mysql+redis+kafka+docker-compose
Stars: ✭ 194 (-9.35%)
Mutual labels:  kafka
Kafka Client
Go client library for Apache Kafka
Stars: ✭ 210 (-1.87%)
Mutual labels:  kafka
Masterchief
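A C# development helper class library: like Master Chief, a battle-hardened war machine that grows stronger with every fight, with capabilities second to none.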
Stars: ✭ 190 (-11.21%)
Mutual labels:  kafka
Java Library Examples
💪 examples of commonly used libraries and frameworks, programming required, don't fork man.
Stars: ✭ 204 (-4.67%)
Mutual labels:  kafka
Asky
Asky open-source architecture: minimalist, lightweight, maximum performance. "Asky: Learn Programming from Zero in 1 Hour, dnc+vue+tidb+redis+rabbitMQ+ES". QQ group 779699538
Stars: ✭ 213 (-0.47%)
Mutual labels:  kafka
Microservice Scaffold
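A microservice scaffold built on Spring Cloud (Greenwich.SR2) for online systems, with an integrated service registry (Nacos Config), configuration center (Nacos Discovery), authentication and authorization (Oauth 2 + JWT), log processing (ELK + Kafka), rate limiting and circuit breaking (AliBaba Sentinel), application metrics monitoring (Prometheus + Grafana), call-chain tracing (Pinpoint), and Spring Boot Admin.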
Stars: ✭ 211 (-1.4%)
Mutual labels:  kafka
Tech Blog
My personal tech blog (Python, Django, Docker, Go, Redis, ElasticSearch, Kafka, Linux)
Stars: ✭ 203 (-5.14%)
Mutual labels:  kafka

Kafka Health Check

Health checker for Kafka brokers and clusters that operates by checking whether:

  • a message inserted in a dedicated health check topic becomes available for consumers,
  • the broker can stay in the ISR of a replication check topic,
  • the broker is in the in-sync replica set for all partitions it replicates,
  • under-replicated partitions exist,
  • out-of-sync replicas exist,
  • offline partitions exist, and
  • the metadata of the cluster and the ZooKeeper metadata are consistent with each other.

Status


Release version is 0.1.0

Compiled binaries are available for Linux, macOS, and FreeBSD.

Use Cases

Submit a pull request to have your use case listed here!

Self-healing cluster

At AutoScout24, in order to reduce operational workload, we use kafka-health-check to automatically restart broker nodes as they become unhealthy.

In-place rolling updates

At AutoScout24, we perform regular in-place rolling updates to keep the OS of our clusters running on AWS up to date. As we run immutable servers, we terminate each broker and replace it with a fresh EC2 instance (keeping the previous broker id). To avoid jeopardizing cluster stability when terminating brokers, we verify that the cluster is healthy before taking a broker offline. Similarly, we wait for the replacement broker to come back online and fully catch up before proceeding with the next broker. To achieve this, we use the cluster health information provided by kafka-health-check.
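The rolling-update gate described above boils down to polling two endpoints: /cluster must report green before a broker is taken offline, and the replaced broker's / must report sync before moving on. The Go sketch below shows one way to do that polling; the function names, the 5-second HTTP timeout, and the broker hostname in the comment are hypothetical choices, not part of kafka-health-check itself.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"time"
)

// statusBody holds the one field this sketch reads from both / and /cluster.
type statusBody struct {
	Status string `json:"status"`
}

// hasStatus reports whether a response had the wanted HTTP code and status value.
func hasStatus(code int, body []byte, wantCode int, want string) bool {
	var s statusBody
	if err := json.Unmarshal(body, &s); err != nil {
		return false
	}
	return code == wantCode && s.Status == want
}

// clusterGreen: safe to take one broker offline (GET /cluster).
func clusterGreen(code int, body []byte) bool {
	return hasStatus(code, body, http.StatusOK, "green")
}

// brokerInSync: the replaced broker has fully caught up (GET /).
func brokerInSync(code int, body []byte) bool {
	return hasStatus(code, body, http.StatusOK, "sync")
}

// waitFor polls url until ok accepts the response or timeout elapses.
func waitFor(url string, ok func(int, []byte) bool, interval, timeout time.Duration) error {
	client := &http.Client{Timeout: 5 * time.Second}
	for deadline := time.Now().Add(timeout); time.Now().Before(deadline); time.Sleep(interval) {
		resp, err := client.Get(url)
		if err != nil {
			continue // broker may still be booting; retry after interval
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err == nil && ok(resp.StatusCode, body) {
			return nil
		}
	}
	return errors.New("timed out waiting on " + url)
}

func main() {
	fmt.Println(clusterGreen(200, []byte(`{"status":"green"}`)))           // true
	fmt.Println(clusterGreen(200, []byte(`{"status":"yellow"}`)))          // false
	fmt.Println(brokerInSync(200, []byte(`{"broker":3,"status":"imok"}`))) // false
	// Against a real broker (hostname hypothetical):
	//   err := waitFor("http://broker-1:8000/cluster", clusterGreen, 10*time.Second, 10*time.Minute)
}
```

Treating imok as "not yet caught up" is deliberate here: imok means the broker is alive but still lagging behind some leaders, which is exactly the state to wait out before touching the next broker.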

Usage

Usage of kafka-health-check:
  -broker-host string
    	ip address or hostname of broker host (default "localhost")
  -broker-id uint
    	id of the Kafka broker to health check
  -broker-port uint
    	Kafka broker port (default 9092)
  -check-interval duration
    	how frequently to perform health checks (default 10s)
  -no-topic-creation
    	disable automatic topic creation and deletion
  -replication-failures-count uint
    	number of replication failures before broker is reported unhealthy (default 5)
  -replication-topic string
    	name of the topic to use for replication checks - use one per cluster, defaults to broker-replication-check
  -server-port uint
    	port to open for http health status queries (default 8000)
  -topic string
    	name of the topic to use - use one per broker, defaults to broker-<id>-health-check
  -zookeeper string
    	ZooKeeper connect string (e.g. node1:2181,node2:2181,.../chroot)
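The -zookeeper value may end in a chroot path after the host list, and -topic defaults to a name derived from the broker id. The Go sketch below illustrates both conventions; the helper names are hypothetical, and it assumes, as ZooKeeper's own client does, that the chroot begins at the first "/". It is an illustration of the flag formats, not code from kafka-health-check.

```go
package main

import (
	"fmt"
	"strings"
)

// splitZKConnect separates a connect string such as "node1:2181,node2:2181/chroot"
// into its host list and optional chroot; the chroot applies to the whole ensemble.
func splitZKConnect(connect string) (servers []string, chroot string) {
	hosts := connect
	if i := strings.Index(connect, "/"); i >= 0 {
		hosts, chroot = connect[:i], connect[i:]
	}
	for _, h := range strings.Split(hosts, ",") {
		if h = strings.TrimSpace(h); h != "" {
			servers = append(servers, h)
		}
	}
	return servers, chroot
}

// defaultTopic mirrors the documented -topic default: broker-<id>-health-check.
func defaultTopic(brokerID uint) string {
	return fmt.Sprintf("broker-%d-health-check", brokerID)
}

// defaultReplicationTopic is the documented -replication-topic default.
const defaultReplicationTopic = "broker-replication-check"

func main() {
	servers, chroot := splitZKConnect("node1:2181,node2:2181/kafka")
	fmt.Println(servers, chroot)         // [node1:2181 node2:2181] /kafka
	fmt.Println(defaultTopic(3))         // broker-3-health-check
	fmt.Println(defaultReplicationTopic) // broker-replication-check
}
```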

Broker Health

Broker health can be queried at /:

$ curl -s <broker-host>:8000/
{
    "broker": 1,
    "status": "sync"
}

Return codes and status values are:

  • 200 with sync for a healthy broker that is fully in sync with all leaders.
  • 200 with imok for a healthy broker that replays messages of its health check topic, but is not fully in sync.
  • 500 with nook for an unhealthy broker: one that fails to replay messages of its health check topic within 200 milliseconds, or that fails to stay in the ISR of the replication check topic for more checks than replication-failures-count (default 5).

The returned json contains details about replicas the broker is lagging behind:

$ curl -s <broker-host>:8000/
{
    "broker": 3,
    "status": "imok",
    "out-of-sync": [
        {
            "topic": "mytopic",
            "partition": 0
        }
    ],
    "replication-failures": 1
}

Cluster Health

Cluster health can be queried at /cluster:

$ curl -s <broker-host>:8000/cluster
{
    "status": "green"
}

Return codes and status values are:

  • 200 with green if all replicas of all partitions of all topics are in sync and metadata is consistent.
  • 200 with yellow if one or more partitions are under-replicated and metadata is consistent.
  • 500 with red if one or more partitions are offline or metadata is inconsistent.
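The green/yellow/red rules can be sketched as a fold over all partitions, with the out-of-sync replicas (OSR) of a partition computed as its assigned replicas minus its ISR. In the Go sketch below, the Partition type is hypothetical, "offline" is assumed to mean the partition has no leader (leader id -1, as Kafka records it), and the brokerHealthy input reflects the note that a failed broker check turns the cluster red. It is a reconstruction from this README, not the project's code.

```go
package main

import (
	"fmt"
	"net/http"
)

// Partition is a hypothetical view of one partition's replica state.
type Partition struct {
	Replicas []int // all assigned broker ids
	ISR      []int // in-sync replica broker ids
	Leader   int   // -1 when the partition has no leader (offline)
}

// osr returns the assigned replicas that are not in the ISR.
func osr(p Partition) []int {
	inSync := make(map[int]bool, len(p.ISR))
	for _, id := range p.ISR {
		inSync[id] = true
	}
	var out []int
	for _, id := range p.Replicas {
		if !inSync[id] {
			out = append(out, id)
		}
	}
	return out
}

// clusterStatus: red for inconsistent metadata, a failed broker check, or an
// offline partition; yellow if any partition has out-of-sync replicas; else green.
func clusterStatus(parts []Partition, metadataConsistent, brokerHealthy bool) (string, int) {
	if !metadataConsistent || !brokerHealthy {
		return "red", http.StatusInternalServerError
	}
	status := "green"
	for _, p := range parts {
		if p.Leader < 0 {
			return "red", http.StatusInternalServerError
		}
		if len(osr(p)) > 0 {
			status = "yellow"
		}
	}
	return status, http.StatusOK
}

func main() {
	parts := []Partition{
		{Replicas: []int{1, 2, 3}, ISR: []int{1, 2}, Leader: 1},
		{Replicas: []int{1, 2, 3}, ISR: []int{1, 2, 3}, Leader: 2},
	}
	fmt.Println(osr(parts[0]))                    // [3]
	fmt.Println(clusterStatus(parts, true, true)) // yellow 200
}
```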

The returned json contains details about metadata status and partition replication:

$ curl -s <broker-host>:8000/cluster
{
    "status": "yellow",
    "topics": [
        {
            "topic": "mytopic",
            "status": "yellow",
            "partitions": {
                "1": {
                    "status": "yellow",
                    "OSR": [
                        3
                    ]
                },
                "2": {
                    "status": "yellow",
                    "OSR": [
                        3
                    ]
                }
            }
        }
    ]
}

The additional fields and their structures are:

  • topics for topic replication status: [{"topic":"mytopic","status":"yellow","partitions":{"2":{"status":"yellow","OSR":[3]}}}] In this data, OSR means out-of-sync replica and contains the list of all brokers that are not in the ISR.
  • metadata for inconsistencies between ZooKeeper and Kafka metadata: [{"broker":3,"status":"red","problem":"Missing in ZooKeeper"}]
  • zookeeper for problems with ZooKeeper connection or data, contains a single string: "Fetching brokers failed: ..."

Supported Kafka Versions

Tested with the following Kafka versions:

  • 2.0.0
  • 1.1.1
  • 1.1.0
  • 1.0.0
  • 0.11.0.2
  • 0.11.0.1
  • 0.11.0.0
  • 0.10.2.1
  • 0.10.2.0
  • 0.10.1.1
  • 0.10.1.0
  • 0.10.0.1
  • 0.10.0.0
  • 0.9.0.1
  • 0.9.0.0

Kafka 0.8 is not supported.

See the compatibility spec for the full list of executed compatibility checks. To execute the compatibility checks, run make compatibility; running them requires Docker.

Building

Run make deps to restore the dependencies using govendor, then run make to build.

Prerequisites

Notable Details on Health Check Behavior

  • When first started, the check tries to find the Kafka broker to check in the cluster metadata. Then it tries to find the health check topic and, if it is missing, creates it by communicating directly with ZooKeeper (configuration: 10 seconds message lifetime, a single partition assigned to the broker to check). This behavior can be disabled by using -no-topic-creation.
  • The check also creates one replication check topic for the whole cluster. This topic is expanded to all brokers that are checked.
  • When shutting down, the check deletes the health check topic partition by communicating directly with ZooKeeper. It also shrinks the partition assignment of the replication check topic, and deletes that topic when the last health check process stops. This behavior can be disabled by using -no-topic-creation.
  • The check tries to create the health check and replication check topics only on its first connection after startup. If a topic disappears later while the check is running, the check does not re-create it.
  • If the broker health check fails, the cluster health will be set to red.
  • For each check pass, the Kafka cluster metadata is fetched from ZooKeeper, i.e. the full data on brokers and topic partitions with replicas.