
tkestack / Kvass

License: Apache-2.0
Kvass is a Prometheus horizontal auto-scaling solution. It uses a Sidecar to generate, for every Prometheus shard, a special config file that contains only the targets assigned to that shard by the Coordinator.

Programming Languages

go
go

Projects that are alternatives to or similar to Kvass

Carbonapi
Implementation of graphite API (graphite-web) in golang
Stars: ✭ 243 (-25.23%)
Mutual labels:  monitoring, prometheus
Mtail
extract internal monitoring data from application logs for collection in a timeseries database
Stars: ✭ 3,028 (+831.69%)
Mutual labels:  monitoring, prometheus
Beam Dashboards
BEAM ❤️ Prometheus ❤️ Grafana
Stars: ✭ 244 (-24.92%)
Mutual labels:  monitoring, prometheus
Docker Traefik Prometheus
A Docker Swarm stack for monitoring Traefik with Prometheus and Grafana
Stars: ✭ 215 (-33.85%)
Mutual labels:  monitoring, prometheus
Ansible Prometheus
An Ansible role that installs Prometheus, in the format for Ansible Galaxy.
Stars: ✭ 256 (-21.23%)
Mutual labels:  monitoring, prometheus
Graphite exporter
Server that accepts metrics via the Graphite protocol and exports them as Prometheus metrics
Stars: ✭ 217 (-33.23%)
Mutual labels:  monitoring, prometheus
Example Prometheus Nodejs
Prometheus monitoring example with Node.js
Stars: ✭ 249 (-23.38%)
Mutual labels:  monitoring, prometheus
Wgcloud
A Linux ops monitoring tool supporting system information, memory, CPU, temperature, disk space and I/O, disk SMART, system load, and network traffic monitoring, plus an API, large-screen dashboards, topology diagrams, process monitoring, port monitoring, Docker monitoring, file tamper detection, log monitoring, data visualization, web SSH, a bastion host, batch command execution, a Linux panel, probes, and failure alerting
Stars: ✭ 2,669 (+721.23%)
Mutual labels:  monitoring, prometheus
Exporterhub.io
A Curated List of Prometheus Exporters
Stars: ✭ 252 (-22.46%)
Mutual labels:  monitoring, prometheus
K8s
Important production-grade Kubernetes Ops Services
Stars: ✭ 253 (-22.15%)
Mutual labels:  monitoring, prometheus
Kube Metrics Adapter
General purpose metrics adapter for Kubernetes HPA metrics
Stars: ✭ 309 (-4.92%)
Mutual labels:  monitoring, prometheus
Prometheus.erl
Prometheus.io client in Erlang
Stars: ✭ 276 (-15.08%)
Mutual labels:  monitoring, prometheus
Oracledb exporter
Prometheus Oracle database exporter.
Stars: ✭ 209 (-35.69%)
Mutual labels:  monitoring, prometheus
Github Exporter
Prometheus exporter for github metrics
Stars: ✭ 231 (-28.92%)
Mutual labels:  monitoring, prometheus
Awesome Prometheus Alerts
🚨 Collection of Prometheus alerting rules
Stars: ✭ 3,323 (+922.46%)
Mutual labels:  monitoring, prometheus
Prometheus rabbitmq exporter
Prometheus.io exporter implemented as a RabbitMQ Management plugin
Stars: ✭ 248 (-23.69%)
Mutual labels:  monitoring, prometheus
Alertmanager2es
Receives HTTP webhook notifications from AlertManager and inserts them into an Elasticsearch index for searching and analysis
Stars: ✭ 173 (-46.77%)
Mutual labels:  monitoring, prometheus
Prometheus Nats Exporter
A Prometheus exporter for NATS metrics
Stars: ✭ 179 (-44.92%)
Mutual labels:  monitoring, prometheus
Netdata
Real-time performance monitoring, done right! https://www.netdata.cloud
Stars: ✭ 57,056 (+17455.69%)
Mutual labels:  monitoring, prometheus
Kube State Metrics
Add-on agent to generate and expose cluster-level metrics.
Stars: ✭ 3,433 (+956.31%)
Mutual labels:  monitoring, prometheus

Chinese version (中文版)

Kvass is a Prometheus horizontal auto-scaling solution. It uses a Sidecar to generate, for every Prometheus shard, a special config file that contains only the targets assigned to that shard by the Coordinator.

The Coordinator does service discovery, manages the Prometheus shards, and assigns targets to each shard. Thanos (or another storage solution) is used for a global data view.



Table of Contents

  • Overview
  • Design
  • Components
  • Kvass + Thanos
  • Kvass + Remote storage
  • Multiple replicas
  • Targets transfer
  • Shard de-pressure
  • Shard scaling down
  • Limit shards number
  • Target scheduling strategy
  • Demo
  • Best practice
  • License

Overview

Kvass is a Prometheus horizontal auto-scaling solution with the following features:

  • Easy to use
  • Tens of millions of series supported (thousands of k8s nodes)
  • One Prometheus configuration file
  • Auto scaling
  • Sharding according to the actual target load instead of label hash
  • Multiple replicas supported

Design


Components

Coordinator

See the flags of the Coordinator in the code.

  • The Coordinator loads the origin config file and does all Prometheus service discovery.
  • For every active target, the Coordinator applies all "relabel_configs" and explores the target's series scale.
  • The Coordinator periodically tries to assign the explored targets to Sidecars according to the head block series of each Prometheus shard.
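For orientation, the sketch below shows how these flags might be wired onto the Coordinator container. The container and image names are placeholders, and only flags documented elsewhere in this README are used.

# Illustrative Coordinator container spec; the image name is a placeholder,
# the flags are the ones documented in this README.
containers:
  - name: kvass-coordinator
    image: <kvass-coordinator-image>
    args:
      - --shard.max-series=750000                            # head-series budget per shard
      - --shard.selector=app.kubernetes.io/name=prometheus   # StatefulSets managed as shard replicas
      - --shard.max-idle-time=3h                             # delete idle shards after 3h (0 disables scaling down)
      - --shard.delete-pvc=true                              # remove the PVC when a shard is removed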

Sidecar

See the flags of the Sidecar in the code.

  • The Sidecar receives targets from the Coordinator. The label results of each target after the relabel process are also sent to the Sidecar.

  • The Sidecar generates a new Prometheus config file that uses only "static_configs" service discovery and deletes all "relabel_configs".

  • All Prometheus scrape requests are proxied through the Sidecar for target series statistics.

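As a rough illustration, the config a Sidecar generates for one shard might look like the sketch below. The job name, target address, and labels are invented for the example, and the exact output format of the Sidecar may differ.

# Hypothetical shard config generated by the Sidecar: only static_configs
# remain, and the labels are the post-relabel result sent by the Coordinator.
scrape_configs:
  - job_name: node-exporter
    static_configs:
      - targets: ["10.0.0.12:9100"]   # a target assigned to this shard
        labels:
          node: worker-1              # relabeling was already applied by the Coordinator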

Kvass + Thanos

Since Prometheus data is now distributed across shards, we need a way to get a global data view.

Thanos is a good choice. All we need to do is add a Kvass Sidecar beside the Thanos sidecar and set up a Kvass Coordinator.

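Schematically, each shard Pod then carries both sidecars next to Prometheus. The container and image names below are placeholders rather than the exact manifests shipped with Kvass.

# Illustrative shard Pod container layout for Kvass + Thanos (images are placeholders).
containers:
  - name: prometheus
    image: <prometheus-image>      # scrapes only the targets assigned to this shard
  - name: thanos-sidecar
    image: <thanos-image>          # uploads blocks / serves StoreAPI for the global view
  - name: kvass-sidecar
    image: <kvass-sidecar-image>   # receives targets and generates the shard's config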

Kvass + Remote storage

If you want to use remote storage such as InfluxDB, just set "remote_write" in the origin Prometheus config.
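For example, a standard remote_write block in the origin config is all that is needed; the URL below is a placeholder for your own storage endpoint.

# Origin Prometheus config: remote write settings are passed through to every shard.
remote_write:
  - url: "http://influxdb.example.com:8086/api/v1/prom/write?db=prometheus"   # placeholder endpoint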

Multiple replicas

The Coordinator uses a label selector to select StatefulSets. Each StatefulSet is a replica, and target management is independent between replicas.

--shard.selector=app.kubernetes.io/name=prometheus
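For instance, two replicas are simply two StatefulSets that both carry the selected label; the names below are illustrative.

# Two StatefulSets matched by --shard.selector=app.kubernetes.io/name=prometheus,
# i.e. two independent replicas (names are illustrative).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus-rep-0
  labels:
    app.kubernetes.io/name: prometheus
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus-rep-1
  labels:
    app.kubernetes.io/name: prometheus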

Targets transfer

There are scenarios where we need to move an assigned Target from one shard to another (for example, to de-pressure a shard).

To keep scraping uninterrupted, the target transfer is divided into the following steps.

  • Mark the state of the Target in the original shard as IN_TRANSFER, and assign the Target to another shard with the state Normal.
  • Wait until the Target has been scraped by both shards at least 3 times.
  • Delete the Target from the original shard.
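To make the handover concrete, a target in the middle of a transfer could be pictured as below. The field names and layout are invented for this illustration and are not the Coordinator's actual API.

# Illustrative snapshot of one target during transfer (shape is invented):
shard-0:
  target: node-exporter/10.0.0.12:9100
  state: in_transfer   # original owner keeps scraping during the handover
shard-1:
  target: node-exporter/10.0.0.12:9100
  state: normal        # new owner starts scraping immediately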

Shard de-pressure

The number of series a Target produces may increase over time, so the head series of a shard may exceed the threshold. A newly added K8S node is a typical example: its cadvisor data size grows as Pods are scheduled onto it.

When the head series of a shard exceeds a certain proportion of the threshold, the Coordinator will attempt to de-pressurize the shard: it transfers some targets from that shard to other free shards, in proportion to how far the shard exceeds the threshold. The further the shard exceeds the threshold, the more targets are transferred.

Shard scaling down

Scaling down only starts from the highest-numbered shard.

When all targets on the highest-numbered shard can be migrated to other shards, a transfer attempt is made, which empties the highest-numbered shard.

Once emptied, the shard becomes idle and will be deleted after a certain period of time (waiting for the shard's data to be deleted or uploaded to object storage).

You can set the idle time with the following Coordinator flag, and turn scaling down off by setting it to 0.

- --shard.max-idle-time=3h
- --shard.max-idle-time=0 // default

If a StatefulSet is used to manage shards, you can add a flag that allows the Coordinator to automatically remove the PVC when the shard is removed.

- --shard.delete-pvc=true // default

Limit shards number

The maximum and minimum number of shards can be limited by setting the following Coordinator flags. Note that if a minimum number of shards is set, the Coordinator will only start working once at least that many shards are available.

--shard.max-shard=99999 //default
--shard.min-shard=0 //default

Target scheduling strategy

If --shard.max-idle-time != 0, both new and migrated targets are preferentially assigned to lower-numbered shards.

If --shard.max-idle-time = 0, targets are randomly allocated to any shard with free capacity, which is especially useful together with the --shard.min-shard flag.

Demo

Here is an example that shows how Kvass works.

git clone https://github.com/tkestack/kvass

cd kvass/example

kubectl create -f ./examples

You will find a Deployment named "metrics" with 6 Pods; each Pod generates 10045 series (45 of them from Go's default metrics).

We will scrape metrics from them.


The maximum number of series each Prometheus shard can scrape is a flag of the Coordinator Pod.

In this example it is set to 30000.

--shard.max-series=30000

Now we have 6 targets with 60000+ series in total, and each shard can scrape 30000 series, so 3 shards are needed to cover all targets.

The Coordinator automatically changes the replicas of the Prometheus StatefulSet to 3 and assigns targets to them.


Only 20000+ series end up in the prometheus_tsdb_head of each shard.


But we can get a global data view using thanos-query.


Best practice

Flag values suggestion

The memory usage of each Prometheus shard is related to its maximum head series.

The recommended "max series" is 750000; set the Coordinator flag

--shard.max-series=750000

The recommended memory request for a Prometheus shard with 750000 max series is 8G.
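In a Kubernetes manifest, that suggestion translates into something like the snippet below (illustrative, not an official manifest).

# Prometheus shard container sized for --shard.max-series=750000 (illustrative).
containers:
  - name: prometheus
    resources:
      requests:
        memory: 8Gi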

License

Apache License 2.0, see LICENSE.
