
rystsov / perseus

License: MIT
Perseus is a set of scripts (docker + javascript) to investigate a distributed database's responsiveness when one of its three nodes is isolated from its peers.

Programming Languages

javascript
shell
Dockerfile

Projects that are alternatives to or similar to perseus

Nestcloud
A NodeJS micro-service solution, written in TypeScript with the NestJS framework.
Stars: ✭ 290 (+491.84%)
Mutual labels:  consul, etcd
Hyperf
🚀 A coroutine framework that focuses on hyperspeed and flexibility. Build microservices or middleware with ease.
Stars: ✭ 4,206 (+8483.67%)
Mutual labels:  consul, etcd
Stolon
PostgreSQL cloud native High Availability and more.
Stars: ✭ 3,481 (+7004.08%)
Mutual labels:  consul, etcd
Gokv
Simple key-value store abstraction and implementations for Go (Redis, Consul, etcd, bbolt, BadgerDB, LevelDB, Memcached, DynamoDB, S3, PostgreSQL, MongoDB, CockroachDB and many more)
Stars: ✭ 314 (+540.82%)
Mutual labels:  consul, etcd
Vip Manager
Manages a virtual IP based on state kept in etcd or Consul
Stars: ✭ 75 (+53.06%)
Mutual labels:  consul, etcd
Burry.sh
Cloud Native Infrastructure BackUp & RecoveRY
Stars: ✭ 260 (+430.61%)
Mutual labels:  consul, etcd
Remco
remco is a lightweight configuration management tool
Stars: ✭ 200 (+308.16%)
Mutual labels:  consul, etcd
seagull
Configuration server submodule for all SeaServices
Stars: ✭ 19 (-61.22%)
Mutual labels:  consul, etcd
Dister
dister (Distribution Cluster) is a lightweight, high-performance distributed cluster management tool. It implements the core components commonly needed in a distributed architecture: a service configuration management center, service registration and discovery, service health checks, and service load balancing. dister takes its inspiration from ZooKeeper, Consul, and Etcd, which implement similar distributed components, but dister aims to be more lightweight, low-cost, easy to maintain, architecturally clear, simple, practical, and performant, which was the original motivation for its design.
Stars: ✭ 41 (-16.33%)
Mutual labels:  consul, etcd
Traefik
The Cloud Native Application Proxy
Stars: ✭ 36,089 (+73551.02%)
Mutual labels:  consul, etcd
Dbtester
Distributed database benchmark tester
Stars: ✭ 214 (+336.73%)
Mutual labels:  consul, etcd
Go Oauth2 Server
A standalone, specification-compliant, OAuth2 server written in Golang.
Stars: ✭ 1,843 (+3661.22%)
Mutual labels:  consul, etcd
Patroni
A template for PostgreSQL High Availability with Etcd, Consul, ZooKeeper, or Kubernetes
Stars: ✭ 4,434 (+8948.98%)
Mutual labels:  consul, etcd
Learning Tools
A collection of tools and files for learning new technologies
Stars: ✭ 1,287 (+2526.53%)
Mutual labels:  consul, etcd
Docker Compose
docker-compose configuration files for some basic services, making it easy to get up and running quickly on a new machine.
Stars: ✭ 163 (+232.65%)
Mutual labels:  consul, etcd
book-library
📚 A book library app for both Android & IOS ~ Flutter.dev project in Dart
Stars: ✭ 89 (+81.63%)
Mutual labels:  tests
Core Grpc
A C# gRPC driver wrapper that implements service registration and discovery with Consul; supports dotnetcore / framework and makes it quick to build gRPC-based microservices. Includes complete examples covering both server and client: core+grpc, netcore+grpc, dotnetcore+grpc.
Stars: ✭ 209 (+326.53%)
Mutual labels:  consul
Springcloudexamples
A Spring Cloud learning tutorial.
Stars: ✭ 208 (+324.49%)
Mutual labels:  consul
minietcd
☁️ Super small and "dumb" read-only client in Go for coreos/etcd (v2).
Stars: ✭ 12 (-75.51%)
Mutual labels:  etcd
Springcloudlearning
Source code for "The Simplest Spring Cloud Tutorial Ever".
Stars: ✭ 16,218 (+32997.96%)
Mutual labels:  consul

Perseus is a set of scripts to investigate a distributed database's responsiveness when one of its three nodes is isolated from its peers.

| Database | Downtime on isolation | Partial downtime on isolation | Disturbance time | Downtime on recovery | Partial downtime on recovery | Recovery time | Version |
|----------|----------------------|-------------------------------|------------------|----------------------|-------------------------------|---------------|---------|
| Etcd | 1s | 0s | 1s | 1s | 0s | 2s | 3.2.13 |
| Gryadka | 0s | 0s | 0s | 0s | 0s | 5s | gryadka: 1.61.8, redis: 4.0.1 |
| CockroachDB | 7s | 19s | 26s | 7s | 0s | 13s | 1.1.3 |
| Consul | 14s | 1s | 15s | 8s | 0s | 10s | 1.0.2 |
| RethinkDB | 17s | 0s | 17s | 0s | 0s | 21s | 2.3.6 |
| Riak | 8s | >121s | N/A | 1s | 6s | 18s | 2.2.3 |
| MongoDB (1) | 29s | 0s | 29s | 0s | 0s | 1s | 3.6.1 |
| MongoDB (2) | 117s | 0s | 117s | 0s | 0s | N/A | 3.6.1 |
| MongoDB (3) | 29s | 0s | 29s | 0s | 0s | N/A | 3.6.1 |
| TiDB (1) | 15s | 1s | 16s | 82s | 8s | 114s | PD: 1.1.0, KV: 1.0.1, DB: 1.1.0 |
| TiDB (2) | >235s | 0s | N/A | >89s | 0s | N/A | same |
| YugaByte | >366s | 0s | N/A | 51s | 0s | 51s | 0.9.1.0 |

Downtime on isolation: complete unavailability (none of the three nodes is available for writes or reads) during the transition from a steady three-node cluster to a steady two-node cluster, caused by the isolation of the third node.

Partial downtime on isolation: only one node is available during the 3-to-2 transition.

Disturbance time: time between the isolation and the moment the two remaining nodes become steady.

Downtime on recovery: complete unavailability during the transition from a steady two-node cluster back to a steady three-node cluster, caused by the rejoining of the missing node.

Partial downtime on recovery: only one node is available during the 2-to-3 transition.

Recovery time: time between the moment connectivity is restored and the moment all three nodes are available for writes and reads.

What was tested?

All the tested systems have something in common: they are distributed consistent databases (key-value stores) that tolerate the failure of up to n out of 2n+1 nodes.
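
For the three-node configuration used here, the quorum arithmetic is simple (a minimal illustration, not part of the suite):

```javascript
// Majority-quorum arithmetic for a cluster of 2n+1 nodes.
const nodes = 3;                           // the suite's configuration, n = 1
const quorum = Math.floor(nodes / 2) + 1;  // 2 nodes must agree on every write
const tolerated = nodes - quorum;          // 1 node may fail or be isolated
console.log({ nodes, quorum, tolerated }); // { nodes: 3, quorum: 2, tolerated: 1 }
```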

This testing suite uses a three-node configuration with a fourth node acting as a client. The client spawns three threads (coroutines); each thread opens a connection to one of the three DB nodes and, in a loop, reads a value, increments it, and writes it back.
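
Below is a minimal sketch of such a read-modify-write loop, written here against a single Redis replica with the ioredis client; the key name, port, and counter handling are illustrative assumptions, not the suite's actual code:

```javascript
const Redis = require("ioredis");

// Read-modify-write loop against one node; successes and errors are
// counted per node so they can be reported once a second.
async function rmwLoop(host, stats) {
  const db = new Redis({ host, port: 6379 });
  for (;;) {
    try {
      const value = parseInt((await db.get("counter")) || "0", 10); // read
      await db.set("counter", value + 1);                           // increment & write back
      stats.ok += 1;
    } catch (err) {
      stats.err += 1; // e.g., the node is isolated or a leader election is running
    }
  }
}
```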

Once a second, it dumps aggregated statistics in the following form:

#legend: time|gryadka1|gryadka2|gryadka3|gryadka1:err|gryadka2:err|gryadka3:err
1	128	175	166	0	0	0	2018/01/16 09:02:41
2	288	337	386	0	0	0	2018/01/16 09:02:42
...
18	419	490	439	0	0	0	2018/01/16 09:02:58
19	447	465	511	0	0	0	2018/01/16 09:02:59

The first column is the number of seconds since the beginning of the experiment; the following three columns are the number of increments per second for each node of the cluster; the next triplet is the number of errors per second per node; and the last column is the wall-clock time.
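
The aggregation itself can be as simple as resetting per-node counters every second and printing one line per tick (a sketch under the same assumptions as the loop above; the actual perseus output differs in details such as the timestamp format):

```javascript
// One stats object per node, shared with the three rmwLoop instances.
const counters = [{ ok: 0, err: 0 }, { ok: 0, err: 0 }, { ok: 0, err: 0 }];
let tick = 0;
setInterval(() => {
  tick += 1;
  const ok = counters.map((c) => c.ok);
  const err = counters.map((c) => c.err);
  counters.forEach((c) => { c.ok = 0; c.err = 0; }); // reset per-second counters
  console.log([tick, ...ok, ...err, new Date().toISOString()].join("\t"));
}, 1000);
```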

In the case of MongoDB and Gryadka you can't control which node a client connects to and must specify the addresses of all nodes in the connection string, so each DB replica has a colocated client (a read-modify-write loop), while the fourth node is responsible only for aggregating the statistics and dumping them once a second.
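
For example, a MongoDB client lists every replica in the URI and lets the driver pick the node to talk to (the host names, database, and replica-set name below are assumptions for illustration):

```javascript
const { MongoClient } = require("mongodb");

async function openCollection() {
  // The driver, not the caller, decides which replica serves each operation.
  const uri = "mongodb://db1:27017,db2:27017,db3:27017/?replicaSet=rs0";
  const client = new MongoClient(uri);
  await client.connect();
  return client.db("perseus").collection("counters");
}
```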

During a test, I isolated one of the DB's nodes from its peers and observed how it affected the outcome. Thanks to Docker Compose, the tests are reproducible with just a couple of commands; navigate to a DB's subfolder to see the instructions.
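
The isolate/heal cycle can be scripted with Docker's own network commands, driven here from Node (the network and container names are illustrative assumptions; the per-DB subfolders document the suite's actual commands):

```javascript
const { execSync } = require("child_process");

// Cut the third node off from the cluster network, wait, then heal it.
execSync("docker network disconnect perseus_net db3");
setTimeout(() => {
  execSync("docker network connect perseus_net db3");
}, 60 * 1000); // keep the node isolated for a minute
```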

So, this is just a bunch of numbers driven by default parameters, such as each system's leader-election timeout, isn't it?

Kind of, but there are still a lot of interesting patterns behind the numbers.

Some systems have downtime on recovery and some don't; TiDB has partial downtime on recovery while the others don't. Gryadka doesn't even have downtime on isolation. Testing the "defaults" therefore helps to understand the fundamental properties of the replication algorithm each system uses.

Why are there three records for MongoDB?

I didn't manage to achieve stable results with MongoDB. Sometimes it behaved like any other leader-based system: isolation of the leader led to a temporary cluster-wide downtime and then to two steadily working nodes, and once connectivity was restored all three nodes worked as usual. Sometimes it had issues:

  1. The downtime lasted almost two minutes, and once it finished, the RPS dropped from 282 before the isolation started to less than one.
  2. A client on an isolated node couldn't connect to the cluster even after connectivity was restored.

I filed a bug for each of issues 1 and 2.

Why are there two records for TiDB?

When TiDB starts, it works in a "warm-up" mode; however, from a client's perspective it's indistinguishable from the "normal" mode, so there are two records.

If a node becomes isolated during the "warm-up" mode, the whole cluster becomes unavailable. Moreover, it doesn't recover even when connectivity is restored. I filed a corresponding bug: pingcap/tidb#2676.

N.B. When TiDB is in "warm-up" mode, the replication factor is one, so it's possible to lose acknowledged data if a single node dies.

What's wrong with YugaByte?

yugabyte/yugabyte-db#19

What's Gryadka?

Gryadka is an experimental key-value store. I created it to demonstrate that Paxos isn't as hard as its reputation suggests. Be careful! It isn't production-ready; the best way to use it is to read the sources (less than 500 lines of code) and write your own implementation.
