All Projects → MasahikoSawada → pg_keeper

MasahikoSawada / pg_keeper

Licence: other
Simplified clustering module for PostgreSQL

Programming Languages

c
50402 projects - #5 most used programming language
Makefile
30231 projects

Projects that are alternatives of or similar to pg keeper

Postgresql cluster
PostgreSQL High-Availability Cluster (based on "Patroni" and "DCS(etcd)"). Automating deployment with Ansible.
Stars: ✭ 294 (+818.75%)
Mutual labels:  replication, failover, high-availability
Nsfailover
Let's Make DNS Outage Suck Less
Stars: ✭ 95 (+196.88%)
Mutual labels:  failover, high-availability
awesome-storage
A curated list of storage open source tools. Backups, redundancy, sharing, distribution, encryption, etc.
Stars: ✭ 324 (+912.5%)
Mutual labels:  replication, high-availability
Orchestrator
MySQL replication topology management and HA
Stars: ✭ 4,342 (+13468.75%)
Mutual labels:  replication, high-availability
Patroni
A template for PostgreSQL High Availability with Etcd, Consul, ZooKeeper, or Kubernetes
Stars: ✭ 4,434 (+13756.25%)
Mutual labels:  failover, high-availability
Bond Cni
Bond-cni is for fail-over and high availability of networking in cloudnative orchestration
Stars: ✭ 36 (+12.5%)
Mutual labels:  failover, high-availability
Glusterfs
Gluster Filesystem : Build your distributed storage in minutes
Stars: ✭ 3,437 (+10640.63%)
Mutual labels:  replication, high-availability
Repmgr
A lightweight replication manager for PostgreSQL (Postgres) - latest version 5.2.1 (2020-12-07)
Stars: ✭ 1,207 (+3671.88%)
Mutual labels:  replication, failover
Lizardfs
LizardFS is an Open Source Distributed File System licensed under GPLv3.
Stars: ✭ 793 (+2378.13%)
Mutual labels:  replication, high-availability
Freno
freno: cooperative, highly available throttler service
Stars: ✭ 462 (+1343.75%)
Mutual labels:  replication, high-availability
Paf
PostgreSQL Automatic Failover: High-Availibility for Postgres, based on Pacemaker and Corosync.
Stars: ✭ 288 (+800%)
Mutual labels:  failover, high-availability
Recon
HA LDAP based key/value solution for projects configuration storing with multi master replication support
Stars: ✭ 12 (-62.5%)
Mutual labels:  replication, high-availability
Appactive
AppActive: A middleware to build an application with multiple active architectures(阿里巴巴开源的一款标准通用且功能强大的构建应用多活架构的开源中间件)
Stars: ✭ 241 (+653.13%)
Mutual labels:  failover, high-availability
Ansible Role Patroni
🐘 Ansible Role for Patroni
Stars: ✭ 40 (+25%)
Mutual labels:  failover, high-availability
Gauntlet
🔖 Guides, Articles, Podcasts, Videos and Notes to Build Reliable Large-Scale Distributed Systems.
Stars: ✭ 336 (+950%)
Mutual labels:  failover, high-availability
Replication Manager
Signal 18 repman - Replication Manager for MySQL / MariaDB / Percona Server
Stars: ✭ 461 (+1340.63%)
Mutual labels:  replication, failover
Drc
MySQL active-active replication solution.
Stars: ✭ 92 (+187.5%)
Mutual labels:  replication, high-availability
pg-dock
pg-dock cluster managment
Stars: ✭ 19 (-40.62%)
Mutual labels:  failover, high-availability
mysql-replication
mysql replication binlog php 同步工具
Stars: ✭ 60 (+87.5%)
Mutual labels:  replication
slock
High-performance distributed sync service and atomic DB
Stars: ✭ 50 (+56.25%)
Mutual labels:  replication

pg_keeper 1.0

pg_keeper is a simplified clustering module for PostgreSQL, to promote a standby server to master in a 2 servers cluster.

The license of pg_keeper is PostgreSQL License. (same as BSD License)

Prerequisite

pg_keeper requires a master and hot standby servers in PostgreSQL 9.3 or later, on a Linux OS. pg_keeper requires to have already the replication in place.

Overview

pg_keeper runs on both master and standby server as two modes; master mode and standby mode. The pg_keeper mode is determined automatially by itself.

  • master mode

master mode of pg_keeper queries the standby server at fixed intervals using a simple query 'SELECT 1'. If pg_keeper fails to get any result after a certain number of tries, pg_keeper will change replication mode to asynchronous replication so that backend process can avoid to wait infinity.

  • standby mode

standby mode of pg_keeper queries the primary server at fixed intervals using a simple query 'SELECT 1'. If pg_keeper fails to get any result after a certain number of tries, pg_keeper will promote the standby it runs on to master. After promoting to master server, pg_keeper switches from standby mode to master mode automatically.

With this, fail over time can be calculated with this formula.

(F/O time) = pg_keeper.keepalives_time * pg_keeper.keepalives_count

GUC parameters

Note that the parameters with (*) are mandatory options.

  • pg_keeper.partner_conninfo(*)

    • Specifies a connection string to be used for heart-beat to the partner node.
    • The heart-beat LAN is better to be separated from replication LAN.
    • Also, the NIC for heart-beat LAN is better to use NIC bonding.
  • pg_keeper.my_conninfo(*)

    • Specifies a connection string to be used for pg_keeper to do ALTER SYSTEM on myself.
  • pg_keeper.keepalive_time (sec)

    • Specifies how long interval pg_keeper continues polling. 5 seconds by default.
  • pg_keeper.keepalive_count

    • Specifies how many times pg_keeper try polling to master server in order to promote standby server. 4 times by default.
  • pg_keeper.after_command

    • Specifies shell command that will be called after promoted. Setting stonith command to this parameter is useful for preventing the split-brain syndrome.

Tested platforms

pg_keeper has been built and tested on following platforms:

Category Module Name
OS CentOS 6.5
PostgreSQL 9.5, 9.6

pg_keeper probably can work with PostgreSQL 9.3 or later, but not tested yet.

Reporting of building or testing pg_keeper on some platforms are very welcome.

How to set up pg_keeper

Installation

pg_keeper needs to be installed into both master server and standby server.

$ cd pg_keeper
$ make USE_PGXS=1
$ su
# make USE_PGXS=1 install

Configuration

For example, we set up two servers; pgserver1 and pgserver2. pgserver1 is the first master server and pgserver2 is the first standby server. After created user for replication and set up authentication, we need to install pg_keeper in both servers and configure some parameters as follows.

-- on pgserver1 (first master server)
$ vi $PGDATA/postgresql.conf
wal_level = hot_standby
max_worker_processes = 8 # pg_keeper requires one worker on each side
max_wal_senders = 8 # must be more than 1
hot_standby = on
shared_preload_libraries = 'pg_keeper'
synchronous_standby_names = 'pgserver2' # must use set sync replication mode.
pg_keeper.keepalive_time = 5
pg_keeper.keepalive_count = 3
pg_keeper.my_conninfo = 'host=127.0.0.1 port=5432 dbname=postgres'
pg_keeper.partner_conninfo = 'host=pgserver2 port=5432 dbname=postgres'
-- on pgserver2 (first slave server)
$ vi $PGDATA/postgresql.conf
wal_level = hot_standby
max_worker_processes = 8 # pg_keeper requires one worker on each side
max_wal_senders = 8 # must be more than 1
hot_standby = on
shared_preload_libraries = 'pg_keeper'
pg_keeper.keepalive_time = 5
pg_keeper.keepalive_count = 3
pg_keeper.my_conninfo = 'host=127.0.0.1 port=5432 dbname=postgres'
pg_keeper.partner_conninfo = 'host=pgserver1 port=5432 dbname=postgres'
$ vi $PGDATA/recovery.conf
standby_mode = 'on'
recovery_target_timeline = latest
primary_conninfo = 'host=pgserver1 port=5432 user=repl_user application_name=pgserver2'

Starting servers

We should start master server first that pg_keeper is installed in. master server's pg_keeper process will be launched when master server got started, once pg_keeper in standby server connected master's pg_keeper process it will start to work.

After started both master server and slave server, make sure that pg_keeper process is launched successfully in appropriate state on both server.

-- On master server
$ tail master.log
LOG:  pg_keeper connects to standby server
$ ps x | grep pg_keeper | grep -v grep
33525 ?        Ss     0:00 postgres: bgworker: pg_keeper   (master mode:connected)
-- On standby server
$ ps x | grep pg_keeper | grep -v grep
33613 ?        Ss     0:00 postgres: bgworker: pg_keeper   (standby mode:connected)

For more detail of state transition of pg_keeper, please refer State Transition of pg_keeper section.

Handling standby server failure (Automated changing sync replication to async replication)

In case the synchronous standby server crashes, because the master server cannnot replicate data to synchronous standby server the following transaction can not be processed. In this case, pg_keeper on the master server changes synchronous replication to asynchronous replication by changing synchronous_standby_names GUC parameter after detected the standby server failure if synchronous replication is enabled. You can see following server log on the master server.

$ cat master.log
<2016-07-20 09:10:09.855 AST> LOG:  could not get tuple from server : "host=pgserver2 port=5432 dbname=postgres"
<2016-07-20 09:10:09.855 AST> LOG:  pg_keeper failed to connect 1 time(s)
<2016-07-20 09:10:14.859 AST> LOG:  could not get tuple from server : "host=pgserver2 port=5432 dbname=postgres"
<2016-07-20 09:10:14.859 AST> LOG:  pg_keeper failed to connect 2 time(s)
<2016-07-20 09:10:24.867 AST> LOG:  pg_keeper changes replication mode to asynchronous replication
<2016-07-20 09:10:24.884 AST> LOG:  received SIGHUP, reloading configuration files
<2016-07-20 09:10:24.885 AST> LOG:  parameter "synchronous_standby_names" changed to ""

After the standby server recovered, you need to set synchronous_standby_names parameter on the primary server manually in order to set up streaming replication again.

Handling master server failure (Automated failover)

In case the master server crashes, the standby server needs to promote to new master server. pg_keeper on the standby server promote it after detecting the master server failure. You can see following server log on the standby server.

$ tail standby.log
<2016-07-20 09:14:30.622 AST>LOG:  could not get tuple from server : "host=pgserver1 port=5432 dbname=postgres"
<2016-07-20 09:14:30.622 AST>LOG:  pg_keeper failed to connect 1 time(s)
<2016-07-20 09:14:35.628 AST>LOG:  could not get tuple from server : "host=pgserver1 port=5432 dbname=postgres"
<2016-07-20 09:14:35.628 AST>LOG:  pg_keeper failed to connect 2 time(s)
<2016-07-20 09:14:45.641 AST>LOG:  pg_keeper promoted standby server to primary server
<2016-07-20 09:14:45.641 AST>LOG:  received promote request
<2016-07-20 09:14:45.641 AST>LOG:  redo done at 0/3000060
<2016-07-20 09:14:45.646 AST>LOG:  selected new timeline ID: 2
<2016-07-20 09:14:45.690 AST>LOG:  archive recovery complete
<2016-07-20 09:14:45.692 AST>LOG:  MultiXact member wraparound protections are now enabled
<2016-07-20 09:14:45.693 AST>LOG:  database system is ready to accept connections

State Transition

state description
(standby:ready) Waiting for being able to connect to the master server.
(standby:connected) Connected to the master server. Heartbeating.
(master:ready) Wait for replication connection from standby server.
(master:connected) Connected from the standby server. Heartbeating.
(master:async) The master server is running as async replication mode.

Uninstallation

  • Following commands need to be executed in both master server and standby server.
$ cd pg_keeper
$ make USE_PGXS=1
$ su
# make USE_PGXS=1 uninstall
  • Remove pg_keeper from shared_preload_libraries in postgresql.conf on both servers.
$ vi postgresql.conf
shared_preload_libraries = ''
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].