toni-moreno / syncflux

License: MIT
SyncFlux is an Open Source InfluxDB Data synchronization and replication tool for migration purposes or HA clusters

Programming Languages

go
31211 projects - #10 most used programming language
shell
77523 projects
Dockerfile
14818 projects

Projects that are alternatives of or similar to syncflux

Moosefs
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+606.9%)
Mutual labels:  clustering, high-availability
WatsonCluster
A simple C# class using Watson TCP to enable a one-to-one high availability cluster.
Stars: ✭ 18 (-87.59%)
Mutual labels:  clustering, high-availability
influx-proxy
InfluxDB Proxy with High Availability and Consistent Hash
Stars: ✭ 223 (+53.79%)
Mutual labels:  influxdb, high-availability
sre.surmon.me
💻 SRE service for Surmon.me blog.
Stars: ✭ 34 (-76.55%)
Mutual labels:  backup-tool, backup-database
Hawk
A web-based GUI for managing and monitoring the Pacemaker High-Availability cluster resource manager
Stars: ✭ 130 (-10.34%)
Mutual labels:  clustering, high-availability
influxdb-ha
High-availability and horizontal scalability for InfluxDB
Stars: ✭ 45 (-68.97%)
Mutual labels:  influxdb, clustering
CosmicClone
Cosmic Clone is a utility that can backup\clone\restore a azure Cosmos database Collection. It can also anonymize cosmos documents and helps hide personally identifiable data.
Stars: ✭ 113 (-22.07%)
Mutual labels:  backup-tool, backup-database
nifi-influxdb-bundle
InfluxDB Processors For Apache NiFi
Stars: ✭ 30 (-79.31%)
Mutual labels:  influxdb
influxable
A lightweight python ORM / ODM / Client for InfluxDB
Stars: ✭ 36 (-75.17%)
Mutual labels:  influxdb
fanuc-driver
Configurable Fanuc Focas data collector and post processor.
Stars: ✭ 38 (-73.79%)
Mutual labels:  influxdb
snATAC
<<------ Use SnapATAC!!
Stars: ✭ 23 (-84.14%)
Mutual labels:  clustering
ssdc
ssdeep cluster analysis for malware files
Stars: ✭ 24 (-83.45%)
Mutual labels:  clustering
Heart disease prediction
Heart Disease prediction using 5 algorithms
Stars: ✭ 43 (-70.34%)
Mutual labels:  clustering
unpoller
Application: Collect ALL UniFi Controller, Site, Device & Client Data - Export to InfluxDB or Prometheus
Stars: ✭ 1,613 (+1012.41%)
Mutual labels:  influxdb
clustering-python
Different clustering approaches applied on different problemsets
Stars: ✭ 36 (-75.17%)
Mutual labels:  clustering
InfluxDB
App Metrics Extensions for InfluxDB reporting
Stars: ✭ 17 (-88.28%)
Mutual labels:  influxdb
nest-convoy
[WIP] An opinionated framework for building distributed domain driven systems using microservices architecture
Stars: ✭ 20 (-86.21%)
Mutual labels:  high-availability
NNM
The PyTorch official implementation of the CVPR2021 Poster Paper NNM: Nearest Neighbor Matching for Deep Clustering.
Stars: ✭ 46 (-68.28%)
Mutual labels:  clustering
RcppML
Rcpp Machine Learning: Fast robust NMF, divisive clustering, and more
Stars: ✭ 52 (-64.14%)
Mutual labels:  clustering
netdata-influx
Netdata ➡️ InfluxDB metrics exporter & Grafana dashboard
Stars: ✭ 29 (-80%)
Mutual labels:  influxdb

SyncFlux

SyncFlux is an Open Source InfluxDB data synchronization and replication tool with an HTTP API interface. Its main goal is to recover lost data from any handmade HA InfluxDB 1.X cluster (built with a simple relay such as https://github.com/influxdata/influxdb-relay or our Smart Relay http://github.com/toni-moreno/influxdb-srelay).

Install from precompiled packages

  • Debian: deb - signature
  • RedHat: rpm - signature
  • Docker: docker run -d --name=syncflux_instance00 -p 4090:4090 -v /mylocal/conf:/opt/syncflux/conf -v /mylocal/log:/opt/syncflux/log tonimoreno/syncflux

All releases are available at https://github.com/toni-moreno/syncflux/releases.

Run from master

If you want to build a package yourself, or contribute, here is a guide for how to do that.

Dependencies

  • Go 1.11

Get Code

go get -d github.com/toni-moreno/syncflux/...

Building the backend

cd $GOPATH/src/github.com/toni-moreno/syncflux
go run build.go build           

Creating minimal package tar.gz

After building the frontend and backend, run:

go run build.go pkg-min-tar

Creating rpm and deb packages

You will need the fpm/rpm and deb packaging tools installed beforehand. After building the frontend and backend, run:

go run build.go latest

Running first time

To run it without any further configuration you need a minimal syncflux.toml file in the conf directory:

cp conf/sample.syncflux.toml conf/syncflux.toml
./bin/syncflux [options]

Creating and running docker image

make -f Makefile.docker
docker run tonimoreno/syncflux:latest -version
docker run  tonimoreno/syncflux:latest -h
docker run  -p 4090:4090 -v /mylocal/conf:/opt/syncflux/conf -v /mylocal/log:/opt/syncflux/log tonimoreno/syncflux:latest [options]

Recompile backend on source change (only for developers)

To rebuild on source change (requires that you have executed godep restore):

go get github.com/Unknwon/bra
bra run  

This will start a change-autodetect webserver with angular-cli (ng serve) and also an autodetect-and-recompile process with bra for the backend.

Basic Usage

Execution parameters

Usage of ./bin/syncflux:
   -action: hamonitor(default),copy,fullcopy,replicaschema
    -chunk: set RW chuck periods as in the data-chuck-duration config param
   -config: config file
-copyorder: backward (most to least recent, default), forward (least to most recent)
       -db: set the db where to play
      -end: set the endtime do action (no valid in hamonitor) default now
     -full: copy full database or now()- max-retention-interval if greater retention policy
  -logmode: log mode [console/file] default console
     -logs: log directory (only apply if action=hamonitor and logmode=file)
   -master: choose master ID from all those in the config file where to get data (override the master-db parameter in the config file)
     -meas: set the meas where to play
    -newdb: set the db to work on
    -newrp: set the rp to work on
  -pidfile: path to pid file
       -rp: set the rp where to play
    -slave: choose master ID from all those in the config file where to write data (override the slave-db parameter in the config file)
    -start: set the starttime to do action (no valid in hamonitor) default now-24h
        -v: set log level to Info
  -version: display the version
       -vv: set log level to Debug
      -vvv: set log level to Trace
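
For example, a one-off copy of the last 24 hours of a single database (the database name "mydb" below is just a placeholder; master and slave come from the config file) might look like:

./bin/syncflux -action copy -db "^mydb$" -start -24h -v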

Set config file

# -*- toml -*-

# -------GENERAL SECTION ---------
# syncflux can work in several modes;
# not all General config parameters apply to all of them.
#  modes
#  "hamonitor" => runs syncflux as a daemon that monitors
#                2 InfluxDB 1.X OSS databases and syncs data
#                between them when needed (active monitoring)
#  "copy" => runs syncflux as a one-shot process to copy data
#            between the master and slave databases
#  "replicaschema" => runs syncflux as a one-shot process to create
#             the database(s) and all their related retention policies
#  "fullcopy" => does database/rp replication and then a data copy

[General]
 # ------------------------
 # logdir ( only valid on hamonitor action)
 #  the directory where to place logs;
 #  the main log file will be written here
 #

 logdir = "./log"

 # ------------------------
 # loglevel ( valid for all actions ) 
 #  set the log level , valid values are:
 #  fatal,error,warn,info,debug,trace

 loglevel = "debug"

 # -----------------------------
 # sync-mode (only valid on hamonitor action)
 #  NOTE: right now only "onlyslave" (one-way sync) is valid
 #  (two-way sync is planned for the future)

 sync-mode = "onlyslave"

 # ---------------------------
 # master-db: choose one of the configured InfluxDB instances as the master DB
 # this parameter will be overridden by the command-line -master parameter
 
 master-db = "influxdb01"

 # ---------------------------
 # slave-db: choose one of the configured InfluxDB instances as the slave DB
 # this parameter will be overridden by the command-line -slave parameter
 
 slave-db = "influxdb02"

 # ------------------------------
 # check-interval
 # the interval for health checking of both master and slave databases
 
 check-interval = "10s"

 # ------------------------------
 # min-sync-interval
 # the interval at which the HA monitor checks that both databases are OK and,
 # if not, changes the cluster state and performs all needed recovery actions

 min-sync-interval = "20s"
 
 # ---------------------------------------------
 # initial-replication
 # tells syncflux whether some type of replication is needed
 # on the slave database from the master database on initialization
 # (only valid on hamonitor action)
 #
 # none:   no replication
 # schema: database and retention policies will be recreated on the slave database
 # data:   data for all retention policies will be replicated
 #         be careful: this full data copy could take hours or days
 # both:   replicates first the schema and then the full data

 initial-replication = "none"

 # 
 # monitor-retry-interval
 #
 # syncflux can only begin work when the master and slave databases are both up;
 # if either of them is down, syncflux will retry every monitor-retry-interval until both are available.
 monitor-retry-interval = "1m"

 # 
 # data-chuck-duration
 #
 # duration of each small chunk of data read from the master and written to the slave
 # smaller chunks of data will use less memory in the syncflux process
 # and also fewer resources on both master and slave databases
 # bigger chunks of data will improve sync speed

 data-chuck-duration = "60m"

 # 
 #  max-retention-interval
 #
 # for infinite (or very long) retention policies, full replication has to begin somewhere in time;
 # this parameter sets the maximum retention interval to go back.
 
 max-retention-interval = "8760h" # 1 year
 

# ---- HTTP API SECTION (Only valid on hamonitor action)
# Enables an HTTP API endpoint to check the cluster health

[http]
 name = "example-http-influxdb"
 bind-addr = "127.0.0.1:4090"
 admin-user = "admin"
 admin-passwd = "admin"
 cookie-id = "mysupercokie"

# ---- INFLUXDB  SECTION
# Sets the list of available InfluxDB instances that can be used
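
The sample shown here stops at the section header. Purely as an illustration (check conf/sample.syncflux.toml in the repository for the exact field names), the [[influxdb]] entries referenced by master-db = "influxdb01" and slave-db = "influxdb02" could look roughly like this:

[[influxdb]]
 name = "influxdb01"
 location = "http://influxdb01:8086/"
 admin-user = "admin"
 admin-passwd = "admin"
 timeout = "10s"

[[influxdb]]
 name = "influxdb02"
 location = "http://influxdb02:8086/"
 admin-user = "admin"
 admin-passwd = "admin"
 timeout = "10s"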

Run as a Database replication Tool

Available actions:

  • Replicate Schema
  • Copy data
  • Full copy (replicate schema + copy data)

Replicate schema

Allows the user to copy DB schemas from the master to the slave. A DB schema consists of databases (DBs) and retention policies (RPs).

Syntax

./bin/syncflux -action replicaschema [-master <master_id>] [-slave <slave_id>] [-db <db_regex_selector>] [-newdb <newdb_name>] [-rp <rp_regex_selector>] [-newrp <newrp_name>] [-meas <meas_regex_selector>]

Description of syntax

If no master or slave is provided, the defaults from the config file are used. The db selector allows filtering all databases with a regex expression. If the slave schema must differ from the master, the new schema can be set using the newdb and newrp flags.

Limitations

  • Only the default RP can be renamed

Important Notes

When copying big databases there are a few things you should take care of to ensure data is correctly copied.

SyncFlux copies data by doing "select * from XXXXX where time > [INIT_CHUNK] AND time < [END_CHUNK]" for each of the existing measurements in the chosen database, running several of these queries concurrently. Depending on the measurement cardinality these queries could take a long time (be careful with timeouts) and also need resources (mainly memory) in both databases, as well as in the syncflux process itself.

We recommend increasing or disabling all query timeouts:
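
For example, on InfluxDB 1.X the relevant server-side limits typically live in the [coordinator] section of influxdb.conf; the values below are only an illustration (0 disables the limit):

[coordinator]
  # maximum time a query can run before being killed; 0 disables the timeout
  query-timeout = "0s"
  # maximum number of concurrently running queries; 0 removes the limit
  max-concurrent-queries = 0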

Examples

Example 1: Copy schema from Influx01 to Influx02

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "replicaschema" -master "influx01" -slave "influx02"

The result will be that the schema of Influx01 will be replicated on Influx02

Influx02 schema
----------------
  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2

Example 2: Copy schema from Influx01-DB1 to Influx02

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "replicaschema" -master "influx01" -slave "influx02" -db "^db1$"

The result will be that the schema of Influx01 will be replicated on Influx02

Influx02 schema
----------------
  |-- db1
    |-- rp1*
    |-- rp2

Example 3: Copy schema from Influx01-DB1 to Influx02-DB3 (new db called DB3) and only from rp1

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "replicaschema" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3" -rp "^rp1$"

The result will be that the schema of Influx01 will be replicated on Influx02

Influx02 schema
----------------
  |-- db3
    |-- rp1*

Example 4: Copy schema from Influx01-DB1 to Influx02-DB3 (new db called DB3) and set the defaultrp to rp3

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "replicaschema" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3" -newrp "rp3"

The result will be that the schema of Influx01 will be replicated on Influx02

Influx02 schema
----------------
  |-- db3
    |-- rp3*
    |-- rp2

Example 5: Copy data and schema from Influx01-DB1 to Influx02-DB3 (new db called DB3) and only from meas "cpu.*"

Influx01 schema
----------------

  |-- db1
    |-- rp1*
      |-- cpu
      |-- mem
      |-- swap
      |-- ...
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2

Copy data

Allows the user to copy DB data from the master to the slave.

Syntax

./bin/syncflux -action copy [-master <master_id>] [-slave <slave_id>] [-db <db_regex_selector>] [-newdb <newdb_name>] [-rp <rp_regex_selector>] [-newrp <newrp_name>] [-meas <meas_regex_selector>] { [-start <start_time>] [-end <end_time>] | [-full] }

Description of syntax

If no master or slave is provided, the defaults from the config file are used. The db selector allows filtering all databases with a regex expression. If the slave schema must differ from the master, the new schema can be set using the newdb and newrp flags. The start and end flags define a time window for the data to copy. If -full is passed, the data will be copied from now back to max-retention-interval.

Remember that with this action the schema is not replicated, so if a DB or RP does not exist on the slave it will be skipped.

Limitations

  • ...

Examples

Example 1: Copy all data from Influx01 to Influx02

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "coy" -master "influx01" -slave "influx02"

The command above will copy data from all dbs on Influx01 into Influx02

Influx02 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2

Example 2: Copy data from Influx01-DB1 to Influx02 on a time window and only from rp1

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "copy" -master "influx01" -slave "influx02" -db "^db1$" -rp "^rp1$" -start -10h end -5h

The command above will replicate data from Influx01 to Influx02, but only from db1.rp1 and within a time window from -10h to -5h

Influx02 schema
----------------
  |-- db1
    |-- rp1*
    |-- rp2

Example 3: Copy data from Influx01-DB1 to Influx02-DB3 (existing db called DB3)

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "copy" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3"

The command above will copy all data from Influx01-db1 into Influx02's existing DB 'db3'

Influx02 schema
----------------
  |-- db3
    |-- rp1*
    |-- rp2

Example 4: Copy data from Influx01-DB1 to Influx02-DB3 (existing db called DB3) and set the defaultrp to existing rp3

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "copy" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3"

The command above will copy all data from Influx01-db1 into Influx02's existing DB 'db3', writing it into the existing default RP 'rp3'

Influx02 schema
----------------
  |-- db3
    |-- rp3*
    |-- rp2

Example 5: Copy data from Influx01-DB1 to Influx02-DB3 (new db called DB3) and only from meas "cpu.*"

Influx01 schema
----------------

  |-- db1
    |-- rp1*
      |-- cpu
      |-- mem
      |-- swap
      |-- ...
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "copy" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3" -mes "cpu.*"

The command above will copy data from Influx01-db1 into Influx02's DB 'db3', but only for measurements matching 'cpu.*'

Influx02 schema
----------------
  |-- db3
    |-- rp3*
      |-- cpu
    |-- rp2

Copy data + schema

Allows the user to replicate the DB schema (databases and retention policies) and copy DB data from the master to the slave.

Syntax

./bin/syncflux -action fullcopy [-master <master_id>] [-slave <slave_id>] [-db <db_regex_selector>] [-newdb <newdb_name>] [-rp <rp_regex_selector>] [-newrp <newrp_name>] [-meas <meas_regex_selector>] { [-start <start_time>] [-end <end_time>] | [-full] }

Description of syntax

If no master or slave is provided, the defaults from the config file are used. The db selector allows filtering all databases with a regex expression. If the slave schema must differ from the master, the new schema can be set using the newdb and newrp flags. The start and end flags define a time window for the data to copy. If -full is passed, the data will be copied from now back to max-retention-interval.

With this action the schema is replicated first, so any missing DB or RP will be created on the slave before its data is copied.

Limitations

  • Only the default RP can be renamed

Examples

Example 1: Copy all data and schema from Influx01 to Influx02

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "coy" -master "influx01" -slave "influx02"

The command above will create the schema and copy data from all dbs on Influx01 into Influx02

Influx02 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2

Example 2: Copy data and schema from Influx01-DB1 to Influx02 on a time window

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "copy" -master "influx01" -slave "influx02" -db "^db1$" -start -10h end -5h

The command above will create the schema and replicate all data from Influx01 to Influx02, but only from db1 and within a time window from -10h to -5h

Influx02 schema
----------------
  |-- db1
    |-- rp1*
    |-- rp2

Example 3: Copy data from Influx01-DB1 to Influx02-DB3 (new db called DB3)

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "copy" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3"

The command above will create the schema and replicate all data from Influx01-db1 to Influx02 into a new DB called 'db3'

Influx02 schema
----------------
  |-- db3
    |-- rp1*
    |-- rp2

Example 4: Copy data and schema from Influx01-DB1 to Influx02-DB3 (new db called DB3) and set the defaultrp to rp3

Influx01 schema
----------------

  |-- db1
    |-- rp1*
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "copy" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3"

The command above will create the schema and replicate all data from Influx01-db1 to Influx02 into a new DB called 'db3' with a new default RP called 'rp3'

Influx02 schema
----------------
  |-- db3
    |-- rp3*
    |-- rp2

Example 5: Copy data and schema from Influx01-DB1 to Influx02-DB3 (new db called DB3) and only from meas "cpu.*"

Influx01 schema
----------------

  |-- db1
    |-- rp1*
      |-- cpu
      |-- mem
      |-- swap
      |-- ...
    |-- rp2
  |-- db2
    |-- rp1*
    |-- rp2
./bin/syncflux -action "copy" -master "influx01" -slave "influx02" -db "^db1$" -newdb "db3" -mes "cpu.*"

The command above will create the schema and replicate data from Influx01-db1 to Influx02 into a new DB called 'db3', but only for measurements matching 'cpu.*'

Influx02 schema
----------------
  |-- db3
    |-- rp3*
      |-- cpu
    |-- rp2

Run as an HA Cluster monitor

./bin/syncflux -config ./conf/syncflux.toml -action hamonitor

By default syncflux looks for a syncflux.toml file in CWD/conf/, and hamonitor is the default action, so the command above is equivalent to:

./bin/syncflux  

You can check the cluster state with any HTTP client; possible values are:

  • OK: both nodes are OK
  • CHECK_SLAVE_DOWN: the current slave is down
  • RECOVERING: both databases are working but the slave is missing some data and syncflux is recovering it
 % curl http://localhost:4090/api/health
{
  "ClusterState": "CHECK_SLAVE_DOWN",
  "ClusterNumRecovers": 0,
  "ClusterLastRecoverDuration": 0,
  "MasterState": true,
  "MasterLastOK": "2019-04-06T09:45:05.461897766+02:00",
  "SlaveState": false,
  "SlaveLastOK": "2019-04-06T09:44:55.465393243+02:00"
}

% curl http://localhost:4090/api/health
{
  "ClusterState": "RECOVERING",
  "ClusterNumRecovers": 0,
  "ClusterLastRecoverDuration": 0,
  "MasterState": true,
  "MasterLastOK": "2019-04-06T10:28:25.459701432+02:00",
  "SlaveState": true,
  "SlaveLastOK": "2019-04-06T10:28:25.55500823+02:00"
}


% curl http://localhost:4090/api/health
{
  "ClusterState": "OK",
  "ClusterNumRecovers": 1,
  "ClusterLastRecoverDuration": 2473620691,
  "MasterState": true,
  "MasterLastOK": "2019-04-06T10:28:25.459701432+02:00",
  "SlaveState": true,
  "SlaveLastOK": "2019-04-06T10:28:25.55500823+02:00"
}
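
If you want to wire this endpoint into a script or an external monitor, a minimal sketch (assuming curl and jq are available and the API listens on the address configured in the [http] section) could look like this:

#!/bin/sh
# poll the syncflux health endpoint and fail when the cluster is not OK
STATE=$(curl -s http://localhost:4090/api/health | jq -r '.ClusterState')
if [ "$STATE" != "OK" ]; then
  echo "syncflux cluster state is $STATE" >&2
  exit 1
fi
echo "syncflux cluster state is OK"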