
long2ice / Synch

License: Apache-2.0
Sync data from other DBs to ClickHouse (cluster)

Programming Languages

python

Projects that are alternatives to or similar to Synch

Storagetapper
StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Stars: ✭ 232 (+16%)
Mutual labels:  kafka, mysql, postgresql, clickhouse
Pg chameleon
MySQL to PostgreSQL replica system
Stars: ✭ 274 (+37%)
Mutual labels:  mysql, postgresql, replication
Devops Bash Tools
550+ DevOps Bash Scripts - AWS, GCP, Kubernetes, Kafka, Docker, APIs, Hadoop, SQL, PostgreSQL, MySQL, Hive, Impala, Travis CI, Jenkins, Concourse, GitHub, GitLab, BitBucket, Azure DevOps, TeamCity, Spotify, MP3, LDAP, Code/Build Linting, pkg mgmt for Linux, Mac, Python, Perl, Ruby, NodeJS, Golang, Advanced dotfiles: .bashrc, .vimrc, .gitconfig, .screenrc, .tmux.conf, .psqlrc ...
Stars: ✭ 226 (+13%)
Mutual labels:  kafka, mysql, postgresql
Pmacct
pmacct is a small set of multi-purpose passive network monitoring tools [NetFlow IPFIX sFlow libpcap BGP BMP RPKI IGP Streaming Telemetry].
Stars: ✭ 677 (+238.5%)
Mutual labels:  kafka, mysql, postgresql
Amazonriver
amazonriver is a service that syncs real-time PostgreSQL data to ES or Kafka
Stars: ✭ 198 (-1%)
Mutual labels:  kafka, postgresql, replication
Freesql
🦄 .NET orm, Mysql orm, Postgresql orm, SqlServer orm, Oracle orm, Sqlite orm, Firebird orm, Dameng orm, Kingbase orm, Shentong orm, Highgo orm, GBase orm, ClickHouse orm, MsAccess orm.
Stars: ✭ 3,077 (+1438.5%)
Mutual labels:  mysql, postgresql, clickhouse
Symmetric Ds
SymmetricDS is a database and file synchronization solution that is platform-independent, web-enabled, and database agnostic. SymmetricDS was built to make data replication across two to tens of thousands of databases and file systems fast, easy and resilient. We specialize in near real time, bi-directional data replication across large node networks over the WAN or LAN.
Stars: ✭ 450 (+125%)
Mutual labels:  mysql, postgresql, replication
Flink Learning
flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink fundamentals, concepts, principles, hands-on practice, performance tuning, and source-code analysis, with study examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, plus large production Flink case studies (PV/UV, log storage, real-time deduplication over tens of billions of records, monitoring and alerting). The author also runs the column "Big Data Real-Time Computing Engine Flink in Practice and Performance Optimization".
Stars: ✭ 11,378 (+5589%)
Mutual labels:  kafka, mysql, clickhouse
Back End Interview
A collection of back-end interview questions (Python, Redis, MySQL, PostgreSQL, Kafka, data structures, algorithms, programming, networking)
Stars: ✭ 188 (-6%)
Mutual labels:  kafka, mysql, postgresql
Spring Boot 2.x Examples
Spring Boot 2.x code examples
Stars: ✭ 104 (-48%)
Mutual labels:  kafka, mysql, postgresql
Datafaker
Datafaker is a large-scale test data and flow test data generation tool. Datafaker fakes data and inserts it into various data sources.
Stars: ✭ 327 (+63.5%)
Mutual labels:  kafka, mysql, postgresql
Tunnel
A PostgreSQL data synchronization tool (implemented in Java)
Stars: ✭ 122 (-39%)
Mutual labels:  kafka, postgresql, replication
Szt Bigdata
A big-data passenger flow analysis system for the Shenzhen Metro 🚇🚄🌟
Stars: ✭ 826 (+313%)
Mutual labels:  kafka, mysql, clickhouse
Bireme
Bireme is an incremental synchronization tool for the Greenplum / HashData data warehouse
Stars: ✭ 110 (-45%)
Mutual labels:  kafka, mysql, postgresql
Lapidus
Stream your PostgreSQL, MySQL or MongoDB databases anywhere, fast.
Stars: ✭ 145 (-27.5%)
Mutual labels:  mysql, postgresql, replication
Operators
Collection of Kubernetes Operators built with KUDO.
Stars: ✭ 175 (-12.5%)
Mutual labels:  kafka, mysql
Nanodbc
A small C++ wrapper for the native C ODBC API | Requires C++14 since v2.12
Stars: ✭ 175 (-12.5%)
Mutual labels:  mysql, postgresql
Mysql Sandbox
Quick and painless install of one or more MySQL servers in the same host.
Stars: ✭ 176 (-12%)
Mutual labels:  mysql, replication
Linq2db
Linq to database provider.
Stars: ✭ 2,211 (+1005.5%)
Mutual labels:  mysql, postgresql
Qxorm
QxOrm library - C++ Qt ORM (Object Relational Mapping) and ODM (Object Document Mapper) library - Official repository
Stars: ✭ 176 (-12%)
Mutual labels:  mysql, postgresql

Synch


中文文档 (Chinese documentation)

Introduction

Sync data from other databases to ClickHouse. Currently supports PostgreSQL and MySQL, with both full and incremental ETL.


Features

  • Full data ETL and real-time incremental ETL.
  • DDL and DML sync: DDL currently supports add column, drop column, and change column; DML is fully supported.
  • Error reports via email.
  • Kafka and Redis supported as brokers.
  • Multiple source databases can sync to ClickHouse at the same time.
  • Supports the ClickHouse MergeTree, CollapsingMergeTree, VersionedCollapsingMergeTree, and ReplacingMergeTree engines.
  • Supports ClickHouse cluster.

Requirements

  • Python >= 3.7
  • Redis: caches the MySQL binlog file and position and can serve as the broker; Redis cluster is also supported.
  • Kafka: required if you use Kafka as the broker.
  • clickhouse-jdbc-bridge: required if you use PostgreSQL and set auto_full_etl = true, or run the synch etl command.
  • Sentry: error reporting; active once a dsn is set in the config.

Install

> pip install synch

Usage

Config file synch.yaml

synch reads its default config from ./synch.yaml, or you can specify a config file with synch -c.

See full example config in synch.yaml.
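For orientation, a minimal sketch of what a synch.yaml could look like is shown below. Only core.monitoring, auto_full_etl, broker_type, the sentry dsn, and the mysql_db alias are taken from this README; all other key names and the overall layout are illustrative assumptions, so treat the bundled synch.yaml as the authoritative reference.

core:
  monitoring: true      # see Monitor below: writes producer/consumer counts to synch.log
  auto_full_etl: true   # run a full ETL before consuming increments

sentry:
  dsn:                  # set a DSN here to enable Sentry error reporting

redis:                  # broker and binlog-position cache (layout assumed)
  host: redis
  port: 6379

source_dbs:             # one entry per source database (layout assumed)
  - alias: mysql_db     # referenced by synch --alias mysql_db ...
    db_type: mysql
    broker_type: redis  # or kafka
    host: mysql
    port: 3306
    user: root
    password: "123456"

clickhouse:             # destination ClickHouse (layout assumed)
  host: clickhouse
  port: 9000
  user: default
  password: ""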

Full data etl

You may need to run a full data ETL before continuously syncing data from MySQL to ClickHouse, or redo the ETL with --renew.

> synch --alias mysql_db etl -h

Usage: synch etl [OPTIONS]

  Make etl from source table to ClickHouse.

Options:
  --schema TEXT     Schema to full etl.
  --renew           Etl after try to drop the target tables.
  -t, --table TEXT  Tables to full etl.
  -h, --help        Show this message and exit.

Full ETL of tables test.test and test.test2:

> synch --alias mysql_db etl --schema test --table test --table test2
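To rebuild from scratch, the documented --renew flag drops the target tables before running the ETL again:

> synch --alias mysql_db etl --schema test --renew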

Produce

Listen to the MySQL binlog and produce messages to the broker.

> synch --alias mysql_db produce

Consume

Consume messages from the broker and insert them into ClickHouse. You can skip error rows with --skip-error. When auto_full_etl = true is set in the config, synch runs a full ETL first.

> synch --alias mysql_db consume -h

Usage: synch consume [OPTIONS]

  Consume from broker and insert into ClickHouse.

Options:
  --schema TEXT       Schema to consume.  [required]
  --skip-error        Skip error rows.
  --last-msg-id TEXT  Redis stream last msg id or kafka msg offset, depend on
                      broker_type in config.

  -h, --help          Show this message and exit.

Consume schema test and insert into ClickHouse:

> synch --alias mysql_db consume --schema test
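The documented options can be combined when bad rows block the pipeline or you need to resume from a known position; with the Redis broker --last-msg-id takes a stream entry ID (0 replays the stream from the beginning), with Kafka a message offset:

> synch --alias mysql_db consume --schema test --skip-error --last-msg-id 0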

Monitor

Set core.monitoring to true and synch will automatically create a synch database in ClickHouse and insert monitoring data into it.
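In synch.yaml (following the config sketch above, whose layout is assumed) that is simply:

core:
  monitoring: true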

Table structure:

create table if not exists synch.log
(
    alias      String,
    schema     String,
    table      String,
    num        int,
    type       int, -- 1:producer, 2:consumer
    created_at DateTime
)
    engine = MergeTree partition by toYYYYMM(created_at) order by created_at;

ClickHouse Table Engine

synch currently supports MergeTree, CollapsingMergeTree, VersionedCollapsingMergeTree, and ReplacingMergeTree.

Use docker-compose (recommended)

Redis broker, lightweight and suited to low concurrency
version: "3"
services:
  producer:
    depends_on:
      - redis
    image: long2ice/synch
    command: synch --alias mysql_db produce
    volumes:
      - ./synch.yaml:/synch/synch.yaml
  # one consumer service per schema
  consumer.test:
    depends_on:
      - redis
    image: long2ice/synch
    command: synch --alias mysql_db consume --schema test
    volumes:
      - ./synch.yaml:/synch/synch.yaml
  redis:
    hostname: redis
    image: redis:latest
    volumes:
      - redis:/data
volumes:
  redis:
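With synch.yaml in the working directory, start the stack as usual:

> docker-compose up -d

As the comment in the file notes, run one consumer service per schema; consumers are independent of each other, so each schema can be scaled and restarted separately.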
Kafka broker, for high concurrency
version: "3"
services:
  zookeeper:
    image: bitnami/zookeeper:3
    hostname: zookeeper
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
    volumes:
      - zookeeper:/bitnami
  kafka:
    image: bitnami/kafka:2
    hostname: kafka
    environment:
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
      - JMX_PORT=23456
      - KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE=true
      - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
    depends_on:
      - zookeeper
    volumes:
      - kafka:/bitnami
  kafka-manager:
    image: hlebalbau/kafka-manager
    ports:
      - "9000:9000"
    environment:
      ZK_HOSTS: "zookeeper:2181"
      KAFKA_MANAGER_AUTH_ENABLED: "false"
    command: -Dpidfile.path=/dev/null
  producer:
    depends_on:
      - redis
      - kafka
      - zookeeper
    image: long2ice/synch
    command: synch --alias mysql_db produce
    volumes:
      - ./synch.yaml:/synch/synch.yaml
  # one consumer service per schema
  consumer.test:
    depends_on:
      - redis
      - kafka
      - zookeeper
    image: long2ice/synch
    command: synch --alias mysql_db consume --schema test
    volumes:
      - ./synch.yaml:/synch/synch.yaml
  redis:
    hostname: redis
    image: redis:latest
    volumes:
      - redis:/data
volumes:
  redis:
  kafka:
  zookeeper:
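Start it the same way with docker-compose up -d. The bundled kafka-manager UI is then reachable at http://localhost:9000 (authentication is disabled via KAFKA_MANAGER_AUTH_ENABLED). Note that Redis is still part of this stack: even with Kafka as the broker, synch uses Redis to cache the MySQL binlog file and position (see Requirements).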

Important

  • Each synced table must always have a non-null primary key or unique key, or a composite primary key.
  • DDL sync is not supported for PostgreSQL.
  • PostgreSQL sync is not fully tested; use it in production with caution.

Support this project

AliPay, WeChatPay, or PayPal (PayPal to the account long2ice).

ThanksTo

Thanks to JetBrains for PyCharm, a powerful Python IDE.


License

This project is licensed under the Apache-2.0 License.
