Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

Created with love in Canada, visit hostnodejs.com today

Feel like to post an Ad? Learn Details

All Projects → mkabilov → Pg2ch

mkabilov / Pg2ch

Licence: mit

Data streaming from postgresql to clickhouse via logical replication mechanism

Programming Languages

31211 projects - #10 most used programming language

golang

3204 projects

Labels

postgresql clickhouse

Projects that are alternatives of or similar to Pg2ch

Freesql

🦄 .NET orm, Mysql orm, Postgresql orm, SqlServer orm, Oracle orm, Sqlite orm, Firebird orm, 达梦 orm, 人大金仓 orm, 神通 orm, 翰高 orm, 南大通用 orm, Click house orm, MsAccess orm.

Stars: ✭ 3,077 (+1965.1%)

Mutual labels: postgresql, clickhouse

Storagetapper

StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service

Stars: ✭ 232 (+55.7%)

Mutual labels: postgresql, clickhouse

Synch

Sync data from the other DB to ClickHouse(cluster)

Stars: ✭ 200 (+34.23%)

Mutual labels: postgresql, clickhouse

Analytics

Simple, open-source, lightweight (< 1 KB) and privacy-friendly web analytics alternative to Google Analytics.

Stars: ✭ 9,469 (+6255.03%)

Mutual labels: postgresql, clickhouse

Sqlcell

SQLCell is a magic function for the Jupyter Notebook that executes raw, parallel, parameterized SQL queries with the ability to accept Python values as parameters and assign output data to Python variables while concurrently running Python code. And *much* more.

Stars: ✭ 145 (-2.68%)

Mutual labels: postgresql

Sql Fundamentals

👨‍🏫 Mike's SQL Fundamentals and Professional SQL Courses

Stars: ✭ 140 (-6.04%)

Mutual labels: postgresql

Pg extractor

PG Extractor - Advanced PostgreSQL Dump Filter

Stars: ✭ 139 (-6.71%)

Mutual labels: postgresql

Wal G

Archival and Restoration for Postgres

Stars: ✭ 1,974 (+1224.83%)

Mutual labels: postgresql

Querybuilder

SQL query builder, written in c#, helps you build complex queries easily, supports SqlServer, MySql, PostgreSql, Oracle, Sqlite and Firebird

Stars: ✭ 2,111 (+1316.78%)

Mutual labels: postgresql

Indigo

Universal cheminformatics libraries, utilities and database search tools

Stars: ✭ 146 (-2.01%)

Mutual labels: postgresql

Dapper.fsharp

Lightweight F# extension for StackOverflow Dapper with support for MSSQL, MySQL and PostgreSQL

Stars: ✭ 145 (-2.68%)

Mutual labels: postgresql

Pgproxy

PostgreSQL proxy server.

Stars: ✭ 140 (-6.04%)

Mutual labels: postgresql

Lapidus

Stream your PostgreSQL, MySQL or MongoDB databases anywhere, fast.

Stars: ✭ 145 (-2.68%)

Mutual labels: postgresql

Stratosdb

☄️ ☁️ An All-in-One GUI for Cloud SQL that can help users design and test their AWS RDS Instances

Stars: ✭ 140 (-6.04%)

Mutual labels: postgresql

Node Express Postgresql Sequelize

Node.js, Express.js, Sequelize.js and PostgreSQL RESTful API

Stars: ✭ 148 (-0.67%)

Mutual labels: postgresql

Carbon Clickhouse

Graphite metrics receiver with ClickHouse as storage

Stars: ✭ 139 (-6.71%)

Mutual labels: clickhouse

Go Cache

This project encapsulates multiple db servers, redis、ledis、memcache、file、memory、nosql、postgresql

Stars: ✭ 143 (-4.03%)

Mutual labels: postgresql

Elephant Shed

PostgreSQL Management Appliance

Stars: ✭ 146 (-2.01%)

Mutual labels: postgresql

Clickhouse Net

Yandex ClickHouse fully managed .NET client

Stars: ✭ 142 (-4.7%)

Mutual labels: clickhouse

Serverless Pg

A package for managing PostgreSQL connections at SERVERLESS scale

Stars: ✭ 142 (-4.7%)

Mutual labels: postgresql

View All Similar Projects ➔

PostgreSQL to ClickHouse

Continuous data transfer from PostgreSQL to ClickHouse using logical replication mechanism.

Status of the project

Currently pg2ch tool is in active testing stage, as for now it is not for production use

Getting and running

Get:

    go get -u github.com/mkabilov/pg2ch

Run:

    pg2ch --config {path to the config file (default config.yaml)}

Config file

tables:
    {postgresql table name}:
        main_table: {clickhouse table name}
        buffer_table: {clickhouse buffer table name} # optional, if not specified, insert directly to the main table
        buffer_row_id: {clickhouse buffer table column name for row id} 
        init_sync_skip: {skip initial copy of the data}
        init_sync_skip_buffer_table: {if true bypass buffer_table and write directly to the main_table on initial sync copy}
                                     # makes sense in case of huge tables        
        init_sync_skip_truncate: {skip truncate of the main_table during init sync}                                 
        engine: {clickhouse table engine: MergeTree, ReplacingMergeTree or CollapsingMergeTree}
        max_buffer_length: {number of DML(insert/update/delete) commands to store in the memory before flushing to the buffer/main table } 
        merge_threshold: {if buffer table specified, number of buffer flushed before moving data from buffer to the main table}
        columns: # postgres - clickhouse column name mapping, 
                 # if not present, all the columns are expected to be on the clickhouse side with the exact same names 
            {postgresql column name}: {clickhouse column name}
        is_deleted_column: # in case of ReplacingMergeTree 1 will be stored in the {is_deleted_column} in order to mark deleted rows
        sign_column: {clickhouse sign column name for CollapsingMergeTree engines only, default "sign"}
        ver_column: {clickhouse version column name for the ReplacingMergeTree engine, default "ver"}

inactivity_merge_timeout: {interval, default 1 min} # merge buffered data after that timeout

clickhouse: # clickhouse tcp protocol connection params
    host: {clickhouse host, default 127.0.0.1}
    port: {tcp port, default 9000}
    database: {database name}
    username: {username}
    password: {password}
    params:
        {extra param name}:{extra param value}
        ...

postgres: # postgresql connection params
    host: {host name, default 127.0.0.1}
    port: {port, default 5432}
    database: {database name}
    user: {user}
    replication_slot_name: {logical replication slot name}
    publication_name: {postgresql publication name}
    
db_path: {path to the persistent storage dir where table lsn positions will be stored}

Sample setup:

make sure you have PostgreSQL server running on localhost:5432
- set wal_level in the postgresql config file to logical
- set max_replication_slots to at least 2
make sure you have ClickHouse server running on localhost:9000 e.g. in the docker
create database pg2ch_test in PostgreSQL: CREATE DATABASE pg2ch_test;
create a set of tables using pgbench command: pgbench -U postgres -d pg2ch_test -i
change replica identity for the pgbench_accounts table to FULL, so that we'll receive old values of the updated rows: ALTER TABLE pgbench_accounts REPLICA IDENTITY FULL;
create PostgreSQL publication for the pgbench_accounts table: CREATE PUBLICATION pg2ch_pub FOR TABLE pgbench_accounts;
create PostgreSQL logical replication slot: SELECT * FROM pg_create_logical_replication_slot('pg2ch_slot', 'pgoutput');
create tables on the ClickHouse side:

CREATE TABLE pgbench_accounts (aid Int32, abalance Int32, sign Int8) ENGINE = CollapsingMergeTree(sign) ORDER BY aid
-- our target table

CREATE TABLE pgbench_accounts_buf (aid Int32, abalance Int32, sign Int8, row_id UInt64) ENGINE = Memory()
-- will be used as a buffer table

create config.yaml file with the following content:

tables:
    pgbench_accounts:
        main_table: pgbench_accounts
        buffer_table: pgbench_accounts_buf
        buffer_row_id: row_id
        engine: CollapsingMergeTree
        max_buffer_length: 1000
        merge_threshold: 4
        columns:
            aid: aid
            abalance: abalance
        sign_column: sign

inactivity_merge_timeout: '10s'

clickhouse:
    host: localhost
    port: 9000
    database: default
    username: default
postgres:
    host: localhost
    port: 5432
    database: pg2ch_test
    user: postgres
    replication_slot_name: pg2ch_slot
    publication_name: pg2ch_pub
    
db_path: db

run pg2ch to start replication:

    pg2ch --config config.yaml

run pgbench to have some test load:

    pgbench -U postgres -d pg2ch_test --time 30 --client 10

wait for inactivity_merge_timeout period (in our case 10 seconds) so that data in the memory gets flushed to the table in ClickHouse
check the sums of the abalance column both on ClickHouse and PostgreSQL:
- ClickHouse: SELECT SUM(abalance * sign), SUM(sign) FROM pgbench_accounts (why multiply by sign column?)
- PostgreSQL: SELECT SUM(abalance), COUNT(*) FROM pgbench_accounts
numbers must match; if not, please open an issue.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Stars: ✭ 149

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (8) 🔗