All Projects → lawrencejones → pgsink

lawrencejones / pgsink

Licence: MIT license
Logically replicate data out of Postgres into sinks (files, Google BigQuery, etc)

Programming Languages

go
31211 projects - #10 most used programming language
typescript
32286 projects
Makefile
30231 projects
HTML
75241 projects
Dockerfile
14818 projects
javascript
184084 projects - #8 most used programming language

Projects that are alternatives of or similar to pgsink

astro
Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (+49.06%)
Mutual labels:  bigquery, postgres
Graphql Engine
Blazing fast, instant realtime GraphQL APIs on your DB with fine grained access control, also trigger webhooks on database events.
Stars: ✭ 24,845 (+46777.36%)
Mutual labels:  bigquery, postgres
Sqlpad
Web-based SQL editor run in your own private cloud. Supports MySQL, Postgres, SQL Server, Vertica, Crate, ClickHouse, Trino, Presto, SAP HANA, Cassandra, Snowflake, BigQuery, SQLite, and more with ODBC
Stars: ✭ 4,113 (+7660.38%)
Mutual labels:  bigquery, postgres
amplitude-bigquery
Export your events from Amplitude to Google BigQuery/Google Cloud Storage
Stars: ✭ 28 (-47.17%)
Mutual labels:  bigquery
DataflowTemplates
Convenient Dataflow pipelines for transforming data between cloud data sources
Stars: ✭ 22 (-58.49%)
Mutual labels:  bigquery
postgresql lwrp
Express 42 postgresql cookbook
Stars: ✭ 57 (+7.55%)
Mutual labels:  postgres
migrant lib
Embeddable migration management
Stars: ✭ 22 (-58.49%)
Mutual labels:  postgres
objectiv-analytics
Powerful product analytics for data teams, with full control over data & models.
Stars: ✭ 399 (+652.83%)
Mutual labels:  bigquery
pgxmock
pgx mock driver for golang to test database interactions
Stars: ✭ 97 (+83.02%)
Mutual labels:  postgres
shyft
⬡ Shyft is a server-side framework for building powerful GraphQL APIs 🚀
Stars: ✭ 56 (+5.66%)
Mutual labels:  postgres
event bus postgres
🐘 Postgres event store for event_bus
Stars: ✭ 49 (-7.55%)
Mutual labels:  postgres
sparkbq
Sparklyr extension package to connect to Google BigQuery
Stars: ✭ 16 (-69.81%)
Mutual labels:  bigquery
google-cloud
A collection of Google Cloud Platform (GCP) plugins
Stars: ✭ 34 (-35.85%)
Mutual labels:  bigquery
starlake
Starlake is a Spark Based On Premise and Cloud ELT/ETL Framework for Batch & Stream Processing
Stars: ✭ 16 (-69.81%)
Mutual labels:  bigquery
phpPgAdmin6
PHP7+ Based administration tool for PostgreSQL 9.3+
Stars: ✭ 45 (-15.09%)
Mutual labels:  postgres
logica
Logica is a logic programming language that compiles to StandardSQL and runs on Google BigQuery.
Stars: ✭ 1,469 (+2671.7%)
Mutual labels:  bigquery
clunk
Clojure Postgres w/out JDBC
Stars: ✭ 25 (-52.83%)
Mutual labels:  postgres
becquerel
Gateway server that provides an OData interface to BigQuery and Elasticsearch
Stars: ✭ 17 (-67.92%)
Mutual labels:  bigquery
heltin
Robust client registry for individuals receiving mental healthcare services.
Stars: ✭ 18 (-66.04%)
Mutual labels:  postgres
pg-search-sequelize
Postgres full-text search in Node.js and Sequelize.
Stars: ✭ 31 (-41.51%)
Mutual labels:  postgres

pgsink CircleCI Go Reference

Path to v1.0.0: https://github.com/lawrencejones/pgsink/projects/1

Draft docs can be seen at: docs

pgsink is a Postgres change-capture device that supports high-throughput and low-latency capture to a variety of sinks.

You'd use this project if your primary database is Postgres and you want a stress-free, quick-to-setup and easy-to-operate tool to replicate your data to other stores such as BigQuery or Elasticsearch, which works with any size Postgres database.

Screen capture showing how to import a table

Similar projects

There are many change-capture projects out there, and several support Postgres.

As an example, pgsink is similar to debezium in performance and durability goals, but with a much simpler setup (no Kafka required). We also bear similarity to Netflix's dblog, with the benefit of being open-source and available for use.

We win in these comparisons when you want a simple, no additional dependencies setup. We also benefit from the sole focus on Postgres instead of many upstream sources, as we can optimise our data-access pattern for large, high-transaction volume Postgres databases. Examples of this are keeping transactions short to help vacuums, and traversing tables using efficient indexes.

This makes pgsink a much safer bet for people managing production critical Postgres databases.

Developing

As an overview of important packages, for those understanding the source code:

  • changelog the input to sinks, produced by subscription or import
  • decode configures decoding of Postgres types into Golang (text -> int64)
  • imports create, manage and work import jobs, producing changelog entries
  • logical parsing of the pgoutput logical encoding, used by subscription
  • sinks implements different types of sink, from files to Google BigQuery
  • subscription Postgres change capture via replication, generating a changelog

This project comes with a docker-compose development environment. Boot the environment like so:

$ docker-compose up -d
docker-compose up -d
pgsink_prometheus_1 is up-to-date
pgsink_postgres_1 is up-to-date
pgsink_grafana_1 is up-to-date

Then run make recreatedb to create a pgsink database. You can now access your database like so:

$ psql --host localhost --user pgsink pgsink
pgsink=> \q

pgsink will work with this database: try pgsink --sink=file --decode-only.

Database migrations

We use goose to run database migrations. Create new migrations like so:

$ go run internal/migration/cmd/goose.go --dir internal/migration create create_import_jobs_table go
2019/12/29 14:59:51 Created new file: internal/migration/20191229145951_create_import_jobs_table.go

Running migrations is done using the make target:

$ make migrate structure.sql
$ go run internal/migration/cmd/goose.go --install up
2021/01/09 15:38:29 requested --install, so creating schema 'pgsink'
2021/01/09 15:38:29 goose: no migrations to run. current version: 20210102200953
docker-compose --env-file=/dev/null exec -T postgres pg_dump -U postgres pgsink --schema-only --schema=pgsink >structure.sql

Getting started

Boot a Postgres database, then create an example table.

$ createdb pgsink
$ psql pgsink
psql (11.5)
Type "help" for help.

pgsink=# create table public.example (id bigserial primary key, msg text);
CREATE TABLE

pgsink=# insert into public.example (msg) values ('hello world');
INSERT 1

pgsink will stream these changes from the database and send it to the configured sink. Changes are expressed as a stream of messages, either a Schema that describes the structure of a Postgres table, or a Modification corresponding to an insert/update/delete of a row in Postgres.

Our example would produce the following modification, where timestamp is the time at which the change was committed and sequence the operation index within the transaction:

{
  "timestamp": "2019-10-04T16:05:55.123456+01:00",
  "sequence": 1,
  "namespace": "public",
  "name": "example",
  "before": null,
  "after": {
    "id": "1",
    "msg": "hello world"
  }
}

Also sent, arriving before the modification element, will be a schema entry that describes the public.example table. We represent these as Avro schemas, built from the Postgres catalog information.

{
  "timestamp": "2019-10-04T16:05:55.123456+01:00",
  "schema": {
    "namespace": "public.example",
    "type": "record",
    "name": "value",
    "fields": [
      {
        "name": "id",
        "type": ["long", "null"],
        "default": null
      },
      {
        "name": "msg",
        "type": ["string", "null"],
        "default": null
      }
    ]
  }
}

Schemas are published whenever we first discover a relation. Use the timestamp field to order each successive schema event to ensure stale messages don't override more recent data.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].