All Projects → ankane → Pgsync

ankane / Pgsync

Licence: mit
Sync data from one Postgres database to another

Programming Languages

ruby
36898 projects - #4 most used programming language
PLpgSQL
1095 projects
Dockerfile
14818 projects

Projects that are alternatives of or similar to Pgsync

Jsonschema2db
Generate tables dynamically from a JSON Schema and insert data
Stars: ✭ 152 (-92.84%)
Mutual labels:  postgresql
Netflix Clone
Netflix like full-stack application with SPA client and backend implemented in service oriented architecture
Stars: ✭ 156 (-92.66%)
Mutual labels:  postgresql
Dgw
dgw generates Golang struct, and simple Table/Row Data Gateway functions from PostgreSQL table metadata
Stars: ✭ 161 (-92.42%)
Mutual labels:  postgresql
Web Budget
Sistema web para controle financeiro pessoal ou de pequenas empresas
Stars: ✭ 152 (-92.84%)
Mutual labels:  postgresql
Ihp
🔥 The fastest way to build type safe web apps. IHP is a new batteries-included web framework optimized for longterm productivity and programmer happiness
Stars: ✭ 2,746 (+29.28%)
Mutual labels:  postgresql
Sqorn
A Javascript library for building SQL queries
Stars: ✭ 1,871 (-11.91%)
Mutual labels:  postgresql
Rust Webapp Starter
Rust single page webapp written in actix-web with vuejs.
Stars: ✭ 151 (-92.89%)
Mutual labels:  postgresql
Postgres Migrations
🐦 A Stack Overflow-inspired PostgreSQL migration library with strict ordering and immutable migrations
Stars: ✭ 161 (-92.42%)
Mutual labels:  postgresql
Yuniql
Free and open source schema versioning and database migration made natively with .NET Core.
Stars: ✭ 156 (-92.66%)
Mutual labels:  postgresql
Sqler
write APIs using direct SQL queries with no hassle, let's rethink about SQL
Stars: ✭ 1,943 (-8.52%)
Mutual labels:  postgresql
Taopq
C++ client library for PostgreSQL
Stars: ✭ 153 (-92.8%)
Mutual labels:  postgresql
Knests
Full-stack boilerplate (project/hackathon starter) with Docker/NodeJS/Typescript/GraphQL/React/Material-UI
Stars: ✭ 156 (-92.66%)
Mutual labels:  postgresql
Nodebb
Node.js based forum software built for the modern web
Stars: ✭ 12,303 (+479.24%)
Mutual labels:  postgresql
Tortoise Orm
Familiar asyncio ORM for python, built with relations in mind
Stars: ✭ 2,558 (+20.43%)
Mutual labels:  postgresql
Postgresdbsamples
Sample databases for postgres
Stars: ✭ 161 (-92.42%)
Mutual labels:  postgresql
Serendipity
A PHP blog software
Stars: ✭ 151 (-92.89%)
Mutual labels:  postgresql
Database To Plantuml
Compile PostgreSQL and MySQL table information into a PlantUML description.
Stars: ✭ 157 (-92.61%)
Mutual labels:  postgresql
Pifpaf
Python fixtures and daemon managing tools for functional testing
Stars: ✭ 161 (-92.42%)
Mutual labels:  postgresql
Doctrine Postgis
Spatial and Geographic Data with PostGIS and Doctrine.
Stars: ✭ 161 (-92.42%)
Mutual labels:  postgresql
Vipsql
A vim-frontend for interacting with psql
Stars: ✭ 161 (-92.42%)
Mutual labels:  postgresql

pgsync

Sync data from one Postgres database to another (like pg_dump/pg_restore). Designed for:

  • speed - tables are transferred in parallel
  • security - built-in methods to prevent sensitive data from ever leaving the server
  • flexibility - gracefully handles schema differences, like missing columns and extra columns
  • convenience - sync partial tables, groups of tables, and related records

🍊 Battle-tested at Instacart

Build Status

Installation

pgsync is a command line tool. To install, run:

gem install pgsync

This will give you the pgsync command. You can also install it with Homebrew. If installation fails, you may need to install dependencies.

Setup

In your project directory, run:

pgsync --init

This creates .pgsync.yml for you to customize. We recommend checking this into your version control (assuming it doesn’t contain sensitive information). pgsync commands can be run from this directory or any subdirectory.

How to Use

First, make sure your schema is set up in both databases. We recommend using a schema migration tool for this, but pgsync also provides a few convenience methods. Once that’s done, you’re ready to sync data.

Sync tables

pgsync

Sync specific tables

pgsync table1,table2

Works with wildcards as well

pgsync "table*"

Sync specific rows (existing rows are overwritten)

pgsync products "where store_id = 1"

You can also preserve existing rows

pgsync products "where store_id = 1" --preserve

Or truncate them

pgsync products "where store_id = 1" --truncate

Tables

Exclude specific tables

pgsync --exclude table1,table2

Add to .pgsync.yml to exclude by default

exclude:
  - table1
  - table2

Sync tables from all schemas or specific schemas (by default, only the search path is synced)

pgsync --all-schemas
# or
pgsync --schemas public,other
# or
pgsync public.table1,other.table2

Groups

Define groups in .pgsync.yml:

groups:
  group1:
    - table1
    - table2

And run:

pgsync group1

Variables

You can also use groups to sync a specific record and associated records in other tables.

To get product 123 with its reviews, last 10 coupons, and store, use:

groups:
  product:
    products: "where id = {1}"
    reviews: "where product_id = {1}"
    coupons: "where product_id = {1} order by created_at desc limit 10"
    stores: "where id in (select store_id from products where id = {1})"

And run:

pgsync product:123

Schema

Sync schema before the data (this wipes out existing data)

pgsync --schema-first

Specify tables

pgsync table1,table2 --schema-first

Or just the schema

pgsync --schema-only

pgsync does not try to sync Postgres extensions.

Sensitive Data

Prevent sensitive data like email addresses from leaving the remote server.

Define rules in .pgsync.yml:

data_rules:
  email: unique_email
  last_name: random_letter
  birthday: random_date
  users.auth_token:
    value: secret
  visits_count:
    statement: "(RANDOM() * 10)::int"
  encrypted_*: null

last_name matches all columns named last_name and users.last_name matches only the users table. Wildcards are supported, and the first matching rule is applied.

Options for replacement are:

  • unique_email
  • unique_phone
  • unique_secret
  • random_letter
  • random_int
  • random_date
  • random_time
  • random_ip
  • value
  • statement
  • null
  • untouched

Rules starting with unique_ require the table to have a single column primary key. unique_phone requires a numeric primary key.

Foreign Keys

Foreign keys can make it difficult to sync data. Three options are:

  1. Defer constraints (recommended)
  2. Manually specify the order of tables
  3. Disable foreign key triggers, which can silently break referential integrity (not recommended)

To defer constraints, use:

pgsync --defer-constraints-v2

To manually specify the order of tables, use --jobs 1 so tables are synced one-at-a-time.

pgsync table1,table2,table3 --jobs 1

To disable foreign key triggers and potentially break referential integrity, use:

pgsync --disable-integrity

This requires superuser privileges on the to database. If syncing to (not from) Amazon RDS, use the rds_superuser role. If syncing to (not from) Heroku, there doesn’t appear to be a way to disable integrity.

Triggers

Disable user triggers with:

pgsync --disable-user-triggers

Append-Only Tables

For extremely large, append-only tables, sync in batches.

pgsync large_table --in-batches

The script will resume where it left off when run again, making it great for backfills.

Connection Security

Always make sure your connection is secure when connecting to a database over a network you don’t fully trust. Your best option is to connect over SSH or a VPN. Another option is to use sslmode=verify-full. If you don’t do this, your database credentials can be compromised.

Safety

To keep you from accidentally overwriting production, the destination is limited to localhost or 127.0.0.1 by default.

To use another host, add to_safe: true to your .pgsync.yml.

Multiple Databases

To use with multiple databases, run:

pgsync --init db2

This creates .pgsync-db2.yml for you to edit. Specify a database in commands with:

pgsync --db db2

Integrations

Django

If you run pgsync --init in a Django project, migrations will be excluded in .pgsync.yml.

exclude:
  - django_migrations

Heroku

If you run pgsync --init in a Heroku project, the from database will be set in .pgsync.yml.

from: $(heroku config:get DATABASE_URL)?sslmode=require

Laravel

If you run pgsync --init in a Laravel project, migrations will be excluded in .pgsync.yml.

exclude:
  - migrations

Rails

If you run pgsync --init in a Rails project, Active Record metadata and schema migrations will be excluded in .pgsync.yml.

exclude:
  - ar_internal_metadata
  - schema_migrations

Debugging

To view the SQL that’s run, use:

pgsync --debug

Other Commands

Help

pgsync --help

Version

pgsync --version

List tables

pgsync --list

Scripts

Use groups when possible to take advantage of parallelism.

For Ruby scripts, you may need to do:

Bundler.with_unbundled_env do
  system "pgsync ..."
end

Homebrew

On Mac, you can use:

brew install ankane/brew/pgsync

Dependencies

If installation fails, your system may be missing Ruby or libpq.

On Mac, run:

brew install postgresql

On Ubuntu, run:

sudo apt-get install ruby-dev libpq-dev build-essential

Upgrading

Run:

gem install pgsync

To use master, run:

gem install specific_install
gem specific_install https://github.com/ankane/pgsync.git

Related Projects

Also check out:

  • Dexter - The automatic indexer for Postgres
  • PgHero - A performance dashboard for Postgres
  • pgslice - Postgres partitioning as easy as pie

Thanks

Inspired by heroku-pg-transfer.

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/pgsync.git
cd pgsync
bundle install

createdb pgsync_test1
createdb pgsync_test2
createdb pgsync_test3

bundle exec rake test
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].