All Projects → Powerspace → Pg2bq

Powerspace / Pg2bq

Licence: mit
Export PostgreSQL tables to Google BigQuery

Programming Languages

scala
5932 projects

Projects that are alternatives of or similar to Pg2bq

Ddlparse
DDL parase and Convert to BigQuery JSON schema and DDL statements
Stars: ✭ 52 (+73.33%)
Mutual labels:  bigquery, postgresql
Redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Stars: ✭ 20,147 (+67056.67%)
Mutual labels:  bigquery, postgresql
Sql Runner
Run templatable playbooks of SQL scripts in series and parallel on Redshift, PostgreSQL, BigQuery and Snowflake
Stars: ✭ 68 (+126.67%)
Mutual labels:  bigquery, postgresql
Franchise
🍟 a notebook sql client. what you get when have a lot of sequels.
Stars: ✭ 3,823 (+12643.33%)
Mutual labels:  bigquery, postgresql
Cube.js
📊 Cube — Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+39843.33%)
Mutual labels:  bigquery, postgresql
Tbls
tbls is a CI-Friendly tool for document a database, written in Go.
Stars: ✭ 940 (+3033.33%)
Mutual labels:  bigquery, postgresql
Copostgresquery
Query builder for PostgreSQL in Node.js, built for async
Stars: ✭ 12 (-60%)
Mutual labels:  postgresql
Pgsql Cluster Manager
Daemon and migration tool that manages Postgres cluster using etcd/corosync/pacemaker
Stars: ✭ 15 (-50%)
Mutual labels:  postgresql
Netkiller.github.io
Netkiller Free ebook - 免费电子书
Stars: ✭ 861 (+2770%)
Mutual labels:  postgresql
Koa Starter
An opinionated Koa starter kit
Stars: ✭ 10 (-66.67%)
Mutual labels:  postgresql
Zeeql3
The ZeeQL (EOF/CoreData/AR like) Database Toolkit for Swift
Stars: ✭ 29 (-3.33%)
Mutual labels:  postgresql
Hydrocarbon
not just an rss reader
Stars: ✭ 21 (-30%)
Mutual labels:  postgresql
Express Starter
Express Starter
Stars: ✭ 14 (-53.33%)
Mutual labels:  postgresql
Monogamy
Add table-level database locking to ActiveRecord
Stars: ✭ 12 (-60%)
Mutual labels:  postgresql
Treefrog Framework
TreeFrog Framework : High-speed C++ MVC Framework for Web Application
Stars: ✭ 885 (+2850%)
Mutual labels:  postgresql
Crunchy Containers
Containers for Managing PostgreSQL on Kubernetes by Crunchy Data
Stars: ✭ 870 (+2800%)
Mutual labels:  postgresql
Kotgres
SQL generator and result set mapper for Postgres and Kotlin
Stars: ✭ 21 (-30%)
Mutual labels:  postgresql
Servicebot
Open-source subscription management & billing automation system
Stars: ✭ 857 (+2756.67%)
Mutual labels:  postgresql
Notes App
Node.js application - simple notes management using Express, Postgres, Objection.js, Docker, Socket.io, Bluebird Promises
Stars: ✭ 14 (-53.33%)
Mutual labels:  postgresql
Postgresql Postgis Timescaledb
PostgreSQL + PostGIS + TimescaleDB docker image 🐘🌎📈
Stars: ✭ 19 (-36.67%)
Mutual labels:  postgresql

pg2bq

pg2bq is made for one thing: export tables from PostgreSQL to Google BigQuery.

Why ?

It's useful to keep the data at both places "in-sync" (using cron, airflow, or whatever to schedule the export on a regular basis). If your metadata are on PostgreSQL, but your realtime data are in BigQuery, it's probable you want to join them.

Note: internally, it is using the Spark framework for the sake of simplicity, but no Hadoop cluster is needed. It is configured as a "local" cluster by default, meaning the application is running standalone.

How to run it

  • Download the release made on GitHub: pg2bq-1.0.3.zip
  • Create a Service Account to Google Cloud which has access to GCS and BigQuery, and create a json key
  • Create a configuration file configuration.conf for pg2bq to know where to grab and put the data (HOCON):
jdbc {
  url = "jdbc:postgresql://mypg:5432/mydb"
  user = "myuser"
  password = "mypwd"
  tables = [ "user", "campaign", "website" ]
}

gcloud {
  project = "gcloud-project-id"
  service-account-key-path = "/path/to/service-account-key.json"
  bq.dataset = "mypg"
  gcs.tmp-bucket = "pg-export-tmp"
}
  • Run the application specifying the config file:
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json ./bin/pg2db -Dconfig.file=configuration.conf
  • Done!

Add this to a scheduler every 10min and enjoy your JOINs in BigQuery.

What does it do exactly ?

  • It exports the data from the tables into DataFrames
  • It saves them into GCS as .avro to keep the schema along the data: this will avoid to specify/create the BigQuery table schema beforehands.
  • It starts BigQuery jobs to import those .avro into the respective BigQuery tables.

Development

To run the application in dev mode, create a proper application.conf and run the app:

  • via the IDE, just run Main
  • via sbt, by pre-packaging the whole thing:
$ sbt stage && ./target/universal/stage/bin/pg2bq
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].