kikinteractive / Go Bqstreamer

License: MIT
Stream data into Google BigQuery concurrently using InsertAll()

Kik and I (@oryband) are no longer maintaining this repository. Thanks for all the contributions. You are welcome to fork and continue development.

BigQuery Streamer

Stream insert data into BigQuery fast and concurrently, using InsertAll().

Features

  • Insert rows into multiple tables, datasets, and projects, in bulk. No need to manage data structures or sort rows by table: bqstreamer does it for you.
  • Multiple background workers (i.e. goroutines) to enqueue and insert rows.
  • Inserts can be performed either in a blocking manner or in the background (asynchronously).
  • Perform insert operations in predefined batch sizes, in line with BigQuery's quota policy.
  • Handle and retry BigQuery server errors.
  • Backoff interval between failed insert operations.
  • Error reporting.
  • Production ready and thoroughly tested. We at Rounds (since acquired by Kik) use it in our data gathering workflow.
  • Thorough testing and documentation for great good!

Getting Started

  1. Install Go, version 1.5 or later.
  2. Clone this repository and download dependencies:
       • Version v2: go get gopkg.in/kikinteractive/go-bqstreamer.v2
       • Version v1: go get gopkg.in/kikinteractive/go-bqstreamer.v1
  3. Acquire Google OAuth2/JWT credentials, so you can authenticate with BigQuery.

How Does It Work?

There are two types of inserters you can use:

  1. SyncWorker, which is a single blocking (synchronous) worker.
       • It enqueues rows and performs insert operations in a blocking manner.
  2. AsyncWorkerGroup, which employs multiple background SyncWorkers.
       • The AsyncWorkerGroup enqueues rows, and its background workers pull and insert them in a fan-out model.
       • Each background worker executes an insert operation when it reaches a row-count or time threshold.
       • Errors are reported to an error channel for processing by the user.
       • This provides higher insert throughput for larger-scale scenarios.

Examples

Check the GoDoc examples section.

Contribute

  1. Please check the issues page.
  2. File new bugs and ask for improvements.
  3. Pull requests welcome!

Test

# Run unit tests and check coverage.
$ make test

# Run integration tests.
# This requires an active project, dataset and pem key.
$ export BQSTREAMER_PROJECT=my-project
$ export BQSTREAMER_DATASET=my-dataset
$ export BQSTREAMER_TABLE=my-table
$ export BQSTREAMER_KEY=my-key.json
$ make testintegration