
nownabe / go-bqloader

Licence: MIT license
bqloader is a simple ETL framework to load data from Cloud Storage into BigQuery.

Programming Languages

go

bqloader


bqloader is a simple ETL framework to load data from Cloud Storage into BigQuery.

Installation

go get -u go.nownabe.dev/bqloader

Getting Started with Pre-configured Handlers

See the example for full instructions.

To load certain common CSV formats, you can use pre-configured handlers. See the full list.

package myfunc

import (
	"context"
	"os"

	"go.nownabe.dev/bqloader"
	"go.nownabe.dev/bqloader/contrib/handlers"
)

var loader bqloader.BQLoader

func init() {
	loader, _ = bqloader.New()

	t := handlers.TableGenerator(os.Getenv("BIGQUERY_PROJECT_ID"), os.Getenv("BIGQUERY_DATASET_ID"))
	n := &bqloader.SlackNotifier{
		Token:   os.Getenv("SLACK_TOKEN"),
		Channel: os.Getenv("SLACK_CHANNEL"),
	}

	handlers.MustAddHandlers(context.Background(), loader,
		/*
			Each call below builds a handler that loads CSVs, given four arguments:
			a handler name, a pattern matched against the file path on Cloud Storage,
			a BigQuery table, and a notifier.
		*/
		handlers.SBISumishinNetBankStatement("SBI Bank", `^sbi_bank/`, t("sbi_bank"), n),
		handlers.SMBCCardStatement("SMBC Card", `^smbc_card/`, t("smbc_card"), n),
	)
}

// BQLoad is the entrypoint for Cloud Functions.
func BQLoad(ctx context.Context, e bqloader.Event) error {
	return loader.Handle(ctx, e)
}

Getting Started with Custom Handlers

See the Quickstart example for full instructions.

To load other CSVs, import the package go.nownabe.dev/bqloader and write your custom handler.

package myfunc

import (
	"context"
	"os"
	"regexp"
	"time"

	"golang.org/x/text/encoding/japanese"
	"golang.org/x/xerrors"

	"go.nownabe.dev/bqloader"
)

var loader bqloader.BQLoader

func init() {
	loader, _ = bqloader.New()
	loader.MustAddHandler(context.Background(), newHandler())
}

func newHandler() *bqloader.Handler {
	/*
		This projector converts the date in the first column from the
		"2006/01/02" format into a string like "2006-01-02", which
		satisfies the BigQuery date type.
	*/
	projector := func(_ context.Context, r []string) ([]string, error) {
		t, err := time.Parse("2006/01/02", r[0])
		if err != nil {
			return nil, xerrors.Errorf("column 0 cannot be parsed as a date: %w", err)
		}

		r[0] = t.Format("2006-01-02")

		return r, nil
	}

	return &bqloader.Handler{
		Name:     "quickstart",                         // Handler name used in logs and notifications.
		Pattern:  regexp.MustCompile("^example_bank/"), // This handler processes files matched to this pattern.
		Encoding: japanese.ShiftJIS,                    // Source file encoding.
		Parser:   bqloader.CSVParser(),                 // Parser parses source file into records.
		Notifier: &bqloader.SlackNotifier{
			Token:   os.Getenv("SLACK_TOKEN"),
			Channel: os.Getenv("SLACK_CHANNEL"),
		},
		Projector:       projector, // Projector transforms each row.
		SkipLeadingRows: 1,         // Skip header row.

		// Destination.
		Project: os.Getenv("BIGQUERY_PROJECT_ID"),
		Dataset: os.Getenv("BIGQUERY_DATASET_ID"),
		Table:   os.Getenv("BIGQUERY_TABLE_ID"),
	}
}

// BQLoad is the entrypoint for Cloud Functions.
func BQLoad(ctx context.Context, e bqloader.Event) error {
	return loader.Handle(ctx, e)
}

Diagram

diagram
