nodefluent / bigquery-kafka-connect

License: MIT
☁️ Node.js Kafka Connect connector for Google BigQuery

Programming Languages

JavaScript

Projects that are alternatives to, or similar to, bigquery-kafka-connect

Ethereum Etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 956 (+5523.53%)
Mutual labels:  bigquery, etl, google-cloud
go-bqloader
bqloader is a simple ETL framework to load data from Cloud Storage into BigQuery.
Stars: ✭ 16 (-5.88%)
Mutual labels:  bigquery, etl, google-cloud
ob google-bigquery
This service is meant to simplify running Google Cloud operations, especially BigQuery tasks. This means you do not have to worry about installation, configuration, or ongoing maintenance related to an SDK environment. This can be helpful to those who would prefer not to be responsible for those activities.
Stars: ✭ 43 (+152.94%)
Mutual labels:  bigquery, google-cloud
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (+76.47%)
Mutual labels:  bigquery, etl
Spark Bigquery Connector
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Stars: ✭ 126 (+641.18%)
Mutual labels:  bigquery, google-cloud
kuromoji-for-bigquery
Tokenize Japanese text on BigQuery with Kuromoji in Apache Beam/Google Dataflow at scale
Stars: ✭ 11 (-35.29%)
Mutual labels:  bigquery, google-cloud
bqv
The simplest tool to manage views of BigQuery.
Stars: ✭ 22 (+29.41%)
Mutual labels:  bigquery, google-cloud
Magnolify
A collection of Magnolia add-on modules
Stars: ✭ 81 (+376.47%)
Mutual labels:  bigquery, google-cloud
bigquery-to-datastore
Export a whole BigQuery table to Google Datastore with Apache Beam/Google Dataflow
Stars: ✭ 56 (+229.41%)
Mutual labels:  bigquery, google-cloud
Scio
A Scala API for Apache Beam and Google Cloud Dataflow.
Stars: ✭ 2,247 (+13117.65%)
Mutual labels:  bigquery, google-cloud
Bitcoin Etl
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 174 (+923.53%)
Mutual labels:  bigquery, etl
iris3
An upgraded and improved version of the Iris automatic GCP-labeling project
Stars: ✭ 38 (+123.53%)
Mutual labels:  bigquery, google-cloud
argon
Campaign Manager 360 and Display & Video 360 Reports to BigQuery connector
Stars: ✭ 31 (+82.35%)
Mutual labels:  bigquery, google-cloud
astro
Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (+364.71%)
Mutual labels:  bigquery, etl
etlflow
EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for writing various tasks and jobs on GCP and AWS.
Stars: ✭ 38 (+123.53%)
Mutual labels:  bigquery, etl
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (+211.76%)
Mutual labels:  bigquery, etl
Kafka Ui
Open-Source Web GUI for Apache Kafka Management
Stars: ✭ 230 (+1252.94%)
Mutual labels:  big-data, kafka-connect
Aws Etl Orchestrator
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Stars: ✭ 245 (+1341.18%)
Mutual labels:  big-data, etl
Mara Example Project 2
An example mini data warehouse for python project stats, template for new projects
Stars: ✭ 154 (+805.88%)
Mutual labels:  bigquery, etl
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Stars: ✭ 39 (+129.41%)
Mutual labels:  big-data, etl

bigquery-kafka-connect

Kafka Connect connector for Google BigQuery

Use API

npm install --save bigquery-kafka-connect
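
Both runSourceConnector and runSinkConnector take a config object (see the Config(uration) section below) and an onError callback. A minimal sketch of such a callback, purely illustrative (the logging strategy is an assumption; any error handler works):

const onError = (error) => {
    //illustrative: log the error; you could also stop the connector here
    console.error("connector error:", error);
};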

bigquery -> kafka

const { runSourceConnector } = require("bigquery-kafka-connect");
runSourceConnector(config, [], onError).then(config => {
    //runs forever until: config.stop();
});

kafka -> bigquery

const { runSinkConnector } = require("bigquery-kafka-connect");
runSinkConnector(config, [], onError).then(config => {
    //runs forever until: config.stop();
});

kafka -> bigquery (with custom topic (no source-task topic))

const { runSinkConnector, ConverterFactory } = require("bigquery-kafka-connect");

const bigQueryTableDescription = {
    "schema": {
        "fields": [
            { name: "id", type: "INTEGER", mode: "REQUIRED" },
            { name: "name", type: "STRING", mode: "REQUIRED" },
            { name: "info", type: "STRING", mode: "NULLABLE" }
        ]
    },
    "timePartitioning": {"type": "DAY"}
};

const etlFunc = (messageValue, callback) => {

    //type is an example json format field
    if (messageValue.type === "publish") {
        return callback(null, {
            id: messageValue.payload.id,
            name: messageValue.payload.name,
            info: messageValue.payload.info
        });
    }

    if (messageValue.type === "unpublish") {
        return callback(null, null); //null value will cause deletion
    }

    callback(new Error("unknown messageValue.type"));
};

const converter = ConverterFactory.createSinkSchemaConverter(bigQueryTableDescription, etlFunc);

runSinkConnector(config, [converter], onError).then(config => {
    //runs forever until: config.stop();
});

/*
    this example would be able to store kafka message values
    that look like this (i.e. completely unrelated to messages created by a default SourceTask)
    {
        payload: {
            id: 1,
            name: "first item",
            info: "some info"
        },
        type: "publish"
    }
*/
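
For illustration, a hedged sketch of producing a message in the shape this etlFunc expects, here using kafkajs as the producer (an assumption; any Kafka producer works, and the topic name is taken from the configuration below):

const { Kafka } = require("kafkajs");

const kafka = new Kafka({ brokers: ["localhost:9092"] });
const producer = kafka.producer();

const produceExample = async () => {
    await producer.connect();
    await producer.send({
        topic: "sc_test_topic",
        messages: [{
            key: "1", //keyed message, matching produceKeyed: true below
            value: JSON.stringify({
                payload: { id: 1, name: "first item", info: "some info" },
                type: "publish"
            })
        }]
    });
    await producer.disconnect();
};

produceExample().catch(console.error);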

Use CLI

note: in BETA 🌱

npm install -g bigquery-kafka-connect
# run source etl: bigquery -> kafka
nkc-bigquery-source --help
# run sink etl: kafka -> bigquery
nkc-bigquery-sink --help

Config(uration)

const config = {
    kafka: {
        kafkaHost: "localhost:9092",
        logger: null,
        groupId: "kc-bigquery-test",
        clientName: "kc-bigquery-test-name",
        workerPerPartition: 1,
        options: {
            sessionTimeout: 8000,
            protocol: ["roundrobin"],
            fromOffset: "earliest", // or "latest"
            fetchMaxBytes: 1024 * 100,
            fetchMinBytes: 1,
            fetchMaxWaitMs: 10,
            heartbeatInterval: 250,
            retryMinTimeout: 250,
            requireAcks: 1,
            //ackTimeoutMs: 100,
            //partitionerType: 3
        }
    },
    topic: "sc_test_topic",
    partitions: 1,
    maxTasks: 1,
    pollInterval: 2000,
    produceKeyed: true,
    produceCompressionType: 0,
    connector: {
        batchSize: 500,
        maxPollCount: 500,
        projectId: "bq-project-id",
        dataset: "bq_dataset",
        table: "bq_table",
        idColumn: "id"
    },
    http: {
        port: 3149,
        middlewares: []
    },
    enableMetrics: true,
    batch: {
        batchSize: 100, 
        commitEveryNBatch: 1, 
        concurrency: 1,
        commitSync: true
    }
};
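
Putting it together: a minimal sketch that starts the sink connector with the config above and stops it after one minute (the timeout is illustrative; per the examples above, the resolved config exposes stop()):

const { runSinkConnector } = require("bigquery-kafka-connect");

const onError = (error) => console.error("connector error:", error);

runSinkConnector(config, [], onError).then(config => {
    //stop the connector after 60 seconds (illustrative)
    setTimeout(() => config.stop(), 60 * 1000);
});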

Native Client Config(uration)

const config = {
    kafka: {
        noptions: {
            "metadata.broker.list": "localhost:9092",
            "group.id": "kc-bigquery-test",
            "enable.auto.commit": false,
            "debug": "all",
            "event_cb": true,
            "client.id": "kc-bigquery-test-name"
        },
        tconf: {
            "auto.offset.reset": "earliest",
            "request.required.acks": 1
        }
    },
    topic: "sc_test_topic",
    partitions: 1,
    maxTasks: 1,
    pollInterval: 2000,
    produceKeyed: true,
    produceCompressionType: 0,
    connector: {
        batchSize: 500,
        maxPollCount: 500,
        projectId: "bq-project-id",
        dataset: "bq_dataset",
        table: "bq_table",
        idColumn: "id"
    },
    http: {
        port: 3149,
        middlewares: []
    },
    enableMetrics: true
};
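
As with the other nodefluent connectors, the noptions and tconf blocks are passed through to the underlying native Kafka client (librdkafka, via node-rdkafka): noptions carries the global client configuration and tconf the default topic configuration, so any librdkafka setting can be supplied here.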