nodefluent / bigquery-kafka-connect

License: MIT
☁️ Node.js Kafka Connect connector for Google BigQuery

Programming Languages

JavaScript

Projects that are alternatives to, or similar to, bigquery-kafka-connect

Ethereum Etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 956 (+5523.53%)
Mutual labels:  bigquery, etl, google-cloud
go-bqloader
bqloader is a simple ETL framework to load data from Cloud Storage into BigQuery.
Stars: ✭ 16 (-5.88%)
Mutual labels:  bigquery, etl, google-cloud
ob google-bigquery
This service is meant to simplify running Google Cloud operations, especially BigQuery tasks. This means you do not have to worry about installation, configuration, or ongoing maintenance related to an SDK environment. This can be helpful to those who would prefer not to be responsible for those activities.
Stars: ✭ 43 (+152.94%)
Mutual labels:  bigquery, google-cloud
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (+76.47%)
Mutual labels:  bigquery, etl
Spark Bigquery Connector
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Stars: ✭ 126 (+641.18%)
Mutual labels:  bigquery, google-cloud
kuromoji-for-bigquery
Tokenize Japanese text on BigQuery with Kuromoji in Apache Beam/Google Dataflow at scale
Stars: ✭ 11 (-35.29%)
Mutual labels:  bigquery, google-cloud
bqv
The simplest tool to manage views of BigQuery.
Stars: ✭ 22 (+29.41%)
Mutual labels:  bigquery, google-cloud
Magnolify
A collection of Magnolia add-on modules
Stars: ✭ 81 (+376.47%)
Mutual labels:  bigquery, google-cloud
bigquery-to-datastore
Export a whole BigQuery table to Google Datastore with Apache Beam/Google Dataflow
Stars: ✭ 56 (+229.41%)
Mutual labels:  bigquery, google-cloud
Scio
A Scala API for Apache Beam and Google Cloud Dataflow.
Stars: ✭ 2,247 (+13117.65%)
Mutual labels:  bigquery, google-cloud
Bitcoin Etl
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 174 (+923.53%)
Mutual labels:  bigquery, etl
iris3
An upgraded and improved version of the Iris automatic GCP-labeling project
Stars: ✭ 38 (+123.53%)
Mutual labels:  bigquery, google-cloud
argon
Campaign Manager 360 and Display & Video 360 Reports to BigQuery connector
Stars: ✭ 31 (+82.35%)
Mutual labels:  bigquery, google-cloud
astro
Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (+364.71%)
Mutual labels:  bigquery, etl
etlflow
EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for writing various tasks and jobs on GCP and AWS.
Stars: ✭ 38 (+123.53%)
Mutual labels:  bigquery, etl
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (+211.76%)
Mutual labels:  bigquery, etl
Kafka Ui
Open-Source Web GUI for Apache Kafka Management
Stars: ✭ 230 (+1252.94%)
Mutual labels:  big-data, kafka-connect
Aws Etl Orchestrator
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Stars: ✭ 245 (+1341.18%)
Mutual labels:  big-data, etl
Mara Example Project 2
An example mini data warehouse for python project stats, template for new projects
Stars: ✭ 154 (+805.88%)
Mutual labels:  bigquery, etl
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Stars: ✭ 39 (+129.41%)
Mutual labels:  big-data, etl

bigquery-kafka-connect

Kafka Connect connector for Google BigQuery

Use API

npm install --save bigquery-kafka-connect
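
Both runSourceConnector and runSinkConnector take a config object (see the Config(uration) section below) and an onError callback. A minimal sketch of such a callback, purely illustrative (the logging strategy is an assumption; any error handler works):

const onError = (error) => {
    //illustrative: log the error; you could also stop the connector here
    console.error("connector error:", error);
};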

bigquery -> kafka

const { runSourceConnector } = require("bigquery-kafka-connect");
runSourceConnector(config, [], onError).then(config => {
    //runs forever until: config.stop();
});

kafka -> bigquery

const { runSinkConnector } = require("bigquery-kafka-connect");
runSinkConnector(config, [], onError).then(config => {
    //runs forever until: config.stop();
});

kafka -> bigquery (with custom topic (no source-task topic))

const { runSinkConnector, ConverterFactory } = require("bigquery-kafka-connect");

const bigQueryTableDescription = {
    "schema": {
        "fields": [
            { name: "id", type: "INTEGER", mode: "REQUIRED" },
            { name: "name", type: "STRING", mode: "REQUIRED" },
            { name: "info", type: "STRING", mode: "NULLABLE" }
        ]
    },
    "timePartitioning": {"type": "DAY"}
};

const etlFunc = (messageValue, callback) => {

    //type is an example json format field
    if (messageValue.type === "publish") {
        return callback(null, {
            id: messageValue.payload.id,
            name: messageValue.payload.name,
            info: messageValue.payload.info
        });
    }

    if (messageValue.type === "unpublish") {
        return callback(null, null); //null value will cause deletion
    }

    callback(new Error("unknown messageValue.type"));
};

const converter = ConverterFactory.createSinkSchemaConverter(bigQueryTableDescription, etlFunc);

runSinkConnector(config, [converter], onError).then(config => {
    //runs forever until: config.stop();
});

/*
    this example would be able to store kafka message values
    that look like this (i.e. completely unrelated to messages created by a default SourceTask)
    {
        payload: {
            id: 1,
            name: "first item",
            info: "some info"
        },
        type: "publish"
    }
*/
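
For illustration, a hedged sketch of producing a message in the shape this etlFunc expects, here using kafkajs as the producer (an assumption; any Kafka producer works, and the topic name is taken from the configuration below):

const { Kafka } = require("kafkajs");

const kafka = new Kafka({ brokers: ["localhost:9092"] });
const producer = kafka.producer();

const produceExample = async () => {
    await producer.connect();
    await producer.send({
        topic: "sc_test_topic",
        messages: [{
            key: "1", //keyed message, matching produceKeyed: true below
            value: JSON.stringify({
                payload: { id: 1, name: "first item", info: "some info" },
                type: "publish"
            })
        }]
    });
    await producer.disconnect();
};

produceExample().catch(console.error);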

Use CLI

note: in BETA 🌱

npm install -g bigquery-kafka-connect
# run source etl: bigquery -> kafka
nkc-bigquery-source --help
# run sink etl: kafka -> bigquery
nkc-bigquery-sink --help

Config(uration)

const config = {
    kafka: {
        kafkaHost: "localhost:9092",
        logger: null,
        groupId: "kc-bigquery-test",
        clientName: "kc-bigquery-test-name",
        workerPerPartition: 1,
        options: {
            sessionTimeout: 8000,
            protocol: ["roundrobin"],
            fromOffset: "earliest", // or "latest"
            fetchMaxBytes: 1024 * 100,
            fetchMinBytes: 1,
            fetchMaxWaitMs: 10,
            heartbeatInterval: 250,
            retryMinTimeout: 250,
            requireAcks: 1,
            //ackTimeoutMs: 100,
            //partitionerType: 3
        }
    },
    topic: "sc_test_topic",
    partitions: 1,
    maxTasks: 1,
    pollInterval: 2000,
    produceKeyed: true,
    produceCompressionType: 0,
    connector: {
        batchSize: 500,
        maxPollCount: 500,
        projectId: "bq-project-id",
        dataset: "bq_dataset",
        table: "bq_table",
        idColumn: "id"
    },
    http: {
        port: 3149,
        middlewares: []
    },
    enableMetrics: true,
    batch: {
        batchSize: 100, 
        commitEveryNBatch: 1, 
        concurrency: 1,
        commitSync: true
    }
};
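
Putting it together: a minimal sketch that starts the sink connector with the config above and stops it after one minute (the timeout is illustrative; per the examples above, the resolved config exposes stop()):

const { runSinkConnector } = require("bigquery-kafka-connect");

const onError = (error) => console.error("connector error:", error);

runSinkConnector(config, [], onError).then(config => {
    //stop the connector after 60 seconds (illustrative)
    setTimeout(() => config.stop(), 60 * 1000);
});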

Native Client Config(uration)

const config = {
    kafka: {
        noptions: {
            "metadata.broker.list": "localhost:9092",
            "group.id": "kc-bigquery-test",
            "enable.auto.commit": false,
            "debug": "all",
            "event_cb": true,
            "client.id": "kc-bigquery-test-name"
        },
        tconf: {
            "auto.offset.reset": "earliest",
            "request.required.acks": 1
        }
    },
    topic: "sc_test_topic",
    partitions: 1,
    maxTasks: 1,
    pollInterval: 2000,
    produceKeyed: true,
    produceCompressionType: 0,
    connector: {
        batchSize: 500,
        maxPollCount: 500,
        projectId: "bq-project-id",
        dataset: "bq_dataset",
        table: "bq_table",
        idColumn: "id"
    },
    http: {
        port: 3149,
        middlewares: []
    },
    enableMetrics: true
};
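
As with the other nodefluent connectors, the noptions and tconf blocks are passed through to the underlying native Kafka client (librdkafka, via node-rdkafka): noptions carries the global client configuration and tconf the default topic configuration, so any librdkafka setting can be supplied here.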