Ethereum Etl: Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts and internal transactions. Data is available in Google BigQuery: https://goo.gl/oY5BCQ
Stars: ✭ 956 (+5523.53%)
go-bqloader: bqloader is a simple ETL framework to load data from Cloud Storage into BigQuery.
Stars: ✭ 16 (-5.88%)
astro: Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (+364.71%)
Metorikku: A simplified, lightweight ETL framework based on Apache Spark
Stars: ✭ 361 (+2023.53%)
Esper Tv: Esper instance for TV news analysis
Stars: ✭ 37 (+117.65%)
argon: Campaign Manager 360 and Display & Video 360 Reports to BigQuery connector
Stars: ✭ 31 (+82.35%)
kafka-connect-datagen: A Kafka Connect source connector that generates data for tests
Stars: ✭ 27 (+58.82%)
Kafka Connect: Equivalent to kafka-connect 🔧 for Node.js ✨🐢🚀✨
Stars: ✭ 102 (+500%)
bigquery-to-datastore: Export a whole BigQuery table to Google Datastore with Apache Beam/Google Dataflow
Stars: ✭ 56 (+229.41%)
bqv: The simplest tool to manage views of BigQuery.
Stars: ✭ 22 (+29.41%)
iris3: An upgraded and improved version of the Iris automatic GCP-labeling project
Stars: ✭ 38 (+123.53%)
bandar-log: Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 20 (+17.65%)
bigtable: TypeScript Bigtable Client with 🔋🔋 included.
Stars: ✭ 13 (-23.53%)
Streamx: kafka-connect-s3, which ingests data from Kafka into object stores (S3)
Stars: ✭ 96 (+464.71%)
Hydrograph: A visual ETL development and debugging tool for big data
Stars: ✭ 144 (+747.06%)
dbd: dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (+76.47%)
Smooks: An extensible Java framework for building XML and non-XML streaming applications
Stars: ✭ 293 (+1623.53%)
Eland: Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (+1282.35%)
Kafka Ui: Open-source web GUI for Apache Kafka management
Stars: ✭ 230 (+1252.94%)
kuromoji-for-bigquery: Tokenize Japanese text on BigQuery with Kuromoji in Apache Beam/Google Dataflow at scale
Stars: ✭ 11 (-35.29%)
etlflow: EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for writing various tasks and jobs on GCP and AWS.
Stars: ✭ 38 (+123.53%)
Mara Example Project 2: An example mini data warehouse for Python project stats; also a template for new projects
Stars: ✭ 154 (+805.88%)
Spark Bigquery Connector: BigQuery data source for Apache Spark: read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Stars: ✭ 126 (+641.18%)
Scio: A Scala API for Apache Beam and Google Cloud Dataflow.
Stars: ✭ 2,247 (+13117.65%)
Magnolify: A collection of Magnolia add-on modules
Stars: ✭ 81 (+376.47%)
maxwell-sink: Consumes Maxwell-generated messages from Kafka and exports them to another MySQL database.
Stars: ✭ 16 (-5.88%)
Bandar Log: Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (+11.76%)
polygon-etl: ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (+211.76%)
Aws Etl Orchestrator: A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Stars: ✭ 245 (+1341.18%)
Eel Sdk: Big Data Toolkit for the JVM
Stars: ✭ 140 (+723.53%)
Bitcoin Etl: ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Dogecoin and Bitcoin Cash. Available in Google BigQuery: https://goo.gl/oY5BCQ
Stars: ✭ 174 (+923.53%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+364.71%)
ob google-bigquery: This service is meant to simplify running Google Cloud operations, especially BigQuery tasks. You do not have to worry about installation, configuration or ongoing maintenance of an SDK environment, which can be helpful to those who would prefer not to be responsible for those activities.
Stars: ✭ 43 (+152.94%)
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions and DataFrame transformations
Stars: ✭ 39 (+129.41%)
starlake: Starlake is a Spark-based, on-premise and cloud ELT/ETL framework for batch and stream processing
Stars: ✭ 16 (-5.88%)
pgsink: Logically replicate data out of Postgres into sinks (files, Google BigQuery, etc.)
Stars: ✭ 53 (+211.76%)
functions-framework-php: FaaS (Function as a Service) framework for writing portable PHP functions
Stars: ✭ 186 (+994.12%)
kafka-connect-jenkins: Kafka Connect connector for Jenkins, the open-source continuous integration tool
Stars: ✭ 29 (+70.59%)
sparkucx: A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer
Stars: ✭ 32 (+88.24%)
activemodel-datastore: Ruby on Rails with Active Model and Google Cloud Datastore. Extracted from Agrimatics Aero.
Stars: ✭ 47 (+176.47%)
libssh2.nim: Nim wrapper for libssh2
Stars: ✭ 25 (+47.06%)
covid-19: Data ETL & analysis on the global and Mexican datasets of the COVID-19 pandemic.
Stars: ✭ 14 (-17.65%)
emulator-tools: Google Cloud Bigtable and Pub/Sub emulator tools to make development a breeze
Stars: ✭ 16 (-5.88%)
rust-goauth: Crate for authenticating server-to-server apps for Google Cloud Engine.
Stars: ✭ 20 (+17.65%)
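OpenKettleWebUI: A Kettle-based web platform for scheduling and controlling data processing. It supports both file-based and database repositories, controls Kettle data transformations through the web interface, and can be integrated into existing systems as middleware.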
Stars: ✭ 138 (+711.76%)
arrow-datafusion: Apache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+13782.35%)
cloudberry: Big Data Visualization
Stars: ✭ 89 (+423.53%)
siembol: An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.
Stars: ✭ 153 (+800%)
sql-to-redis: 🔄 Simple tool for ETL, from SQL to Redis.
Stars: ✭ 18 (+5.88%)
rastercube: rastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-11.76%)
csvplus: csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (+294.12%)
incubator-liminal: Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Stars: ✭ 117 (+588.24%)
google-cloud: A collection of Google Cloud Platform (GCP) plugins
Stars: ✭ 34 (+100%)