tharwaninitin / etlflow

License: Apache-2.0
EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for writing various tasks and jobs on GCP and AWS.

Projects that are alternatives to or similar to etlflow

DaFlow
Apache Spark-based Data Flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-36.84%)
Mutual labels:  etl, etl-framework, etl-pipeline
DIRECT
DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-47.37%)
Mutual labels:  etl, etl-framework, etl-pipeline
vixtract
www.vixtract.ru
Stars: ✭ 40 (+5.26%)
Mutual labels:  etl, etl-framework, etl-pipeline
csvplus
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (+76.32%)
Mutual labels:  etl, etl-framework, etl-pipeline
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (+39.47%)
Mutual labels:  bigquery, etl, gcp
redis-connect-dist
Real-Time Event Streaming & Change Data Capture
Stars: ✭ 21 (-44.74%)
Mutual labels:  etl, etl-framework, etl-pipeline
Bitcoin Etl
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 174 (+357.89%)
Mutual labels:  bigquery, etl, gcp
Ethereum Etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 956 (+2415.79%)
Mutual labels:  bigquery, etl, gcp
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+2.63%)
Mutual labels:  etl, etl-framework, etl-pipeline
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+1510.53%)
Mutual labels:  etl, etl-framework, etl-pipeline
iris3
An upgraded and improved version of the Iris automatic GCP-labeling project
Stars: ✭ 38 (+0%)
Mutual labels:  bigquery, gcp
AirflowETL
Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-47.37%)
Mutual labels:  etl, etl-pipeline
link-move
A model-driven dynamically-configurable framework to acquire data from external sources and save it to your database.
Stars: ✭ 32 (-15.79%)
Mutual labels:  etl, etl-framework
bigquery-kafka-connect
☁️ nodejs kafka connect connector for Google BigQuery
Stars: ✭ 17 (-55.26%)
Mutual labels:  bigquery, etl
hive-bigquery-storage-handler
Hive Storage Handler for interoperability between BigQuery and Apache Hive
Stars: ✭ 16 (-57.89%)
Mutual labels:  bigquery, gcp
seatunnel-example
seatunnel plugin developing examples.
Stars: ✭ 27 (-28.95%)
Mutual labels:  etl-framework, etl-pipeline
Mara Example Project 2
An example mini data warehouse for python project stats, template for new projects
Stars: ✭ 154 (+305.26%)
Mutual labels:  bigquery, etl
BETL-old
BETL. Meta data driven ETL generation using T-SQL
Stars: ✭ 17 (-55.26%)
Mutual labels:  etl, etl-framework
bigflow
A Python framework for data processing on GCP.
Stars: ✭ 96 (+152.63%)
Mutual labels:  bigquery, gcp
Datashare Toolkit
DIY commercial datasets on Google Cloud Platform
Stars: ✭ 41 (+7.89%)
Mutual labels:  bigquery, gcp

EtlFlow

EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for writing various tasks and jobs on GCP and AWS.

Documentation

Library documentation: https://tharwaninitin.github.io/etlflow/site/docs

Examples

You can use this library in the following ways:

  • Core Module:
    This module brings the features of the Step API into your project.
  • Spark Module (spark steps):
    Used alongside core, this add-on module lets you use Apache Spark steps in your project.
  • GCP Module (gcp steps):
    Used alongside core, this add-on module lets you use GCP steps in your project.

Modules Dependency Graph

[Figure: modules dependency graph]

Scala Version Compatibility Matrix

Module Name Scala 2.12 Scala 2.13 Scala 3.1
Core
Db
Http
Email
Aws
Gcp
Redis
Spark

Requirements and Installation

This project is compiled with Scala versions 2.12.15, 2.13.8, and 3.1.1 and is available via Maven Central. Add the latest release as a dependency to your project.

Latest Version

SBT

libraryDependencies += "com.github.tharwaninitin" %% "etlflow-core" % "x.x.x"
libraryDependencies += "com.github.tharwaninitin" %% "etlflow-spark" % "x.x.x"
libraryDependencies += "com.github.tharwaninitin" %% "etlflow-cloud" % "x.x.x"
libraryDependencies += "com.github.tharwaninitin" %% "etlflow-http" % "x.x.x"
libraryDependencies += "com.github.tharwaninitin" %% "etlflow-redis" % "x.x.x"
libraryDependencies += "com.github.tharwaninitin" %% "etlflow-aws" % "x.x.x"
libraryDependencies += "com.github.tharwaninitin" %% "etlflow-gcp" % "x.x.x"
libraryDependencies += "com.github.tharwaninitin" %% "etlflow-email" % "x.x.x"
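
Most projects need only a subset of these modules. As a minimal sketch, assuming a project that uses the core and GCP modules on Scala 2.13.8, the dependency lines above could be grouped in build.sbt as follows. The name EtlFlowVersion is a hypothetical helper introduced here for illustration, and "x.x.x" remains a placeholder for the latest release on Maven Central.

```scala
// build.sbt (sketch; adjust the module list to what your project needs)

// Hypothetical helper value: replace "x.x.x" with the latest release on Maven Central.
val EtlFlowVersion = "x.x.x"

// One of the Scala versions the project states it is compiled with.
scalaVersion := "2.13.8"

// %% makes sbt append the Scala binary version suffix (e.g. _2.13) to the artifact name.
libraryDependencies ++= Seq(
  "com.github.tharwaninitin" %% "etlflow-core" % EtlFlowVersion,
  "com.github.tharwaninitin" %% "etlflow-gcp"  % EtlFlowVersion
)
```

Keeping the version in a single value means every EtlFlow module is upgraded together, which avoids mixing module versions by accident.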

Maven

<dependency>
    <groupId>com.github.tharwaninitin</groupId>
    <artifactId>etlflow-core_2.12</artifactId>
    <version>x.x.x</version>
</dependency>

Contributions

Please feel free to open issues to report bugs or to propose new features.
