Alternatives to and detailed information about DaFlow

sparsecode / DaFlow

Licence: other

Apache Spark-based Data Flow (ETL) Framework which supports multiple read and write destinations of different types and also supports multiple categories of transformation rules.

Programming Languages

scala


shell


Dockerfile


Projects that are alternatives to or similar to DaFlow

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Stars: ✭ 39 (+62.5%)

Mutual labels: apache-spark, hadoop, etl, etl-framework, etl-pipeline

qwery

A SQL-like language for performing ETL transformations.

Stars: ✭ 28 (+16.67%)

Mutual labels: csv, hive, avro, etl, etl-framework

Choetl

ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)

Stars: ✭ 372 (+1450%)

Mutual labels: csv, avro, etl, parquet, etl-framework

redis-connect-dist

Real-Time Event Streaming & Change Data Capture

Stars: ✭ 21 (-12.5%)

Mutual labels: csv, etl, etl-framework, etl-pipeline

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (+637.5%)

Mutual labels: apache-spark, hadoop, avro, parquet

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (+483.33%)

Mutual labels: hive, hadoop, etl, parquet

Pyetl

Python ETL framework

Stars: ✭ 33 (+37.5%)

Mutual labels: csv, hive, etl, etl-framework

cubetl

CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)

Stars: ✭ 21 (-12.5%)

Mutual labels: csv, etl, etl-framework

etlflow

EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for writing various tasks and jobs on GCP and AWS.

Stars: ✭ 38 (+58.33%)

Mutual labels: etl, etl-framework, etl-pipeline

Datax

DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto (Trino), PostgreSQL, SQL Server

Stars: ✭ 116 (+383.33%)

Mutual labels: hive, hadoop, etl

dbd

dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

Stars: ✭ 30 (+25%)

Mutual labels: csv, etl, parquet

csvplus

csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.

Stars: ✭ 67 (+179.17%)

Mutual labels: etl, etl-framework, etl-pipeline

Hydrograph

A visual ETL development and debugging tool for big data

Stars: ✭ 144 (+500%)

Mutual labels: apache-spark, etl, etl-framework

hamilton

A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.

Stars: ✭ 612 (+2450%)

Mutual labels: etl, etl-framework, etl-pipeline

Rumble

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (+141.67%)

Mutual labels: csv, avro, parquet

Vscode Data Preview

Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files

Stars: ✭ 245 (+920.83%)

Mutual labels: csv, avro, parquet

Dataspherestudio

DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Stars: ✭ 1,195 (+4879.17%)

Mutual labels: hive, hadoop, etl

Avro Hadoop Starter

Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.

Stars: ✭ 110 (+358.33%)

Mutual labels: hive, hadoop, avro

DIRECT

DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.

Stars: ✭ 20 (-16.67%)

Mutual labels: etl, etl-framework, etl-pipeline

vixtract

www.vixtract.ru

Stars: ✭ 40 (+66.67%)

Mutual labels: etl, etl-framework, etl-pipeline


DaFlow [Data Flow (ETL) Framework]

Apache Spark-based Data Flow (ETL) Framework which supports multiple read and write destinations of different types and also supports multiple categories of transformation rules.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

