sparsecode / DaFlow

Licence: other
An Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Programming Languages

Scala, Shell, Dockerfile

Projects that are alternatives of or similar to DaFlow

datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+62.5%)
Mutual labels:  apache-spark, hadoop, etl, etl-framework, etl-pipeline
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (+16.67%)
Mutual labels:  csv, hive, avro, etl, etl-framework
Choetl
ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, XML, JSON, Key-Value, Parquet, YAML, Avro formatted files)
Stars: ✭ 372 (+1450%)
Mutual labels:  csv, avro, etl, parquet, etl-framework
redis-connect-dist
Real-Time Event Streaming & Change Data Capture
Stars: ✭ 21 (-12.5%)
Mutual labels:  csv, etl, etl-framework, etl-pipeline
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+637.5%)
Mutual labels:  apache-spark, hadoop, avro, parquet
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (+483.33%)
Mutual labels:  hive, hadoop, etl, parquet
Pyetl
Python ETL framework
Stars: ✭ 33 (+37.5%)
Mutual labels:  csv, hive, etl, etl-framework
cubetl
CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
Stars: ✭ 21 (-12.5%)
Mutual labels:  csv, etl, etl-framework
etlflow
EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for writing various tasks and jobs on GCP and AWS.
Stars: ✭ 38 (+58.33%)
Mutual labels:  etl, etl-framework, etl-pipeline
Datax
DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto (Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (+383.33%)
Mutual labels:  hive, hadoop, etl
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (+25%)
Mutual labels:  csv, etl, parquet
csvplus
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (+179.17%)
Mutual labels:  etl, etl-framework, etl-pipeline
Hydrograph
A visual ETL development and debugging tool for big data
Stars: ✭ 144 (+500%)
Mutual labels:  apache-spark, etl, etl-framework
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+2450%)
Mutual labels:  etl, etl-framework, etl-pipeline
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (+141.67%)
Mutual labels:  csv, avro, parquet
Vscode Data Preview
Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
Stars: ✭ 245 (+920.83%)
Mutual labels:  csv, avro, parquet
Dataspherestudio
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+4879.17%)
Mutual labels:  hive, hadoop, etl
Avro Hadoop Starter
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Stars: ✭ 110 (+358.33%)
Mutual labels:  hive, hadoop, avro
DIRECT
DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-16.67%)
Mutual labels:  etl, etl-framework, etl-pipeline
vixtract
www.vixtract.ru
Stars: ✭ 40 (+66.67%)
Mutual labels:  etl, etl-framework, etl-pipeline

# DaFlow [Data Flow (ETL) Framework]

An Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
