
1117 Open source projects that are alternatives to or similar to DaFlow

Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Stars: ✭ 39 (+62.5%)

Mutual labels: apache-spark, hadoop, etl, etl-framework, etl-pipeline

qwery

A SQL-like language for performing ETL transformations.

Stars: ✭ 28 (+16.67%)

Mutual labels: csv, hive, avro, etl, etl-framework

Choetl

ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)

Stars: ✭ 372 (+1450%)

Mutual labels: csv, avro, etl, parquet, etl-framework

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (+483.33%)

Mutual labels: hive, hadoop, etl, parquet

redis-connect-dist

Real-Time Event Streaming & Change Data Capture

Stars: ✭ 21 (-12.5%)

Mutual labels: csv, etl, etl-framework, etl-pipeline

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (+637.5%)

Mutual labels: apache-spark, hadoop, avro, parquet

Pyetl

Python ETL framework

Stars: ✭ 33 (+37.5%)

Mutual labels: csv, hive, etl, etl-framework

cubetl

CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)

Stars: ✭ 21 (-12.5%)

Mutual labels: csv, etl, etl-framework

dbd

dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

Stars: ✭ 30 (+25%)

Mutual labels: csv, etl, parquet

etlflow

EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for writing various tasks and jobs on GCP and AWS.

Stars: ✭ 38 (+58.33%)

Mutual labels: etl, etl-framework, etl-pipeline

vixtract

www.vixtract.ru

Stars: ✭ 40 (+66.67%)

Mutual labels: etl, etl-framework, etl-pipeline

Dataspherestudio

DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Stars: ✭ 1,195 (+4879.17%)

Mutual labels: hive, hadoop, etl

Vscode Data Preview

Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files

Stars: ✭ 245 (+920.83%)

Mutual labels: csv, avro, parquet

DIRECT

DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.

Stars: ✭ 20 (-16.67%)

Mutual labels: etl, etl-framework, etl-pipeline

hamilton

A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.

Stars: ✭ 612 (+2450%)

Mutual labels: etl, etl-framework, etl-pipeline

Rumble

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (+141.67%)

Mutual labels: csv, avro, parquet

Hydrograph

A visual ETL development and debugging tool for big data

Stars: ✭ 144 (+500%)

Mutual labels: apache-spark, etl, etl-framework

Iceberg

Iceberg is a table format for large, slow-moving tabular data

Stars: ✭ 393 (+1537.5%)

Mutual labels: hadoop, avro, parquet

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+1591.67%)

Mutual labels: hadoop, avro, parquet

Addax

Addax is an open source universal ETL tool that supports most RDBMS and NoSQL databases, helping you transfer data from any one place to another.

Stars: ✭ 615 (+2462.5%)

Mutual labels: hive, hadoop, etl

Avro Hadoop Starter

Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.

Stars: ✭ 110 (+358.33%)

Mutual labels: hive, hadoop, avro

web-click-flow

Offline log analysis of website clickstreams

Stars: ✭ 14 (-41.67%)

Mutual labels: hive, hadoop, etl

Drill

Apache Drill is a distributed MPP query layer for self describing data

Stars: ✭ 1,619 (+6645.83%)

Mutual labels: hive, hadoop, parquet

csvplus

csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.

Stars: ✭ 67 (+179.17%)

Mutual labels: etl, etl-framework, etl-pipeline

Wedatasphere

WeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!

Stars: ✭ 372 (+1450%)

Mutual labels: hive, hadoop, etl

Waterdrop

Production Ready Data Integration Product, documentation：

Stars: ✭ 1,856 (+7633.33%)

Mutual labels: hadoop, etl-framework, etl-pipeline

Datax

DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server

Stars: ✭ 116 (+383.33%)

Mutual labels: hive, hadoop, etl

hadoop-etl-udfs

The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL

Stars: ✭ 17 (-29.17%)

Mutual labels: hive, hadoop, parquet

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+795.83%)

Mutual labels: apache-spark, hadoop

hive-jdbc-driver

An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC

Stars: ✭ 31 (+29.17%)

Mutual labels: hive, hadoop

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+525%)

Mutual labels: apache-spark, hadoop

dswarm

an open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)

Stars: ✭ 57 (+137.5%)

Mutual labels: csv, etl

aaocp

A big data project for analyzing user behavior logs

Stars: ✭ 53 (+120.83%)

Mutual labels: hive, hadoop

Elasticsearch loader

A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch

Stars: ✭ 300 (+1150%)

Mutual labels: csv, parquet

Parquetviewer

Simple windows desktop application for viewing & querying Apache Parquet files

Stars: ✭ 145 (+504.17%)

Mutual labels: apache-spark, parquet

DataProfiler

What's in your data? Extract schema, statistics and entities from datasets

Stars: ✭ 843 (+3412.5%)

Mutual labels: csv, avro

dogETL

A library to convert data between JDBC, CSV, and JSON.

Stars: ✭ 15 (-37.5%)

Mutual labels: csv, etl

Ether sql

A Python library to push Ethereum blockchain data into an SQL database.

Stars: ✭ 41 (+70.83%)

Mutual labels: csv, etl

Csv2db

The CSV to database command line loader

Stars: ✭ 102 (+325%)

Mutual labels: csv, etl

Ethereum Etl

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ

Stars: ✭ 956 (+3883.33%)

Mutual labels: csv, etl

Etl with python

ETL with Python - Taught at DWH course 2017 (TAU)

Stars: ✭ 68 (+183.33%)

Mutual labels: csv, etl

Etl.net

Mass data processing with a complete ETL for .NET developers

Stars: ✭ 129 (+437.5%)

Mutual labels: csv, etl

openmrs-fhir-analytics

A collection of tools for extracting FHIR resources and analytics services on top of that data.

Stars: ✭ 55 (+129.17%)

Mutual labels: etl, parquet

Omniparser

omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

Stars: ✭ 148 (+516.67%)

Mutual labels: csv, etl

AirflowETL

Blog post on ETL pipelines with Airflow

Stars: ✭ 20 (-16.67%)

Mutual labels: etl, etl-pipeline

hive-metastore-client

A client for connecting and running DDLs on hive metastore.

Stars: ✭ 37 (+54.17%)

Mutual labels: hive, etl

bigdata-doc

Big data study notes, learning roadmap, and a collection of technical case studies.

Stars: ✭ 37 (+54.17%)

Mutual labels: hive, hadoop

dpkb

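A compilation of big data topics, including distributed storage engines, distributed compute engines, and data warehouse construction. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse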

Stars: ✭ 123 (+412.5%)

Mutual labels: hive, hadoop

wasp

WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-20.83%)

Mutual labels: hadoop, parquet

link-move

A model-driven dynamically-configurable framework to acquire data from external sources and save it to your database.

Stars: ✭ 32 (+33.33%)

Mutual labels: etl, etl-framework

learning-hadoop-and-spark

Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning

Stars: ✭ 146 (+508.33%)

Mutual labels: apache-spark, hadoop

smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines