1117 open source projects that are alternatives to or similar to DaFlow

datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+62.5%)
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (+16.67%)
Mutual labels:  csv, hive, avro, etl, etl-framework
Choetl
ETL framework for .NET / C# (parser / writer for CSV, flat, XML, JSON, key-value, Parquet, YAML, and Avro formatted files)
Stars: ✭ 372 (+1450%)
Mutual labels:  csv, avro, etl, parquet, etl-framework
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (+483.33%)
Mutual labels:  hive, hadoop, etl, parquet
redis-connect-dist
Real-Time Event Streaming & Change Data Capture
Stars: ✭ 21 (-12.5%)
Mutual labels:  csv, etl, etl-framework, etl-pipeline
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+637.5%)
Mutual labels:  apache-spark, hadoop, avro, parquet
Pyetl
Python ETL framework
Stars: ✭ 33 (+37.5%)
Mutual labels:  csv, hive, etl, etl-framework
cubetl
CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
Stars: ✭ 21 (-12.5%)
Mutual labels:  csv, etl, etl-framework
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (+25%)
Mutual labels:  csv, etl, parquet
etlflow
EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for writing various tasks and jobs on GCP and AWS.
Stars: ✭ 38 (+58.33%)
Mutual labels:  etl, etl-framework, etl-pipeline
vixtract
www.vixtract.ru
Stars: ✭ 40 (+66.67%)
Mutual labels:  etl, etl-framework, etl-pipeline
Dataspherestudio
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+4879.17%)
Mutual labels:  hive, hadoop, etl
Vscode Data Preview
Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
Stars: ✭ 245 (+920.83%)
Mutual labels:  csv, avro, parquet
DIRECT
DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-16.67%)
Mutual labels:  etl, etl-framework, etl-pipeline
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+2450%)
Mutual labels:  etl, etl-framework, etl-pipeline
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (+141.67%)
Mutual labels:  csv, avro, parquet
Hydrograph
A visual ETL development and debugging tool for big data
Stars: ✭ 144 (+500%)
Mutual labels:  apache-spark, etl, etl-framework
Iceberg
Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+1537.5%)
Mutual labels:  hadoop, avro, parquet
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+1591.67%)
Mutual labels:  hadoop, avro, parquet
Addax
Addax is an open source universal ETL tool that supports most RDBMS and NoSQL databases, helping you transfer data from any one place to another.
Stars: ✭ 615 (+2462.5%)
Mutual labels:  hive, hadoop, etl
Avro Hadoop Starter
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Stars: ✭ 110 (+358.33%)
Mutual labels:  hive, hadoop, avro
web-click-flow
Offline log analysis of website clickstreams
Stars: ✭ 14 (-41.67%)
Mutual labels:  hive, hadoop, etl
Drill
Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+6645.83%)
Mutual labels:  hive, hadoop, parquet
csvplus
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (+179.17%)
Mutual labels:  etl, etl-framework, etl-pipeline
Wedatasphere
WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+1450%)
Mutual labels:  hive, hadoop, etl
Waterdrop
Production Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+7633.33%)
Mutual labels:  hadoop, etl-framework, etl-pipeline
Datax
DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto (Trino), PostgreSQL, and SQL Server
Stars: ✭ 116 (+383.33%)
Mutual labels:  hive, hadoop, etl
hadoop-etl-udfs
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-29.17%)
Mutual labels:  hive, hadoop, parquet
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+795.83%)
Mutual labels:  apache-spark, hadoop
hive-jdbc-driver
An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC
Stars: ✭ 31 (+29.17%)
Mutual labels:  hive, hadoop
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+525%)
Mutual labels:  apache-spark, hadoop
dswarm
an open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)
Stars: ✭ 57 (+137.5%)
Mutual labels:  csv, etl
aaocp
A big data project that analyzes user behavior logs
Stars: ✭ 53 (+120.83%)
Mutual labels:  hive, hadoop
Elasticsearch loader
A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch
Stars: ✭ 300 (+1150%)
Mutual labels:  csv, parquet
Parquetviewer
Simple Windows desktop application for viewing & querying Apache Parquet files
Stars: ✭ 145 (+504.17%)
Mutual labels:  apache-spark, parquet
DataProfiler
What's in your data? Extract schema, statistics and entities from datasets
Stars: ✭ 843 (+3412.5%)
Mutual labels:  csv, avro
dogETL
A library to convert data between JDBC, CSV, and JSON.
Stars: ✭ 15 (-37.5%)
Mutual labels:  csv, etl
Ether sql
A Python library to push Ethereum blockchain data into an SQL database.
Stars: ✭ 41 (+70.83%)
Mutual labels:  csv, etl
Csv2db
The CSV to database command line loader
Stars: ✭ 102 (+325%)
Mutual labels:  csv, etl
Ethereum Etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 956 (+3883.33%)
Mutual labels:  csv, etl
Etl with python
ETL with Python - Taught at DWH course 2017 (TAU)
Stars: ✭ 68 (+183.33%)
Mutual labels:  csv, etl
Etl.net
Mass data processing with a complete ETL for .NET developers
Stars: ✭ 129 (+437.5%)
Mutual labels:  csv, etl
openmrs-fhir-analytics
A collection of tools for extracting FHIR resources and analytics services on top of that data.
Stars: ✭ 55 (+129.17%)
Mutual labels:  etl, parquet
Omniparser
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
Stars: ✭ 148 (+516.67%)
Mutual labels:  csv, etl
AirflowETL
Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-16.67%)
Mutual labels:  etl, etl-pipeline
hive-metastore-client
A client for connecting to the Hive metastore and running DDLs on it.
Stars: ✭ 37 (+54.17%)
Mutual labels:  hive, etl
bigdata-doc
Big data study notes, learning roadmaps, and a collection of technical case studies.
Stars: ✭ 37 (+54.17%)
Mutual labels:  hive, hadoop
dpkb
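A compilation of big data material, including distributed storage engines, distributed compute engines, and data warehouse construction. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse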
Stars: ✭ 123 (+412.5%)
Mutual labels:  hive, hadoop
wasp
WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-20.83%)
Mutual labels:  hadoop, parquet
link-move
A model-driven dynamically-configurable framework to acquire data from external sources and save it to your database.
Stars: ✭ 32 (+33.33%)
Mutual labels:  etl, etl-framework
learning-hadoop-and-spark
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Stars: ✭ 146 (+508.33%)
Mutual labels:  apache-spark, hadoop
smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Stars: ✭ 79 (+229.17%)
Mutual labels:  hive, hadoop
seatunnel-example
Examples of SeaTunnel plugin development.
Stars: ✭ 27 (+12.5%)
Mutual labels:  etl-framework, etl-pipeline
hive to es
A small tool for syncing Hive data warehouse data to Elasticsearch
Stars: ✭ 21 (-12.5%)
Mutual labels:  hive, hadoop
parquet-extra
A collection of Apache Parquet add-on modules
Stars: ✭ 30 (+25%)
Mutual labels:  avro, parquet
parquet-flinktacular
How to use Parquet in Flink
Stars: ✭ 29 (+20.83%)
Mutual labels:  avro, parquet
BETL-old
BETL. Metadata-driven ETL generation using T-SQL
Stars: ✭ 17 (-29.17%)
Mutual labels:  etl, etl-framework
hive-bigquery-storage-handler
Hive Storage Handler for interoperability between BigQuery and Apache Hive
Stars: ✭ 16 (-33.33%)
Mutual labels:  hive, hadoop
the-apache-ignite-book
All code samples, scripts and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above
Stars: ✭ 65 (+170.83%)
Mutual labels:  hive, hadoop
columnify
Convert record-oriented data to columnar format.
Stars: ✭ 28 (+16.67%)
Mutual labels:  avro, parquet