datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Stars: ✭ 39 (+62.5%)
Mutual labels: apache-spark, hadoop, etl, etl-framework, etl-pipeline
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (+16.67%)
Mutual labels: csv, hive, avro, etl, etl-framework
Choetl
ETL framework for .NET / C# (parser / writer for CSV, flat, XML, JSON, key-value, Parquet, YAML, and Avro formatted files)
Stars: ✭ 372 (+1450%)
Mutual labels: csv, avro, etl, parquet, etl-framework
redis-connect-dist
Real-Time Event Streaming & Change Data Capture
Stars: ✭ 21 (-12.5%)
Mutual labels: csv, etl, etl-framework, etl-pipeline
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+637.5%)
Mutual labels: apache-spark, hadoop, avro, parquet
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (+483.33%)
Mutual labels: hive, hadoop, etl, parquet
Pyetl
Python ETL framework
Stars: ✭ 33 (+37.5%)
Mutual labels: csv, hive, etl, etl-framework
cubetl
CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
Stars: ✭ 21 (-12.5%)
Mutual labels: csv, etl, etl-framework
etlflow
EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for writing various tasks and jobs on GCP and AWS.
Stars: ✭ 38 (+58.33%)
Mutual labels: etl, etl-framework, etl-pipeline
Datax
DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto (Trino), PostgreSQL, and SQL Server
Stars: ✭ 116 (+383.33%)
Mutual labels: hive, hadoop, etl
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (+25%)
Mutual labels: csv, etl, parquet
csvplus
csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream operations, indices, and joins.
Stars: ✭ 67 (+179.17%)
Mutual labels: etl, etl-framework, etl-pipeline
Hydrograph
A visual ETL development and debugging tool for big data
Stars: ✭ 144 (+500%)
Mutual labels: apache-spark, etl, etl-framework
hamilton
A scalable, general-purpose micro-framework for defining dataflows. You can use it to create dataframes, NumPy matrices, Python objects, ML models, etc.
Stars: ✭ 612 (+2450%)
Mutual labels: etl, etl-framework, etl-pipeline
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (+141.67%)
Mutual labels: csv, avro, parquet
Vscode Data Preview
Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
Stars: ✭ 245 (+920.83%)
Mutual labels: csv, avro, parquet
Dataspherestudio
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+4879.17%)
Mutual labels: hive, hadoop, etl
Avro Hadoop Starter
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Stars: ✭ 110 (+358.33%)
Mutual labels: hive, hadoop, avro
DIRECT
DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-16.67%)
Mutual labels: etl, etl-framework, etl-pipeline
vixtract
www.vixtract.ru
Stars: ✭ 40 (+66.67%)
Mutual labels: etl, etl-framework, etl-pipeline