1117 Open source projects that are alternatives to or similar to DaFlow

Gcs Tools

GCS support for avro-tools, parquet-tools and protobuf

Stars: ✭ 57 (+137.5%)

Mutual labels: avro, parquet

Ratatool

A tool for data sampling, data generation, and data diffing

Stars: ✭ 279 (+1062.5%)

Mutual labels: avro, parquet

Kafka Storm Starter

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.

Stars: ✭ 728 (+2933.33%)

Mutual labels: apache-spark, avro

Etlbox

A lightweight ETL (extract, transform, load) library and data integration toolbox for .NET.

Stars: ✭ 203 (+745.83%)

Mutual labels: etl, etl-framework

Goodreads etl pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Stars: ✭ 793 (+3204.17%)

Mutual labels: apache-spark, etl-framework

columnify

Convert record-oriented data to columnar format.

Stars: ✭ 28 (+16.67%)

Mutual labels: avro, parquet

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (+6741.67%)

Mutual labels: hadoop, parquet

ETL-Starter-Kit

📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL related work.

Stars: ✭ 21 (-12.5%)

Mutual labels: hive, etl-framework

Parquet4s

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

Stars: ✭ 125 (+420.83%)

Mutual labels: hadoop, parquet

EngineeringTeam

A place for organizing the YBIGTA engineering team's materials.

Stars: ✭ 41 (+70.83%)

Mutual labels: hive, hadoop

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+525%)

Mutual labels: apache-spark, hadoop

hadoopoffice

HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)

Stars: ✭ 56 (+133.33%)

Mutual labels: hive, hadoop

Bender

Bender - Serverless ETL Framework

Stars: ✭ 171 (+612.5%)

Mutual labels: etl, etl-framework

leaflet heatmap

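A simple visualization of call data from Huzhou. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved to offline computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is also used to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to give good interactivity. The rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .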

Stars: ✭ 13 (-45.83%)

Mutual labels: apache-spark, hadoop

Griffon Vm

Griffon Data Science Virtual Machine

Stars: ✭ 128 (+433.33%)

Mutual labels: apache-spark, hadoop

BigInsights-on-Apache-Hadoop

Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix

Stars: ✭ 21 (-12.5%)

Mutual labels: hive, hadoop

dswarm

An open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)

Stars: ✭ 57 (+137.5%)

Mutual labels: csv, etl

Elasticsearch loader

A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch

Stars: ✭ 300 (+1150%)

Mutual labels: csv, parquet

Bigdata

💎🔥 Big data study notes

Stars: ✭ 488 (+1933.33%)

Mutual labels: hive, hadoop

God Of Bigdata

Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

Stars: ✭ 6,008 (+24933.33%)

Mutual labels: hive, hadoop

Luigi Warehouse

A luigi powered analytics / warehouse stack

Stars: ✭ 72 (+200%)

Mutual labels: hive, etl

TIL

Today I Learned

Stars: ✭ 43 (+79.17%)

Mutual labels: hive, hadoop

Wifi

A big data query and analysis system based on information captured over wifi

Stars: ✭ 93 (+287.5%)

Mutual labels: hive, hadoop

Hadoop cookbook

Cookbook to install Hadoop 2.0+ using Chef

Stars: ✭ 82 (+241.67%)

Mutual labels: hive, hadoop

Ether sql

A python library to push ethereum blockchain data into an sql database.

Stars: ✭ 41 (+70.83%)

Mutual labels: csv, etl

Hadoopcryptoledger

Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive

Stars: ✭ 126 (+425%)

Mutual labels: hive, hadoop

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+795.83%)

Mutual labels: apache-spark, hadoop

Ethereum Etl

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ

Stars: ✭ 956 (+3883.33%)

Mutual labels: csv, etl

Etl with python

ETL with Python - Taught at DWH course 2017 (TAU)

Stars: ✭ 68 (+183.33%)

Mutual labels: csv, etl

Etl.net

Mass data processing with a complete ETL for .NET developers

Stars: ✭ 129 (+437.5%)

Mutual labels: csv, etl

Metl

mito ETL tool

Stars: ✭ 153 (+537.5%)

Mutual labels: etl, etl-framework

Dist Keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

Stars: ✭ 613 (+2454.17%)

Mutual labels: apache-spark, hadoop

Parquet Dotnet

🏐 Apache Parquet for modern .NET

Stars: ✭ 276 (+1050%)

Mutual labels: apache-spark, parquet

wasp

WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-20.83%)

Mutual labels: hadoop, parquet

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (+362.5%)

Mutual labels: apache-spark, hadoop

Parquetviewer

Simple Windows desktop application for viewing & querying Apache Parquet files

Stars: ✭ 145 (+504.17%)

Mutual labels: apache-spark, parquet

hive-metastore-client

A client for connecting and running DDLs on hive metastore.

Stars: ✭ 37 (+54.17%)

Mutual labels: hive, etl

DataX-src

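DataX is a widely used offline data synchronization tool/platform for heterogeneous data, providing efficient data synchronization between heterogeneous data sources including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, ODPS, and others.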

Stars: ✭ 21 (-12.5%)

Mutual labels: hive, etl

parquet-flinktacular

How to use Parquet in Flink

Stars: ✭ 29 (+20.83%)

Mutual labels: avro, parquet

learning-hadoop-and-spark

Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning

Stars: ✭ 146 (+508.33%)

Mutual labels: apache-spark, hadoop

link-move

A model-driven dynamically-configurable framework to acquire data from external sources and save it to your database.

Stars: ✭ 32 (+33.33%)

Mutual labels: etl, etl-framework

smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines

Stars: ✭ 79 (+229.17%)

Mutual labels: hive, hadoop

BETL-old

BETL. Meta data driven ETL generation using T-SQL

Stars: ✭ 17 (-29.17%)

Mutual labels: etl, etl-framework

Csv2db

The CSV to database command line loader

Stars: ✭ 102 (+325%)

Mutual labels: csv, etl

Facebook Hive Udfs

Facebook's Hive UDFs

Stars: ✭ 213 (+787.5%)

Mutual labels: hive, hadoop

AirflowETL

Blog post on ETL pipelines with Airflow

Stars: ✭ 20 (-16.67%)

Mutual labels: etl, etl-pipeline

openmrs-fhir-analytics

A collection of tools for extracting FHIR resources and analytics services on top of that data.

Stars: ✭ 55 (+129.17%)

Mutual labels: etl, parquet

parquet-extra

A collection of Apache Parquet add-on modules

Stars: ✭ 30 (+25%)

Mutual labels: avro, parquet

seatunnel-example

Examples of seatunnel plugin development.

Stars: ✭ 27 (+12.5%)

Mutual labels: etl-framework, etl-pipeline

hive-bigquery-storage-handler

Hive Storage Handler for interoperability between BigQuery and Apache Hive

Stars: ✭ 16 (-33.33%)

Mutual labels: hive, hadoop

bigdata-doc

Big data study notes, learning roadmaps, and a collection of technical case studies.

Stars: ✭ 37 (+54.17%)

Mutual labels: hive, hadoop

dpkb

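A compilation of big-data-related content, including distributed storage engines, distributed computing engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse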

Stars: ✭ 123 (+412.5%)

Mutual labels: hive, hadoop

dockerfiles

Multiple Docker container images for the main Big Data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )

Stars: ✭ 29 (+20.83%)

Mutual labels: hive, hadoop

Transformalize

Configurable Extract, Transform, and Load

Stars: ✭ 125 (+420.83%)

Mutual labels: etl, etl-framework

Butterfree

A tool for building feature stores.

Stars: ✭ 126 (+425%)

Mutual labels: etl, etl-framework

Hive Jdbc Uber Jar

Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version