1117 open source projects that are alternatives to or similar to DaFlow

Gcs Tools
GCS support for avro-tools, parquet-tools and protobuf
Stars: ✭ 57 (+137.5%)
Mutual labels:  avro, parquet
Ratatool
A tool for data sampling, data generation, and data diffing
Stars: ✭ 279 (+1062.5%)
Mutual labels:  avro, parquet
Kafka Storm Starter
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (+2933.33%)
Mutual labels:  apache-spark, avro
Etlbox
A lightweight ETL (extract, transform, load) library and data integration toolbox for .NET.
Stars: ✭ 203 (+745.83%)
Mutual labels:  etl, etl-framework
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+3204.17%)
Mutual labels:  apache-spark, etl-framework
columnify
Convert record-oriented data to columnar format.
Stars: ✭ 28 (+16.67%)
Mutual labels:  avro, parquet
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+6741.67%)
Mutual labels:  hadoop, parquet
ETL-Starter-Kit
📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL related work.
Stars: ✭ 21 (-12.5%)
Mutual labels:  hive, etl-framework
Parquet4s
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Stars: ✭ 125 (+420.83%)
Mutual labels:  hadoop, parquet
EngineeringTeam
A place for organizing the YBIGTA engineering team's materials.
Stars: ✭ 41 (+70.83%)
Mutual labels:  hive, hadoop
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+525%)
Mutual labels:  apache-spark, hadoop
hadoopoffice
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Stars: ✭ 56 (+133.33%)
Mutual labels:  hive, hadoop
Bender
Bender - Serverless ETL Framework
Stars: ✭ 171 (+612.5%)
Mutual labels:  etl, etl-framework
leaflet heatmap
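A simple visualization of Huzhou call data. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap-drawing step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is used again to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to give good interactivity. The drawing is currently implemented with Apache Spark; either because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well, the parallel computation is slower than single-machine computation. The Apache Spark heatmap drawing and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .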
Stars: ✭ 13 (-45.83%)
Mutual labels:  apache-spark, hadoop
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (+433.33%)
Mutual labels:  apache-spark, hadoop
BigInsights-on-Apache-Hadoop
Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix
Stars: ✭ 21 (-12.5%)
Mutual labels:  hive, hadoop
dswarm
An open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)
Stars: ✭ 57 (+137.5%)
Mutual labels:  csv, etl
Elasticsearch loader
A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch
Stars: ✭ 300 (+1150%)
Mutual labels:  csv, parquet
Bigdata
💎🔥 Big data study notes
Stars: ✭ 488 (+1933.33%)
Mutual labels:  hive, hadoop
God Of Bigdata
Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+24933.33%)
Mutual labels:  hive, hadoop
Luigi Warehouse
A Luigi-powered analytics / warehouse stack
Stars: ✭ 72 (+200%)
Mutual labels:  hive, etl
TIL
Today I Learned
Stars: ✭ 43 (+79.17%)
Mutual labels:  hive, hadoop
Wifi
A big data query and analysis system based on information captured over wifi
Stars: ✭ 93 (+287.5%)
Mutual labels:  hive, hadoop
Hadoop cookbook
Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (+241.67%)
Mutual labels:  hive, hadoop
Ether sql
A Python library to push Ethereum blockchain data into an SQL database.
Stars: ✭ 41 (+70.83%)
Mutual labels:  csv, etl
Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (+425%)
Mutual labels:  hive, hadoop
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+795.83%)
Mutual labels:  apache-spark, hadoop
Ethereum Etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 956 (+3883.33%)
Mutual labels:  csv, etl
Etl with python
ETL with Python - Taught at DWH course 2017 (TAU)
Stars: ✭ 68 (+183.33%)
Mutual labels:  csv, etl
Etl.net
Mass data processing with a complete ETL for .NET developers
Stars: ✭ 129 (+437.5%)
Mutual labels:  csv, etl
Metl
mito ETL tool
Stars: ✭ 153 (+537.5%)
Mutual labels:  etl, etl-framework
Dist Keras
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Stars: ✭ 613 (+2454.17%)
Mutual labels:  apache-spark, hadoop
Parquet Dotnet
🏐 Apache Parquet for modern .NET
Stars: ✭ 276 (+1050%)
Mutual labels:  apache-spark, parquet
wasp
WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-20.83%)
Mutual labels:  hadoop, parquet
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+362.5%)
Mutual labels:  apache-spark, hadoop
Parquetviewer
Simple Windows desktop application for viewing & querying Apache Parquet files
Stars: ✭ 145 (+504.17%)
Mutual labels:  apache-spark, parquet
hive-metastore-client
A client for connecting and running DDLs on hive metastore.
Stars: ✭ 37 (+54.17%)
Mutual labels:  hive, etl
DataX-src
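DataX is a widely used offline data synchronization tool/platform for heterogeneous data, providing efficient data synchronization between heterogeneous data sources including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, ODPS, and others.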
Stars: ✭ 21 (-12.5%)
Mutual labels:  hive, etl
parquet-flinktacular
How to use Parquet in Flink
Stars: ✭ 29 (+20.83%)
Mutual labels:  avro, parquet
learning-hadoop-and-spark
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Stars: ✭ 146 (+508.33%)
Mutual labels:  apache-spark, hadoop
link-move
A model-driven dynamically-configurable framework to acquire data from external sources and save it to your database.
Stars: ✭ 32 (+33.33%)
Mutual labels:  etl, etl-framework
smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Stars: ✭ 79 (+229.17%)
Mutual labels:  hive, hadoop
BETL-old
BETL. Metadata-driven ETL generation using T-SQL
Stars: ✭ 17 (-29.17%)
Mutual labels:  etl, etl-framework
Csv2db
The CSV to database command line loader
Stars: ✭ 102 (+325%)
Mutual labels:  csv, etl
Facebook Hive Udfs
Facebook's Hive UDFs
Stars: ✭ 213 (+787.5%)
Mutual labels:  hive, hadoop
AirflowETL
Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-16.67%)
Mutual labels:  etl, etl-pipeline
openmrs-fhir-analytics
A collection of tools for extracting FHIR resources and analytics services on top of that data.
Stars: ✭ 55 (+129.17%)
Mutual labels:  etl, parquet
parquet-extra
A collection of Apache Parquet add-on modules
Stars: ✭ 30 (+25%)
Mutual labels:  avro, parquet
seatunnel-example
seatunnel plugin development examples.
Stars: ✭ 27 (+12.5%)
Mutual labels:  etl-framework, etl-pipeline
hive-bigquery-storage-handler
Hive Storage Handler for interoperability between BigQuery and Apache Hive
Stars: ✭ 16 (-33.33%)
Mutual labels:  hive, hadoop
bigdata-doc
Big data study notes, a learning roadmap, and a collection of technical case studies.
Stars: ✭ 37 (+54.17%)
Mutual labels:  hive, hadoop
dpkb
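A summary of big data topics, including distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse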
Stars: ✭ 123 (+412.5%)
Mutual labels:  hive, hadoop
dockerfiles
Multi docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )
Stars: ✭ 29 (+20.83%)
Mutual labels:  hive, hadoop
Transformalize
Configurable Extract, Transform, and Load
Stars: ✭ 125 (+420.83%)
Mutual labels:  etl, etl-framework
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (+425%)
Mutual labels:  etl, etl-framework
Hive Jdbc Uber Jar
Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Stars: ✭ 188 (+683.33%)
Mutual labels:  hive, hadoop
Omniparser
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
Stars: ✭ 148 (+516.67%)
Mutual labels:  csv, etl
hive to es
A small tool for syncing Hive data warehouse data to Elasticsearch
Stars: ✭ 21 (-12.5%)
Mutual labels:  hive, hadoop
the-apache-ignite-book
All code samples, scripts and more in-depth examples for The Apache Ignite Book. Covers Apache Ignite 2.6 or above
Stars: ✭ 65 (+170.83%)
Mutual labels:  hive, hadoop
darwin
Avro Schema Evolution made easy
Stars: ✭ 26 (+8.33%)
Mutual labels:  hadoop, avro