Top 12 datalake open source projects

Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
parquet-usql
A custom extractor designed to read Parquet files for Azure Data Lake Analytics
apiary
Apiary provides modules which can be combined to create a federated cloud data lake
datalake-etl-pipeline
A simplified ETL process on Hadoop using Apache Spark, with a complete ETL pipeline for a data lake. Includes SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
apiary-data-lake
Terraform scripts for deploying Apiary Data Lake
dlink
Dinky is an out-of-the-box, one-stop real-time computing platform dedicated to building and operating unified streaming and batch processing and a unified data lake and data warehouse. Based on Apache Flink, Dinky can connect to many big data frameworks, including OLAP and data lake systems.