Hudi: Upserts, Deletes And Incremental Processing on Big Data.
Leofs: The LeoFS Storage System
Trino: Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
parquet-usql: A custom extractor designed to read parquet for Azure Data Lake Analytics
apiary: Apiary provides modules which can be combined to create a federated cloud data lake
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
zingg: Scalable identity resolution, entity resolution, data mastering and deduplication using ML
dlink: Dinky is an out of the box one-stop real-time computing platform dedicated to the construction and practice of Unified Streaming & Batch and Unified Data Lake & Data Warehouse. Based on Apache Flink, Dinky provides the ability to connect many big data frameworks including OLAP and Data Lake.