waspWASP is a framework for building complex real-time big data applications. It relies on a Kappa/Lambda-style architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
prestoTeradata Distribution of Presto -- A Distributed SQL Query Engine for Big Data
hadoop-etl-udfsThe Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
memex-gateGeneral Architecture for Text Engineering
sparkucxA high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
hadoopofficeHadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
rastercuberastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
oci-clouderaTerraform module to deploy Cloudera on Oracle Cloud Infrastructure (OCI)
skeinA tool and library for easily deploying applications on Apache YARN
xxhadoopData analysis using Hadoop/Spark/Storm/Elasticsearch/machine learning, etc. These are my daily notes, code, and demos. Don't fork, just star!
disqA library for manipulating bioinformatics sequencing formats in Apache Spark
corcAn ORC File Scheme for the Cascading data processing platform.
pyspark-ML-in-ColabPySpark in Google Colab: A simple machine learning (linear regression) model
diskA distributed network disk (cloud storage) system built on Hadoop, HBase, and Spring Boot
big-data-exploration[Archive] Intern project - Big Data Exploration using MongoDB - This Repository is NOT a supported MongoDB product
datalake-etl-pipelineSimplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
dockerfilesMultiple Docker container images for the main big data tools (Hadoop, Spark, Kafka, HBase, Cassandra, ZooKeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)
the-apache-ignite-bookAll code samples, scripts and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above
iisInformation Inference Service of the OpenAIRE system
smart-data-lakeSmart Automation Tool for building modern Data Lakes and Data Pipelines
openPDCOpen Source Phasor Data Concentrator
webhdfsNode.js WebHDFS REST API client
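dpkbA collection of big data material, covering distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse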
TonYTonY is a framework to natively run deep learning frameworks on Apache Hadoop.
gomrjobgomrjob - a Go framework for Hadoop MapReduce jobs
terasliceScalable data processing pipelines in JavaScript
JavaFrameworkA simple Java framework designed for easily developing Spring-based Java programs. Supports big data and metadata management, and includes a common Elasticsearch query tool, among other things.
beanszooDistributed Java micro-services using ZooKeeper
orionManagement and automation platform for Stateful Distributed Systems
RecommendationEngineSource code and dataset for paper "CBMR: An optimized MapReduce for item‐based collaborative filtering recommendation algorithm with empirical analysis"
phoenixApache Phoenix / HBase Spring Boot Microservices
docker-hadoopDocker image for the main Apache Hadoop components (YARN/HDFS)