Top 231 hadoop open source projects

wasp
WASP is a framework for building complex real-time big data applications. It relies on a Kappa/Lambda-style architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
presto
Teradata Distribution of Presto -- A Distributed SQL Query Engine for Big Data
hadoop-ecosystem
Visualizations of the Hadoop Ecosystem
liquibase-impala
Liquibase extension to add Impala Database support
hadoop-etl-udfs
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
sparkucx
A high-performance, scalable, and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer
hadoopoffice
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
rastercube
rastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
learning-spark
Tidy up Spark and Hadoop tutorials.
oci-cloudera
Terraform module to deploy Cloudera on Oracle Cloud Infrastructure (OCI)
jmx exporter-cloudera-hadoop
Prometheus jmx_exporter configurations for Cloudera Hadoop
skein
A tool and library for easily deploying applications on Apache YARN
xxhadoop
Data analysis using Hadoop, Spark, Storm, Elasticsearch, machine learning, etc. These are my daily notes, code, and demos. Don't fork, just star!
disq
A library for manipulating bioinformatics sequencing formats in Apache Spark
corc
An ORC File Scheme for the Cascading data processing platform.
pyspark-ML-in-Colab
PySpark in Google Colab: A simple machine learning (Linear Regression) model
disk
A distributed network disk system built on Hadoop, HBase, and Spring Boot
big-data-exploration
[Archive] Intern project - Big Data Exploration using MongoDB - This Repository is NOT a supported MongoDB product
LogAnalyzeHelper
Data-cleaning program for a forum log analysis system (includes an IP rule library, UDF development, MapReduce programs, and log data)
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
dockerfiles
Multiple Docker container images for the main big data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)
hive-bigquery-storage-handler
Hive Storage Handler for interoperability between BigQuery and Apache Hive
hive to es
A small tool for syncing Hive data warehouse data to Elasticsearch
HDFS-Netdisc
A Hadoop-based distributed cloud storage system 🌴
smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
learning-hadoop-and-spark
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
bigdata-doc
A collection of big data study notes, learning roadmaps, and technical case studies.
webhdfs
Node.js WebHDFS REST API client
dpkb
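A roundup of big data topics, including distributed storage engines, distributed compute engines, and data warehouse construction. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse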
TonY
TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
gomrjob
gomrjob - a Go framework for Hadoop MapReduce jobs
JavaFramework
A simple Java framework designed to make it easy to develop Spring-based Java programs. Supports big data and metadata management, and includes a common Elasticsearch query tool, among other things.
beanszoo
Distributed Java micro-services using ZooKeeper
orion
Management and automation platform for Stateful Distributed Systems
hadoop-ansible
Install a Hadoop cluster with Ansible
RecommendationEngine
Source code and dataset for paper "CBMR: An optimized MapReduce for item‐based collaborative filtering recommendation algorithm with empirical analysis"
ambari-hdp-docker
Dockerfiles and Docker Compose for HDP 2.6 with Blueprints
phoenix
Apache Phoenix / HBase Spring Boot Microservices
docker-hadoop
Docker image for the main Apache Hadoop components (YARN/HDFS)