Multiple Docker container images for the main Big Data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)

Stars: ✭ 29 (-14.71%)

Mutual labels: hadoop, bigdata

Bigdata Notebook

Stars: ✭ 100 (+194.12%)

Mutual labels: hadoop, bigdata

hadoopoffice

HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)

Stars: ✭ 56 (+64.71%)

Mutual labels: hadoop, bigdata

wasp

WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-44.12%)

Mutual labels: hadoop, spark-streaming

flokkr

Documentation placeholder and utilities for all the other containers.

Stars: ✭ 30 (-11.76%)

Mutual labels: hadoop, bigdata

hadoop-data-ingestion-tool

OLAP and ETL of Big Data

Stars: ✭ 17 (-50%)

Mutual labels: hadoop, greenplum

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+532.35%)

Mutual labels: hadoop, bigdata

Bigdataguide

Big data learning: learn big data from scratch, including learning videos and interview materials for each stage of big data study

Stars: ✭ 817 (+2302.94%)

Mutual labels: hadoop, bigdata

Bigdata Notes

A beginner's guide to big data ⭐

Stars: ✭ 10,991 (+32226.47%)

Mutual labels: hadoop, bigdata

xxhadoop

Data analysis using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. These are my daily notes, code, and demos. Don't fork, just star!

Stars: ✭ 37 (+8.82%)

Mutual labels: hadoop, spark-streaming

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-85.29%)

Mutual labels: hadoop, bigdata

Learning Spark

Learn Spark from scratch; big data learning

Stars: ✭ 37 (+8.82%)

Mutual labels: hadoop, spark-streaming

learning-spark

Tidy up Spark and Hadoop tutorials.

Stars: ✭ 28 (-17.65%)

Mutual labels: hadoop, bigdata

God Of Bigdata

Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

Stars: ✭ 6,008 (+17570.59%)

Mutual labels: hadoop, bigdata

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Stars: ✭ 39 (+14.71%)

Mutual labels: hadoop, spark-sql

Spline

Data Lineage Tracking And Visualization Solution

Stars: ✭ 306 (+800%)

Mutual labels: hadoop, bigdata

Tweet-Analysis-With-Kafka-and-Spark

A real-time analytics dashboard for analyzing trending hashtags and @mentions at any location using Kafka and Spark Streaming.

Stars: ✭ 18 (-47.06%)

Mutual labels: spark-streaming, spark-sql

SparkProgrammingInScala

Apache Spark Course Material

Stars: ✭ 57 (+67.65%)

Mutual labels: bigdata, spark-sql

dt-sql-parser

SQL Parsers for BigData, built with antlr4.

Stars: ✭ 135 (+297.06%)

Mutual labels: bigdata, spark-sql

Javaorbigdata Interview

A compilation of interview knowledge points for Java developers and big data developers

Stars: ✭ 203 (+497.06%)

Mutual labels: hadoop, bigdata

litemall-dw

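A big data project based on the open-source Litemall e-commerce project, including front-end event tracking (openresty+lua) and back-end event tracking, a five-layer data warehouse, real-time computing, and user profiling. The big data platform uses CDH6.3.2 (already scripted with vagrant+ansible) and also includes Azkaban workflows.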

Stars: ✭ 36 (+5.88%)

Mutual labels: spark-streaming, spark-sql

Azure Event Hubs Spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Stars: ✭ 140 (+311.76%)

Mutual labels: bigdata, spark-streaming

Hadoop Attack Library

A collection of pentest tools and resources targeting Hadoop environments

Stars: ✭ 228 (+570.59%)

Mutual labels: hadoop, bigdata

leaflet heatmap

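A simple visualization of Huzhou call data. Assuming the data volume is too large for the browser to draw the heatmap directly, the heatmap-drawing step is moved to offline computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is also used to draw the heatmap; leafletjs then loads the OpenStreetMap layer and the heatmap layer to give good interactivity. The drawing is currently implemented with Apache Spark, but parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap drawing and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .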

Stars: ✭ 13 (-61.76%)

Mutual labels: hadoop, bigdata

Apache Spark Hands On

Educational notes and hands-on problems with solutions for the Hadoop ecosystem

Stars: ✭ 74 (+117.65%)

Mutual labels: hadoop, bigdata

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (+420.59%)

Mutual labels: hadoop, spark-streaming

Airflow Pipeline

An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR

Stars: ✭ 128 (+276.47%)

Mutual labels: hadoop

Deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…

Stars: ✭ 12,277 (+36008.82%)

Mutual labels: hadoop

Spydra

Ephemeral Hadoop clusters using Google Compute Platform

Stars: ✭ 128 (+276.47%)

Mutual labels: hadoop

Facebook Hive Udfs

Facebook's Hive UDFs

Stars: ✭ 213 (+526.47%)

Mutual labels: hadoop

Big Whale

Scheduling of offline tasks such as Spark and Flink, and monitoring of real-time tasks

Stars: ✭ 163 (+379.41%)

Mutual labels: hadoop

Griffon Vm

Griffon Data Science Virtual Machine

Stars: ✭ 128 (+276.47%)

Mutual labels: hadoop

Parquet4s

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.

Stars: ✭ 125 (+267.65%)

Mutual labels: hadoop

Bigdata docker

Big Data Ecosystem Docker

Stars: ✭ 161 (+373.53%)

Mutual labels: hadoop

Dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.

Stars: ✭ 122 (+258.82%)

Mutual labels: hadoop

Luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Stars: ✭ 15,226 (+44682.35%)

Mutual labels: hadoop

Hdfs Shell

HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS

Stars: ✭ 117 (+244.12%)

Mutual labels: hadoop

Presto

The official home of the Presto distributed SQL query engine for big data

Stars: ✭ 12,957 (+38008.82%)

Mutual labels: hadoop

Ibis

A pandas-like deferred expression system, with first-class SQL support

Stars: ✭ 1,630 (+4694.12%)

Mutual labels: hadoop

Datax

DataX is an open-source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server

Stars: ✭ 116 (+241.18%)

Mutual labels: hadoop

Hadoop Common

Mirror of Apache Hadoop common

Stars: ✭ 155 (+355.88%)

Mutual labels: hadoop

Asakusafw

Asakusa Framework

Stars: ✭ 114 (+235.29%)

Mutual labels: hadoop

1-60 of 436 similar projects
