A one-stop repo for looking up code snippets on core Java concepts, SQL, data structures, and big data. It also includes interview questions asked in real life.

Stars: ✭ 25 (+38.89%)

Mutual labels: spark-streaming, mapreduce

web-click-flow

Offline log analysis of website clickstreams

Stars: ✭ 14 (-22.22%)

Mutual labels: hadoop, mapreduce

Data-pipeline-project

Data pipeline project

Stars: ✭ 18 (+0%)

Mutual labels: hadoop, mapreduce

Streamline

StreamLine - Streaming Analytics

Stars: ✭ 151 (+738.89%)

Mutual labels: storm, spark-streaming

Registry

Schema Registry

Stars: ✭ 184 (+922.22%)

Mutual labels: storm, spark-streaming

wasp

WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Stars: ✭ 19 (+5.56%)

Mutual labels: hadoop, spark-streaming

flokkr

Documentation placeholder and utilities for all the other containers.

Stars: ✭ 30 (+66.67%)

Mutual labels: hadoop, bigdata

hadoop-docker-lite

Docker build project to set up a lightweight Hadoop cluster containing Hadoop, Pig, ZooKeeper, HBase, Phoenix, Storm, Kafka, and Kafka Manager

Stars: ✭ 24 (+33.33%)

Mutual labels: hadoop, storm

Behemoth

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Stars: ✭ 286 (+1488.89%)

Mutual labels: hadoop, mapreduce

Cascading

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.

Stars: ✭ 318 (+1666.67%)

Mutual labels: hadoop, mapreduce

Bigdata

💎🔥 Big data study notes

Stars: ✭ 488 (+2611.11%)

Mutual labels: hadoop, mapreduce

GooglePlay-Web-Crawler

MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, and Hive

Stars: ✭ 18 (+0%)

Mutual labels: hadoop, mapreduce

Big data architect skills

Skills a big data architect should master

Stars: ✭ 400 (+2122.22%)

Mutual labels: hadoop, bigdata

Bigdataguide

Big data learning: learn big data from scratch, with study videos for each stage of learning and interview materials

Stars: ✭ 817 (+4438.89%)

Mutual labels: hadoop, bigdata

dockerfiles

Multiple Docker container images for the main big data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)

Stars: ✭ 29 (+61.11%)

Mutual labels: hadoop, bigdata

gomrjob

gomrjob - a Go Framework for Hadoop Map Reduce Jobs

Stars: ✭ 39 (+116.67%)

Mutual labels: hadoop, mapreduce

Src

A light-weight distributed stream computing framework for Golang

Stars: ✭ 67 (+272.22%)

Mutual labels: hadoop, mapreduce

Repository

Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.

Stars: ✭ 92 (+411.11%)

Mutual labels: hadoop, mapreduce

Streaming Readings

Papers and reading material on streaming systems

Stars: ✭ 554 (+2977.78%)

Mutual labels: storm, spark-streaming

BigInsights-on-Apache-Hadoop

Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix

Stars: ✭ 21 (+16.67%)

Mutual labels: hadoop, spark-streaming

Bdp Dataplatform

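Data platform for big data ecosystem solutions: a big data solution built on big data, data platforms, microservices, machine learning, an online store, automated operations, DevOps, a container deployment platform, and data platform collection, storage, computation, development, and application building.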

Stars: ✭ 456 (+2433.33%)

Mutual labels: storm, mapreduce

hadoopoffice

HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)

Stars: ✭ 56 (+211.11%)

Mutual labels: hadoop, bigdata

learning-spark

Tidied-up Spark and Hadoop tutorials.

Stars: ✭ 28 (+55.56%)

Mutual labels: hadoop, bigdata

Dpark

Python clone of Spark, a MapReduce-like framework in Python

Stars: ✭ 2,668 (+14722.22%)

Mutual labels: bigdata, mapreduce

Avro Hadoop Starter

Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.

Stars: ✭ 110 (+511.11%)

Mutual labels: hadoop, mapreduce

leaflet heatmap

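A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap-drawing step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is used again to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. The drawing is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than single-machine computation. The Apache Spark heatmap-drawing and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .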

Stars: ✭ 13 (-27.78%)

Mutual labels: hadoop, bigdata

yuzhouwan

Code Library for My Blog

Stars: ✭ 39 (+116.67%)

Mutual labels: hadoop, bigdata

Spline

Data Lineage Tracking And Visualization Solution

Stars: ✭ 306 (+1600%)

Mutual labels: hadoop, bigdata

Hadoopcryptoledger

Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive

Stars: ✭ 126 (+600%)

Mutual labels: hadoop, bigdata

Asakusafw

Asakusa Framework

Stars: ✭ 114 (+533.33%)

Mutual labels: hadoop, mapreduce

the-apache-ignite-book

All code samples, scripts, and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above.

Stars: ✭ 65 (+261.11%)

Mutual labels: hadoop, bigdata

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+122388.89%)

Mutual labels: hadoop, mapreduce

God Of Bigdata

Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

Stars: ✭ 6,008 (+33277.78%)

Mutual labels: hadoop, bigdata

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-72.22%)

Mutual labels: hadoop, bigdata

Azure Event Hubs Spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Stars: ✭ 140 (+677.78%)

Mutual labels: bigdata, spark-streaming

Learning Spark

Learn Spark from scratch; big data learning

Stars: ✭ 37 (+105.56%)

Mutual labels: hadoop, spark-streaming

Data Algorithms Book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Stars: ✭ 949 (+5172.22%)

Mutual labels: hadoop, mapreduce

Apache Spark Hands On

Educational notes and hands-on problems with solutions for the Hadoop ecosystem

Stars: ✭ 74 (+311.11%)

Mutual labels: hadoop, bigdata

Storm Camel Example

Real-time analysis and visualization with Storm-AMQ-Camel-Websockets-Highcharts integration.

Stars: ✭ 28 (+55.56%)

Mutual labels: hadoop, storm

Waterdrop

Production Ready Data Integration Product, documentation:

Stars: ✭ 1,856 (+10211.11%)

Mutual labels: hadoop, spark-streaming

Recommendsys

Recommendation project (real-time and offline recommendation)

Stars: ✭ 198 (+1000%)

Mutual labels: hadoop, storm

Awesome Learning

Hands-on source code repository: https://github.com/jast90/bigdata . Search for Jast on WeChat and follow the official account to get the latest technical posts 😯.

Stars: ✭ 197 (+994.44%)

Mutual labels: hadoop, bigdata

Shifu

An end-to-end machine learning and data mining framework on Hadoop

Stars: ✭ 207 (+1050%)

Mutual labels: hadoop, bigdata

Hadoop Attack Library

A collection of pentest tools and resources targeting Hadoop environments

Stars: ✭ 228 (+1166.67%)

Mutual labels: hadoop, bigdata

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+1094.44%)

Mutual labels: hadoop, bigdata

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (+883.33%)

Mutual labels: hadoop, spark-streaming

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (+294.44%)

Mutual labels: bigdata, mapreduce

Spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Stars: ✭ 1,721 (+9461.11%)

Mutual labels: bigdata, spark-streaming

Stormtweetssentimentd3viz

Computes and visualizes sentiment analysis of tweets from US states in real time using Storm.