50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu

Stars: ✭ 847 (+3037.04%)

Mutual labels: spark, hbase

Bigdata docker

Big Data Ecosystem Docker

Stars: ✭ 161 (+496.3%)

Mutual labels: spark, hbase

Weblogsanalysissystem

A big data platform for analyzing web access logs

Stars: ✭ 37 (+37.04%)

Mutual labels: spark, hbase

Wedatasphere

WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!

Stars: ✭ 372 (+1277.78%)

Mutual labels: spark, hbase

swordfish

Open-source distributed workflow scheduling tool that also supports streaming tasks.

Stars: ✭ 35 (+29.63%)

Mutual labels: spark, hbase

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (+5981.48%)

Mutual labels: spark, hbase

God Of Bigdata

Focused on big data learning and interview preparation: the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

Stars: ✭ 6,008 (+22151.85%)

Mutual labels: spark, hbase

Hadoop cookbook

Cookbook to install Hadoop 2.0+ using Chef

Stars: ✭ 82 (+203.7%)

Mutual labels: spark, hbase

Python Bigdata

Data science and Big Data with Python

Stars: ✭ 112 (+314.81%)

Mutual labels: spark, hbase

BigData-News

A real-time big data system project for a news website, based on Spark 2.2

Stars: ✭ 36 (+33.33%)

Mutual labels: spark, hbase

Hbase Rdd

Spark RDD to read, write and delete from HBase

Stars: ✭ 277 (+925.93%)

Mutual labels: spark, hbase

Bigdata Notes

A beginner's guide to big data ⭐

Stars: ✭ 10,991 (+40607.41%)

Mutual labels: spark, hbase

Flink Learning

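Flink learning blog: http://www.54tianzhisheng.cn/. Covers Flink fundamentals, concepts, internals, hands-on practice, performance tuning, and source-code analysis. Includes learning examples for Flink Connectors, Metrics, Library, the DataStream API, and Table API & SQL, plus case studies of large production Flink projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Support for my column 《大数据实时计算引擎 Flink 实战与性能优化》 ("Flink, the Real-Time Big Data Computing Engine: Practice and Performance Optimization") is welcome.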

Stars: ✭ 11,378 (+42040.74%)

Mutual labels: spark, hbase

Szt Bigdata

A big data passenger-flow analysis system for the Shenzhen Metro 🚇🚄🌟

Stars: ✭ 826 (+2959.26%)

Mutual labels: spark, hbase

Spring Boot Quick

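🌿 Quick-start learning examples based on Spring Boot, integrating open-source frameworks I have come across, such as RabbitMQ (delay queues), Kafka, JPA, Redis, OAuth2, Swagger, JSP, Docker, Spring Batch, exception handling, logging, multi-module development, multi-environment packaging, caching, web crawling, JWT, GraphQL, Dubbo, ZooKeeper, Async, and more 📌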

Stars: ✭ 1,819 (+6637.04%)

Mutual labels: spark, hbase

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-48.15%)

Mutual labels: spark, hbase

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (+700%)

Mutual labels: spark, hbase

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+1403.7%)

Mutual labels: spark, hbase

Bdp Dataplatform

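A big data ecosystem solution platform. It is a big data solution built on big data, data platforms, microservices, machine learning, e-commerce, automated operations, DevOps, and a container deployment platform. It also covers the data platform's collection, storage, computation, development, and application-building layers.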

Stars: ✭ 456 (+1588.89%)

Mutual labels: spark, hbase

Stream Reactor

Streaming reference architecture for ETL with Kafka and Kafka Connect. Find out more at http://lenses.io about how we provide a unified solution to manage your connectors, the most advanced SQL engine for Kafka and Kafka Streams, cluster monitoring and alerting, and more.

Stars: ✭ 753 (+2688.89%)

Mutual labels: hbase

Chronicler

Scala toolchain for InfluxDB

Stars: ✭ 24 (-11.11%)

Mutual labels: spark

Coding Now

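Study notes, plus eBooks I have read, video resources, and blogs, websites, and tools I have collected and consider worthwhile. Topics include the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.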

Stars: ✭ 750 (+2677.78%)

Mutual labels: spark

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (+2659.26%)

Mutual labels: spark

Mlfeature

Feature engineering toolkit for Spark MLlib.

Stars: ✭ 12 (-55.56%)

Mutual labels: spark

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (-14.81%)

Mutual labels: spark

Sparkctr

CTR prediction models based on Spark (LR, GBDT, DNN)

Stars: ✭ 740 (+2640.74%)

Mutual labels: spark

Cdhproject

How to use the various Hadoop components; continuously updated

Stars: ✭ 733 (+2614.81%)

Mutual labels: spark

Digitrecognizer

Java Convolutional Neural Network example for Hand Writing Digit Recognition

Stars: ✭ 23 (-14.81%)

Mutual labels: spark

Kafka Storm Starter

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, using Apache Avro as the data serialization format.

Stars: ✭ 728 (+2596.3%)

Mutual labels: spark

Frameless

Expressive types for Spark.

Stars: ✭ 717 (+2555.56%)

Mutual labels: spark

Tedsds

Turbofan Engine Degradation Simulation Data Set example in Apache Spark

Stars: ✭ 14 (-48.15%)

Mutual labels: spark

Mare

MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.

Stars: ✭ 11 (-59.26%)

Mutual labels: spark

Kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Stars: ✭ 916 (+3292.59%)

Mutual labels: spark

Hail

Scalable genomic data analysis.

Stars: ✭ 706 (+2514.81%)

Mutual labels: spark

Elasticsearch Spark Recommender

Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch

Stars: ✭ 707 (+2518.52%)

Mutual labels: spark

Spark Scala Tutorial

A free tutorial for Apache Spark.

Stars: ✭ 907 (+3259.26%)

Mutual labels: spark

Scriptis

Scriptis is for interactive data analysis. It supports script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDFs, functions, resource management, and intelligent diagnosis.

Stars: ✭ 696 (+2477.78%)

Mutual labels: spark

Useractionanalyzeplatform

A big data platform for analyzing e-commerce user behavior

Stars: ✭ 645 (+2288.89%)

Mutual labels: spark

Sparkjni

A heterogeneous Apache Spark framework.

Stars: ✭ 11 (-59.26%)

Mutual labels: spark

Node Thrift2 Hbase

An HBase thrift wrapper for Node.js

Stars: ✭ 18 (-33.33%)

Mutual labels: hbase

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+2244.44%)

Mutual labels: spark

Freestyle

A cohesive & pragmatic framework of FP centric Scala libraries

Stars: ✭ 627 (+2222.22%)

Mutual labels: spark

Yandex Big Data Engineering

Stars: ✭ 17 (-37.04%)

Mutual labels: spark

Dev Setup

macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.

Stars: ✭ 5,590 (+20603.7%)

Mutual labels: spark

Interview Questions Collection

Interview questions organized by knowledge area, including C++, Java, Hadoop, machine learning, and more

Stars: ✭ 21 (-22.22%)

Mutual labels: spark

Live log analyzer spark

A Spark application that analyzes Apache access logs and detects anomalies, accompanied by a Medium article.

Stars: ✭ 14 (-48.15%)

Mutual labels: spark

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+20848.15%)

Mutual labels: spark

Parquet Generator

Parquet file generator

Stars: ✭ 16 (-40.74%)

Mutual labels: spark

Datafusion

DataFusion has now been donated to the Apache Arrow project

Stars: ✭ 611 (+2162.96%)

Mutual labels: spark

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.