rubenafo / Docker Spark Cluster
A Spark cluster setup running on Docker containers
Stars: ✭ 57
Projects that are alternatives of or similar to Docker Spark Cluster
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+163.16%)
Mutual labels: spark, big-data, hadoop
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+2780.7%)
Mutual labels: spark, big-data, hadoop
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+9822.81%)
Mutual labels: spark, big-data, hadoop
Bigdl
Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+6589.47%)
Mutual labels: spark, big-data, hadoop
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+94.74%)
Mutual labels: big-data, spark, hadoop
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+277.19%)
Mutual labels: spark, big-data, hadoop
leaflet heatmap
A simple visualization of Huzhou call data. Assuming the data volume is too large for a browser to render the heatmap directly, the rendering step is moved offline for computation and analysis. Apache Spark computes the data in parallel and then renders the heatmap; leafletjs then loads an OpenStreetMap layer plus the heatmap layer for good interactivity. With the current Spark-based rendering, parallel computation is slower than single-machine computation, perhaps because Spark is not well suited to this kind of work or because the algorithm is poorly designed. The Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (-77.19%)
Mutual labels: big-data, spark, hadoop
bigdata-fun
A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-75.44%)
Mutual labels: big-data, spark, hadoop
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+38580.7%)
Mutual labels: spark, big-data, hadoop
Spark Movie Lens
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+1207.02%)
Mutual labels: spark, big-data
Bigdataguide
Learn big data from scratch: learning videos and interview materials for every stage of a big data curriculum.
Stars: ✭ 817 (+1333.33%)
Mutual labels: spark, hadoop
Hadoop For Geoevent
ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-91.23%)
Mutual labels: big-data, hadoop
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+9571.93%)
Mutual labels: spark, big-data
Dockerfiles
50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (+1385.96%)
Mutual labels: spark, hadoop
Bigdata Interview
🎯 🌟 [Big Data Interview Questions] A collection of big-data interview questions gathered from around the web, together with the author's own answer summaries. Currently covers the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.
Stars: ✭ 857 (+1403.51%)
Mutual labels: spark, hadoop
Alluxio
Alluxio, data orchestration for analytics and machine learning in the cloud
Stars: ✭ 5,379 (+9336.84%)
Mutual labels: spark, hadoop
Kylo
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Stars: ✭ 916 (+1507.02%)
Mutual labels: spark, hadoop
docker-spark-cluster
Build your own Spark cluster setup in Docker.
A multinode Spark installation where each node of the network runs in its own separate Docker container.
The installation takes care of the Hadoop & Spark configuration, providing:
- a Debian image with Scala and Java (scalabase image)
- four fully configured Spark nodes running on Hadoop (sparkbase image):
- nodemaster (master node)
- node2 (slave)
- node3 (slave)
- node4 (slave)
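The four nodes above share a user-defined Docker network so each one gets a stable address (the admin URLs later in this README use 172.18.1.x). A hedged sketch of that layout, which prints the docker commands it would run rather than executing them; the network name, subnet, and image tag are illustrative assumptions, not the project's actual values:

```shell
#!/usr/bin/env bash
# Illustrative sketch only: one user-defined bridge network plus four
# containers, one per node. Network name, subnet, and image tag are guesses.
show_layout() {
  echo "docker network create --subnet 172.18.0.0/16 sparknet"
  for node in nodemaster node2 node3 node4; do
    echo "docker run -d --name $node --hostname $node --net sparknet sparkbase"
  done
}
show_layout
```

In the real setup, cluster.sh handles this wiring for you.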
Motivation
You can run Spark in a (boring) standalone setup or create your own network to hold a full cluster setup inside Docker instead.
I find the latter much more fun:
- experiment with a more realistic network setup
- tweak node configurations
- simulate scalability, downtime, and rebalancing by adding/removing nodes from the network automagically
There is a Medium article related to this: https://medium.com/@rubenafo/running-a-spark-cluster-setup-in-docker-containers-573c45cceabf
Installation
- Clone this repository
- cd scalabase
- ./build.sh # This builds the base Java+Scala Debian container from openjdk9
- cd ../spark
- ./build.sh # This builds the sparkbase image
- Run ./cluster.sh deploy
- The script finishes by displaying the Hadoop and Spark admin URLs:
- Hadoop info @ nodemaster: http://172.18.1.1:8088/cluster
- Spark info @ nodemaster : http://172.18.1.1:8080/
- DFS Health @ nodemaster : http://172.18.1.1:9870/dfshealth.html
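Once the deploy finishes, one way to smoke-test the cluster is to submit Spark's bundled SparkPi example through YARN from inside the master container. This is a hedged sketch: the nodemaster container name matches the node list above, but the spark-submit invocation and the examples jar path inside the image are assumptions and may differ.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test: submit the bundled SparkPi example via YARN.
# The jar path and $SPARK_HOME layout are guesses about the sparkbase image.
smoke_test() {
  if command -v docker >/dev/null 2>&1 \
     && docker ps --format '{{.Names}}' 2>/dev/null | grep -qx nodemaster; then
    docker exec nodemaster bash -lc '
      spark-submit \
        --master yarn \
        --deploy-mode cluster \
        --class org.apache.spark.examples.SparkPi \
        "$SPARK_HOME"/examples/jars/spark-examples*.jar 100'
  else
    # Degrade gracefully when the cluster is not running.
    echo "nodemaster container not running; skipping smoke test"
  fi
}
smoke_test
```

If the job succeeds, it should appear as a finished application in the Hadoop UI at the nodemaster URL above.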
Options
cluster.sh stop # Stop the cluster
cluster.sh start # Start the cluster
cluster.sh info # Shows handy URLs of the running cluster
# Warning! This will remove everything from HDFS
cluster.sh deploy # Format the cluster and deploy images again
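The subcommands above follow the usual pattern of a small dispatcher that fans each action out over the four containers. A minimal sketch of that pattern, illustrative only: the real cluster.sh also formats HDFS on deploy, and this sketch prints the Docker commands instead of executing them.

```shell
#!/usr/bin/env bash
# Illustrative sketch of a cluster.sh-style dispatcher; prints the
# Docker commands it would run rather than executing them.
NODES="nodemaster node2 node3 node4"

cluster() {
  case "${1:-info}" in
    start) for n in $NODES; do echo "docker start $n"; done ;;
    stop)  for n in $NODES; do echo "docker stop $n"; done ;;
    info)  echo "Hadoop info @ nodemaster: http://172.18.1.1:8088/cluster"
           echo "Spark info  @ nodemaster: http://172.18.1.1:8080/"
           echo "DFS Health  @ nodemaster: http://172.18.1.1:9870/dfshealth.html" ;;
    *)     echo "usage: cluster {start|stop|info}" >&2; return 1 ;;
  esac
}

cluster info
```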