Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+30102.74%)

Mutual labels: spark, big-data

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-2.74%)

Mutual labels: spark, big-data

Spark

Apache Spark - A unified analytics engine for large-scale data processing

Stars: ✭ 31,618 (+43212.33%)

Mutual labels: spark, big-data

Metorikku

A simplified, lightweight ETL Framework based on Apache Spark

Stars: ✭ 361 (+394.52%)

Mutual labels: spark, big-data

Magellan

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (+594.52%)

Mutual labels: spark, big-data

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-80.82%)

Mutual labels: big-data, spark

Sparkjni

A heterogeneous Apache Spark framework.

Stars: ✭ 11 (-84.93%)

Mutual labels: spark, big-data

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (+920.55%)

Mutual labels: spark, big-data

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (+395.89%)

Mutual labels: spark, big-data

Spark Doc Zh

Chinese edition of the official Apache Spark documentation

Stars: ✭ 1,126 (+1442.47%)

Mutual labels: spark, big-data

Delta

An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

Stars: ✭ 3,903 (+5246.58%)

Mutual labels: spark, big-data

Bigdl

Building Large-Scale AI Applications for Distributed Big Data

Stars: ✭ 3,813 (+5123.29%)

Mutual labels: spark, big-data

Succinct

Enabling queries on compressed data.

Stars: ✭ 257 (+252.05%)

Mutual labels: spark, big-data

Conjure Up

Deploying complex solutions, magically.

Stars: ✭ 454 (+521.92%)

Mutual labels: openstack, big-data

leaflet heatmap

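A simple visualization of Huzhou call data. Assuming the data volume is too large for the browser to draw the heatmap directly, the heatmap rendering step is moved to offline computation and analysis. The data is first computed in parallel with Apache Spark, the heatmap is then drawn with Apache Spark, and leafletjs loads the OpenStreetMap layer and the heatmap layer to give good interactivity. Rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than the single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .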

Stars: ✭ 13 (-82.19%)

Mutual labels: big-data, spark

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (+52.05%)

Mutual labels: big-data, spark

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+7647.95%)

Mutual labels: spark, big-data

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-21.92%)

Mutual labels: spark, big-data

fusionlab

Lab

Online ID: whoami

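Original content:
- BigData - Distributed system design and implementation
- Cloudera - Enterprise CDH data products
- Hortonworks - Enterprise HDP data products
- JDataFlow - Enterprise JDP data products
- Cloud - Cloud practice
- Web Design
- Architecture Design
- DataBase - Distributed database system design and implementation

Translated content:
- BigData - Distributed system design and implementation
- Cloudera - Enterprise CDH data products
- Hortonworks - Enterprise HDP data products
- JDataFlow - Enterprise JDP data products
- Cloud - Cloud practice
- Web Design
- Architecture Design
- DataBase - Distributed database system design and implementation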

2018 Planning

JDataFlow Platform - JDP enterprise stream analytics platform

WeChat

Stars: ✭ 73
