⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (+241.18%)

Mutual labels: spark, hdfs

Data Algorithms Book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Stars: ✭ 949 (+5482.35%)

Mutual labels: spark, mapreduce

Sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Stars: ✭ 954 (+5511.76%)

Mutual labels: jupyter-notebook, spark

Spark Scala Tutorial

A free tutorial for Apache Spark.

Stars: ✭ 907 (+5235.29%)

Mutual labels: jupyter-notebook, spark

Hops Examples

Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops

Stars: ✭ 84 (+394.12%)

Mutual labels: jupyter-notebook, spark

Spark Nlp Models

Models and Pipelines for the Spark NLP library

Stars: ✭ 88 (+417.65%)

Mutual labels: jupyter-notebook, spark

Pyspark Learning

Updated repository

Stars: ✭ 147 (+764.71%)

Mutual labels: jupyter-notebook, spark

Data Science Cookbook

🎓 Jupyter notebooks from UFC data science course

Stars: ✭ 60 (+252.94%)

Mutual labels: jupyter-notebook, spark

bigdata-doc

Big data study notes, learning roadmap, and a collection of technical case studies.

Stars: ✭ 37 (+117.65%)

Mutual labels: hdfs, mapreduce

fastdata-cluster

Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)

Stars: ✭ 20 (+17.65%)

Mutual labels: spark, hdfs

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-17.65%)

Mutual labels: spark, hdfs

Helk

The Hunting ELK

Stars: ✭ 3,097 (+18117.65%)

Mutual labels: jupyter-notebook, spark

data-algorithms-with-spark

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Stars: ✭ 34 (+100%)

Mutual labels: spark, mapreduce

Zat

Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark

Stars: ✭ 303 (+1682.35%)

Mutual labels: jupyter-notebook, spark

Enterprise gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.

Stars: ✭ 412 (+2323.53%)

Mutual labels: jupyter-notebook, spark

Learning Spark

Learning Spark from scratch; big data learning

Stars: ✭ 37 (+117.65%)

Mutual labels: spark, hdfs

Pucket

Bucketing and partitioning system for Parquet

Stars: ✭ 29 (+70.59%)

Mutual labels: spark, hdfs

Tedsds

Turbofan Engine Degradation Simulation Data Set example in Apache Spark

Stars: ✭ 14 (-17.65%)

Mutual labels: jupyter-notebook, spark

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (+35.29%)

Mutual labels: jupyter-notebook, spark

Pixiedust

Python Helper library for Jupyter Notebooks

Stars: ✭ 998 (+5770.59%)

Mutual labels: jupyter-notebook, spark

Mare

MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.

Stars: ✭ 11 (-35.29%)

Mutual labels: spark, mapreduce

Bigdata

💎🔥 Big data study notes

Stars: ✭ 488 (+2770.59%)

Mutual labels: mapreduce, hdfs

W2v

Word2Vec models with Twitter data using Spark.

Stars: ✭ 64 (+276.47%)

Mutual labels: jupyter-notebook, spark

Udacity Data Engineering

Udacity Data Engineering Nano Degree (DEND)

Stars: ✭ 89 (+423.53%)

Mutual labels: jupyter-notebook, spark

Pysparkgeoanalysis

🌐 Interactive Workshop on GeoAnalysis using PySpark

Stars: ✭ 63 (+270.59%)

Mutual labels: jupyter-notebook, spark

Data science blogs

A repository to keep track of all the code that I end up writing for my blog posts.

Stars: ✭ 139 (+717.65%)

Mutual labels: jupyter-notebook, spark

Python Bigdata

Data science and Big Data with Python

Stars: ✭ 112 (+558.82%)

Mutual labels: jupyter-notebook, spark

Cdap

An open source framework for building data analytic applications.

Stars: ✭ 509 (+2894.12%)

Mutual labels: spark, mapreduce

Justenoughscalaforspark

A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.

Stars: ✭ 538 (+3064.71%)

Mutual labels: jupyter-notebook, spark

Installations mac ubuntu windows

Installations for Data Science. Anaconda, RStudio, Spark, TensorFlow, AWS (Amazon Web Services).

Stars: ✭ 231 (+1258.82%)

Mutual labels: jupyter-notebook, spark

Mydatascienceportfolio

Applying Data Science and Machine Learning to Solve Real World Business Problems

Stars: ✭ 227 (+1235.29%)

Mutual labels: jupyter-notebook, spark

leaflet heatmap

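A simple visualization of call data from Huzhou. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved offline. Apache Spark first computes over the data in parallel and then renders the heatmap; leafletjs then loads the OpenStreetMap layer and the heatmap layer to provide good interactivity. The rendering is currently implemented with Apache Spark, but the parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of work or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .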

Stars: ✭ 13 (-23.53%)

Mutual labels: spark, hdfs

Spark Practice

Apache Spark (PySpark) Practice on Real Data

Stars: ✭ 200 (+1076.47%)

Mutual labels: jupyter-notebook, spark

Spark Jupyter Aws

A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support

Stars: ✭ 259 (+1423.53%)

Mutual labels: jupyter-notebook, spark

bigkube

Minikube for big data with Scala and Spark

Stars: ✭ 16 (-5.88%)

Mutual labels: spark, hdfs

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+2288.24%)

Mutual labels: spark, hdfs

Azure Cosmosdb Spark

Apache Spark Connector for Azure Cosmos DB

Stars: ✭ 165 (+870.59%)

Mutual labels: jupyter-notebook, spark

Bdp Dataplatform

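Data platform for big data ecosystem solutions: a big data solution built on big data, data platform, microservices, machine learning, an online store, automated operations, DevOps, a container deployment platform, and data platform collection, storage, computation, development, and application building.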

Stars: ✭ 456 (+2582.35%)

Mutual labels: spark, mapreduce

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+129594.12%)

Mutual labels: spark, mapreduce

Sparta

Real Time Analytics and Data Pipelines based on Spark Streaming

Stars: ✭ 513 (+2917.65%)

Mutual labels: spark, hdfs

God Of Bigdata

Focused on big data learning and interviews; start your path to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...

Stars: ✭ 6,008 (+35241.18%)

Mutual labels: spark, hdfs

Ibis

A pandas-like deferred expression system, with first-class SQL support

Stars: ✭ 1,630 (+9488.24%)

Mutual labels: hdfs, spark

Mobius

C# and F# language binding and extensions to Apache Spark

Stars: ✭ 929 (+5364.71%)

Mutual labels: spark, mapreduce

Elasticsearch Spark Recommender

Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch

Stars: ✭ 707 (+4058.82%)

Mutual labels: jupyter-notebook, spark

Agile data code 2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Stars: ✭ 413 (+2329.41%)

Mutual labels: jupyter-notebook, spark

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+33170.59%)

Mutual labels: jupyter-notebook, spark

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (+4282.35%)

Mutual labels: jupyter-notebook, spark
