A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (-18.43%)

Mutual labels: apache-spark

Sparkling Titanic

Training models with Apache Spark, PySpark for Titanic Kaggle competition

Stars: ✭ 12 (-94.47%)

Mutual labels: pyspark

Hnswlib

Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs

Stars: ✭ 108 (-50.23%)

Mutual labels: pyspark

Pyspark Setup Demo

Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks

Stars: ✭ 24 (-88.94%)

Mutual labels: pyspark

Cluster Pack

A library on top of either pex or conda-pack to make your Python code easily available on a cluster

Stars: ✭ 23 (-89.4%)

Mutual labels: pyspark

Splash

Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange

Stars: ✭ 105 (-51.61%)

Mutual labels: apache-spark

Sparklyr

R interface for Apache Spark

Stars: ✭ 775 (+257.14%)

Mutual labels: apache-spark

spark-structured-streaming-examples

Spark Structured Streaming examples using version 3.0.0

Stars: ✭ 23 (-89.4%)

Mutual labels: apache-spark

Scriptis

Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.

Stars: ✭ 696 (+220.74%)

Mutual labels: pyspark

Pulsar Spark

When Apache Pulsar meets Apache Spark

Stars: ✭ 55 (-74.65%)

Mutual labels: apache-spark

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-88.48%)

Mutual labels: pyspark

Dist Keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

Stars: ✭ 613 (+182.49%)

Mutual labels: apache-spark

Cc Pyspark

Process Common Crawl data with Python and Spark

Stars: ✭ 147 (-32.26%)

Mutual labels: pyspark

Streaming Readings

Papers and reading materials related to streaming systems

Stars: ✭ 554 (+155.3%)

Mutual labels: apache-spark

Spark Py Notebooks

Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+516.59%)

Mutual labels: pyspark

Sparkle

Haskell on Apache Spark.

Stars: ✭ 419 (+93.09%)

Mutual labels: apache-spark

Spark Iforest

Isolation Forest on Spark

Stars: ✭ 166 (-23.5%)

Mutual labels: pyspark

Spark Syntax

This is a repo documenting the best practices in PySpark.

Stars: ✭ 412 (+89.86%)

Mutual labels: pyspark

Bitcoin Value Predictor

[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin

Stars: ✭ 91 (-58.06%)

Mutual labels: pyspark

Awesome Kafka

A list about Apache Kafka

Stars: ✭ 397 (+82.95%)

Mutual labels: apache-spark

Parquetviewer

Simple Windows desktop application for viewing & querying Apache Parquet files

Stars: ✭ 145 (-33.18%)

Mutual labels: apache-spark

Sparkmeasure

This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.

Stars: ✭ 368 (+69.59%)

Mutual labels: apache-spark

Cuesheet

A framework for writing Spark 2.x applications in a pretty way

Stars: ✭ 86 (-60.37%)

Mutual labels: apache-spark

data-algorithms-with-spark

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Stars: ✭ 34 (-84.33%)

Mutual labels: pyspark

Learningsparkv2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

Stars: ✭ 307 (+41.47%)

Mutual labels: apache-spark

Mlflow

Open source platform for the machine learning lifecycle

Stars: ✭ 10,898 (+4922.12%)

Mutual labels: apache-spark

Analytics Zoo

Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray

Stars: ✭ 2,448 (+1028.11%)

Mutual labels: apache-spark

Spark Atlas Connector

A Spark Atlas connector to track data lineage in Apache Atlas

Stars: ✭ 160 (-26.27%)

Mutual labels: apache-spark

Spark On Lambda

Apache Spark on AWS Lambda

Stars: ✭ 137 (-36.87%)

Mutual labels: apache-spark

spark-extension

A library that provides useful extensions to Apache Spark and PySpark.

Stars: ✭ 25 (-88.48%)

Mutual labels: pyspark

Hydrograph

A visual ETL development and debugging tool for big data

Stars: ✭ 144 (-33.64%)

Mutual labels: apache-spark

Spark Notebook

Interactive and Reactive Data Science using Scala and Spark.

Stars: ✭ 3,081 (+1319.82%)

Mutual labels: apache-spark

Pysparkgeoanalysis

🌐 Interactive Workshop on GeoAnalysis using PySpark

Stars: ✭ 63 (-70.97%)

Mutual labels: pyspark

Parquet Dotnet

🏐 Apache Parquet for modern .NET

Stars: ✭ 276 (+27.19%)

Mutual labels: apache-spark

Whylogs Java

Profile and monitor your ML data pipeline end-to-end

Stars: ✭ 164 (-24.42%)

Mutual labels: apache-spark

Spark Jupyter Aws

A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support

Stars: ✭ 259 (+19.35%)

Mutual labels: apache-spark

Awesome Pulsar

A curated list of Pulsar tools, integrations and resources.

Stars: ✭ 57 (-73.73%)

Mutual labels: apache-spark

HAL-9000

Automatically set up a productive development environment with Ansible on macOS

Stars: ✭ 72 (-66.82%)

Mutual labels: apache-spark

Azure Event Hubs Spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Stars: ✭ 140 (-35.48%)

Mutual labels: apache-spark

Sparkit Learn

PySpark + Scikit-learn = Sparkit-learn

Stars: ✭ 1,073 (+394.47%)

Mutual labels: apache-spark

spark-streaming-visualize

Simple demonstration of how to build a complex real time machine learning visualization tool.

Stars: ✭ 16 (-92.63%)

Mutual labels: apache-spark

Spark Nkp

Natural Korean Processor for Apache Spark

Stars: ✭ 50 (-76.96%)

Mutual labels: apache-spark

Spark Tpc Ds Performance Test

Use the TPC-DS benchmark to test Spark SQL performance

Stars: ✭ 133 (-38.71%)

Mutual labels: apache-spark

incubator-linkis

Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,459 (+1033.18%)

Mutual labels: pyspark

Spark Sklearn

(Deprecated) Scikit-learn integration package for Apache Spark

Stars: ✭ 1,055 (+386.18%)

Mutual labels: apache-spark

leaflet heatmap

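A simple visualization of phone-call data for Huzhou. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap rendering step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is also used to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to achieve good interactivity. Rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I have not designed the algorithm well, because the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .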

Stars: ✭ 13 (-94.01%)

Mutual labels: apache-spark

kafka-compose

🎼 Docker compose files for various kafka stacks

Stars: ✭ 32 (-85.25%)

Mutual labels: pyspark

Apache Spark Internals

The Internals of Apache Spark

Stars: ✭ 1,045 (+381.57%)

Mutual labels: apache-spark

Spark-and-Kafka IoT-Data-Processing-and-Analytics

Final Project for IoT: Big Data Processing and Analytics class. Analyzing U.S. nationwide temperature from IoT sensors in real time

Stars: ✭ 42 (-80.65%)

Mutual labels: pyspark

spark-gradle-template

Apache Spark in your IDE with gradle

Stars: ✭ 39 (-82.03%)

Mutual labels: apache-spark

Linkis

Stars: ✭ 2,323 (+970.51%)

Mutual labels: pyspark

Repo 2019

BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, TensorFlow, Mathematics

Stars: ✭ 133 (-38.71%)

Mutual labels: pyspark

Spark As Service Using Embedded Server

This application comes as Spark2.1-as-Service-Provider using an embedded, Reactive-Streams-based, fully asynchronous HTTP server

Stars: ✭ 46 (-78.8%)

Mutual labels: apache-spark

connected-component

Map Reduce Implementation of Connected Component on Apache Spark

Stars: ✭ 68 (-68.66%)

Mutual labels: apache-spark

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (-76.96%)

Mutual labels: pyspark

61-120 of 200 similar projects
