All Projects → Awesome Spark → Similar Projects or Alternatives

200 Open source projects that are alternatives of or similar to Awesome Spark

Spark-for-data-engineers
Apache Spark for data engineers
Stars: ✭ 22 (-97.93%)
Mutual labels:  apache-spark, pyspark
Sparkora
Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟
Stars: ✭ 51 (-95.19%)
Mutual labels:  apache-spark, pyspark
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+173.23%)
Mutual labels:  pyspark, apache-spark
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-85.86%)
Mutual labels:  apache-spark, pyspark
spark3D
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Stars: ✭ 23 (-97.83%)
Mutual labels:  apache-spark, pyspark
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-89.54%)
Mutual labels:  apache-spark, pyspark
Spark Gotchas
Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks
Stars: ✭ 308 (-70.97%)
Mutual labels:  apache-spark, pyspark
Quinn
pyspark methods to enhance developer productivity 📣 👯 🎉
Stars: ✭ 217 (-79.55%)
Mutual labels:  apache-spark, pyspark
learn-by-examples
Real-world Spark pipelines examples
Stars: ✭ 84 (-92.08%)
Mutual labels:  apache-spark, pyspark
Azure Cosmosdb Spark
Apache Spark Connector for Azure Cosmos DB
Stars: ✭ 165 (-84.45%)
Mutual labels:  apache-spark, pyspark
mmtf-workshop-2018
Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (-95.29%)
Mutual labels:  apache-spark, pyspark
jupyterlab-sparkmonitor
JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
Stars: ✭ 78 (-92.65%)
Mutual labels:  apache-spark, pyspark
spark-twitter-sentiment-analysis
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Stars: ✭ 55 (-94.82%)
Mutual labels:  apache-spark, pyspark
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-96.32%)
Mutual labels:  apache-spark, pyspark
pyspark-asyncactions
Asynchronous actions for PySpark
Stars: ✭ 30 (-97.17%)
Mutual labels:  apache-spark, pyspark
Pyspark Boilerplate
A boilerplate for writing PySpark Jobs
Stars: ✭ 318 (-70.03%)
Mutual labels:  apache-spark, pyspark
Live log analyzer spark
Spark Application for analysis of Apache Access logs and detect anamolies! Along with Medium Article.
Stars: ✭ 14 (-98.68%)
Mutual labels:  apache-spark, pyspark
Pyspark Stubs
Apache (Py)Spark type annotations (stub files).
Stars: ✭ 98 (-90.76%)
Mutual labels:  apache-spark, pyspark
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-97.36%)
Mutual labels:  apache-spark, pyspark
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+216.21%)
Mutual labels:  apache-spark, pyspark
pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (-89.16%)
Mutual labels:  apache-spark, pyspark
Spark Syntax
This is a repo documenting the best practices in PySpark.
Stars: ✭ 412 (-61.17%)
Mutual labels:  pyspark
Sparkling Titanic
Training models with Apache Spark, PySpark for Titanic Kaggle competition
Stars: ✭ 12 (-98.87%)
Mutual labels:  pyspark
Awesome Kafka
A list about Apache Kafka
Stars: ✭ 397 (-62.58%)
Mutual labels:  apache-spark
Sparkmeasure
This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Stars: ✭ 368 (-65.32%)
Mutual labels:  apache-spark
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (-7.07%)
Mutual labels:  pyspark
Pyspark Setup Demo
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-97.74%)
Mutual labels:  pyspark
Wirbelsturm
Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (-68.71%)
Mutual labels:  apache-spark
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-61.07%)
Mutual labels:  apache-spark
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (-61.73%)
Mutual labels:  pyspark
Dblink
Distributed Bayesian Entity Resolution in Apache Spark
Stars: ✭ 38 (-96.42%)
Mutual labels:  apache-spark
Spark Structured Streaming Book
The Internals of Spark Structured Streaming
Stars: ✭ 371 (-65.03%)
Mutual labels:  apache-spark
Mobius
C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (-12.44%)
Mutual labels:  apache-spark
Spark As Service Using Embedded Server
This application comes as Spark2.1-as-Service-Provider using an embedded, Reactive-Streams-based, fully asynchronous HTTP server
Stars: ✭ 46 (-95.66%)
Mutual labels:  apache-spark
Spark Tdd Example
A simple Spark TDD example
Stars: ✭ 23 (-97.83%)
Mutual labels:  pyspark
Coolplayspark
酷玩 Spark: Spark 源代码解析、Spark 类库等
Stars: ✭ 3,318 (+212.72%)
Mutual labels:  apache-spark
Learningsparkv2
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Stars: ✭ 307 (-71.07%)
Mutual labels:  apache-spark
Real Time Stream Processing Engine
This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.
Stars: ✭ 37 (-96.51%)
Mutual labels:  apache-spark
Cluster Pack
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Stars: ✭ 23 (-97.83%)
Mutual labels:  pyspark
Mist
Serverless proxy for Spark cluster
Stars: ✭ 309 (-70.88%)
Mutual labels:  apache-spark
Morpheus
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Stars: ✭ 303 (-71.44%)
Mutual labels:  apache-spark
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (-25.26%)
Mutual labels:  apache-spark
Spark Notebook
Interactive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (+190.39%)
Mutual labels:  apache-spark
Sparkflow
Easy to use library to bring Tensorflow on Apache Spark
Stars: ✭ 282 (-73.42%)
Mutual labels:  apache-spark
Spark Sklearn
(Deprecated) Scikit-learn integration package for Apache Spark
Stars: ✭ 1,055 (-0.57%)
Mutual labels:  apache-spark
Spark Scala Maven Example
Example Maven configuration for a Spark, Scala project
Stars: ✭ 45 (-95.76%)
Mutual labels:  apache-spark
Cloud Based Sql Engine Using Spark
Cloud-based SQL engine using SPARK where data is accessible as JDBC/ODBC data source via Spark ThriftServer.
Stars: ✭ 30 (-97.17%)
Mutual labels:  apache-spark
Sparklyr
R interface for Apache Spark
Stars: ✭ 775 (-26.96%)
Mutual labels:  apache-spark
Parquet Dotnet
🏐 Apache Parquet for modern .NET
Stars: ✭ 276 (-73.99%)
Mutual labels:  apache-spark
Tdigest
t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed enviroments like PySpark
Stars: ✭ 274 (-74.18%)
Mutual labels:  pyspark
Kafka Storm Starter
Code examples that show to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (-31.39%)
Mutual labels:  apache-spark
Spark Jupyter Aws
A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
Stars: ✭ 259 (-75.59%)
Mutual labels:  apache-spark
spark-structured-streaming-examples
Spark structured streaming examples with using of version 3.0.0
Stars: ✭ 23 (-97.83%)
Mutual labels:  apache-spark
Spark Flamegraph
Easy CPU Profiling for Apache Spark applications
Stars: ✭ 30 (-97.17%)
Mutual labels:  apache-spark
Scriptis
Scriptis is for interactive data analysis with script development(SQL, Pyspark, HiveQL), task submission(Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (-34.4%)
Mutual labels:  pyspark
HAL-9000
Automatically setup a productive development environment with Ansible on macOS
Stars: ✭ 72 (-93.21%)
Mutual labels:  apache-spark
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (-40.34%)
Mutual labels:  pyspark
basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-97.64%)
Mutual labels:  pyspark
data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Stars: ✭ 34 (-96.8%)
Mutual labels:  pyspark
Spark Tda
SparkTDA is a package for Apache Spark providing Topological Data Analysis Functionalities.
Stars: ✭ 45 (-95.76%)
Mutual labels:  apache-spark
1-60 of 200 similar projects