
200 open source projects that are alternatives to or similar to Pyspark Stubs

Awesome Spark
A curated list of awesome Apache Spark packages and resources.
Stars: ✭ 1,061 (+982.65%)
Mutual labels:  apache-spark, pyspark
spark3D
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Stars: ✭ 23 (-76.53%)
Mutual labels:  apache-spark, pyspark
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-71.43%)
Mutual labels:  apache-spark, pyspark
Pyspark Boilerplate
A boilerplate for writing PySpark Jobs
Stars: ✭ 318 (+224.49%)
Mutual labels:  apache-spark, pyspark
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, plus SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Stars: ✭ 39 (-60.2%)
Mutual labels:  apache-spark, pyspark
pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (+17.35%)
Mutual labels:  apache-spark, pyspark
Spark-for-data-engineers
Apache Spark for data engineers
Stars: ✭ 22 (-77.55%)
Mutual labels:  apache-spark, pyspark
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+53.06%)
Mutual labels:  apache-spark, pyspark
Azure Cosmosdb Spark
Apache Spark Connector for Azure Cosmos DB
Stars: ✭ 165 (+68.37%)
Mutual labels:  apache-spark, pyspark
spark-twitter-sentiment-analysis
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Stars: ✭ 55 (-43.88%)
Mutual labels:  apache-spark, pyspark
learn-by-examples
Real-world Spark pipelines examples
Stars: ✭ 84 (-14.29%)
Mutual labels:  apache-spark, pyspark
pyspark-asyncactions
Asynchronous actions for PySpark
Stars: ✭ 30 (-69.39%)
Mutual labels:  apache-spark, pyspark
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+2858.16%)
Mutual labels:  pyspark, apache-spark
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+13.27%)
Mutual labels:  apache-spark, pyspark
jupyterlab-sparkmonitor
JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
Stars: ✭ 78 (-20.41%)
Mutual labels:  apache-spark, pyspark
Spark Gotchas
Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks
Stars: ✭ 308 (+214.29%)
Mutual labels:  apache-spark, pyspark
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+3323.47%)
Mutual labels:  apache-spark, pyspark
Quinn
pyspark methods to enhance developer productivity 📣 👯 🎉
Stars: ✭ 217 (+121.43%)
Mutual labels:  apache-spark, pyspark
Sparkora
Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟
Stars: ✭ 51 (-47.96%)
Mutual labels:  apache-spark, pyspark
mmtf-workshop-2018
Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (-48.98%)
Mutual labels:  apache-spark, pyspark
Live log analyzer spark
Spark application for analyzing Apache access logs and detecting anomalies. Accompanied by a Medium article.
Stars: ✭ 14 (-85.71%)
Mutual labels:  apache-spark, pyspark
Mobius
C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+847.96%)
Mutual labels:  apache-spark
Spark Sklearn
(Deprecated) Scikit-learn integration package for Apache Spark
Stars: ✭ 1,055 (+976.53%)
Mutual labels:  apache-spark
Spark Tdd Example
A simple Spark TDD example
Stars: ✭ 23 (-76.53%)
Mutual labels:  pyspark
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+709.18%)
Mutual labels:  apache-spark
W2v
Word2Vec models with Twitter data using Spark.
Stars: ✭ 64 (-34.69%)
Mutual labels:  pyspark
Spark As Service Using Embedded Server
This application comes as Spark2.1-as-Service-Provider using an embedded, Reactive-Streams-based, fully asynchronous HTTP server
Stars: ✭ 46 (-53.06%)
Mutual labels:  apache-spark
Kafka Storm Starter
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (+642.86%)
Mutual labels:  apache-spark
Sparkling Titanic
Training models with Apache Spark, PySpark for Titanic Kaggle competition
Stars: ✭ 12 (-87.76%)
Mutual labels:  pyspark
Spark Nkp
Natural Korean Processor for Apache Spark
Stars: ✭ 50 (-48.98%)
Mutual labels:  apache-spark
Pyspark Setup Demo
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-75.51%)
Mutual labels:  pyspark
Mlflow
Open source platform for the machine learning lifecycle
Stars: ✭ 10,898 (+11020.41%)
Mutual labels:  apache-spark
Cluster Pack
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Stars: ✭ 23 (-76.53%)
Mutual labels:  pyspark
Apache Spark Internals
The Internals of Apache Spark
Stars: ✭ 1,045 (+966.33%)
Mutual labels:  apache-spark
Sparklyr
R interface for Apache Spark
Stars: ✭ 775 (+690.82%)
Mutual labels:  apache-spark
Bitcoin Value Predictor
[NOT MAINTAINED] Predicting the Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (-7.14%)
Mutual labels:  pyspark
Scriptis
Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, and resource management, and intelligent diagnosis.
Stars: ✭ 696 (+610.2%)
Mutual labels:  pyspark
Spark Scala Maven Example
Example Maven configuration for a Spark, Scala project
Stars: ✭ 45 (-54.08%)
Mutual labels:  apache-spark
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+545.92%)
Mutual labels:  pyspark
Dist Keras
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Stars: ✭ 613 (+525.51%)
Mutual labels:  apache-spark
Pysparkgeoanalysis
🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (-35.71%)
Mutual labels:  pyspark
Spark Tda
SparkTDA is a package for Apache Spark providing Topological Data Analysis Functionalities.
Stars: ✭ 45 (-54.08%)
Mutual labels:  apache-spark
Flintrock
A command-line tool for launching Apache Spark clusters.
Stars: ✭ 568 (+479.59%)
Mutual labels:  apache-spark
Streaming Readings
Papers and reading materials related to streaming systems
Stars: ✭ 554 (+465.31%)
Mutual labels:  apache-spark
Spark Examples
Spark examples
Stars: ✭ 41 (-58.16%)
Mutual labels:  apache-spark
Openscoring
REST web service for the true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models
Stars: ✭ 536 (+446.94%)
Mutual labels:  apache-spark
Sparkle
Haskell on Apache Spark.
Stars: ✭ 419 (+327.55%)
Mutual labels:  apache-spark
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1265.31%)
Mutual labels:  pyspark
Spark python ml examples
Spark 2.0 Python Machine Learning examples
Stars: ✭ 87 (-11.22%)
Mutual labels:  pyspark
Petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Stars: ✭ 1,108 (+1030.61%)
Mutual labels:  pyspark
Dblink
Distributed Bayesian Entity Resolution in Apache Spark
Stars: ✭ 38 (-61.22%)
Mutual labels:  apache-spark
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+321.43%)
Mutual labels:  apache-spark
Spark Syntax
This is a repo documenting the best practices in PySpark.
Stars: ✭ 412 (+320.41%)
Mutual labels:  pyspark
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+906.12%)
Mutual labels:  pyspark
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+314.29%)
Mutual labels:  pyspark
Awesome Kafka
A list about Apache Kafka
Stars: ✭ 397 (+305.1%)
Mutual labels:  apache-spark
Awesome Pulsar
A curated list of Pulsar tools, integrations and resources.
Stars: ✭ 57 (-41.84%)
Mutual labels:  apache-spark
Real Time Stream Processing Engine
This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.
Stars: ✭ 37 (-62.24%)
Mutual labels:  apache-spark
Spark Structured Streaming Book
The Internals of Spark Structured Streaming
Stars: ✭ 371 (+278.57%)
Mutual labels:  apache-spark
Sparkmeasure
This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Stars: ✭ 368 (+275.51%)
Mutual labels:  apache-spark