200 open source projects that are alternatives to or similar to Pyspark Stubs

Sparkling Titanic
Training models with Apache Spark, PySpark for Titanic Kaggle competition
Stars: ✭ 12 (-87.76%)
Mutual labels:  pyspark
HAL-9000
Automatically setup a productive development environment with Ansible on macOS
Stars: ✭ 72 (-26.53%)
Mutual labels:  apache-spark
Spark Nkp
Natural Korean Processor for Apache Spark
Stars: ✭ 50 (-48.98%)
Mutual labels:  apache-spark
Pyspark Setup Demo
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-75.51%)
Mutual labels:  pyspark
Mlflow
Open source platform for the machine learning lifecycle
Stars: ✭ 10,898 (+11020.41%)
Mutual labels:  apache-spark
spark-utils
Basic framework utilities to quickly start writing production ready Apache Spark applications
Stars: ✭ 25 (-74.49%)
Mutual labels:  apache-spark
Cluster Pack
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Stars: ✭ 23 (-76.53%)
Mutual labels:  pyspark
incubator-linkis
Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+2409.18%)
Mutual labels:  pyspark
Apache Spark Internals
The Internals of Apache Spark
Stars: ✭ 1,045 (+966.33%)
Mutual labels:  apache-spark
kafka-compose
🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-67.35%)
Mutual labels:  pyspark
Sparklyr
R interface for Apache Spark
Stars: ✭ 775 (+690.82%)
Mutual labels:  apache-spark
spark-gradle-template
Apache Spark in your IDE with gradle
Stars: ✭ 39 (-60.2%)
Mutual labels:  apache-spark
Bitcoin Value Predictor
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (-7.14%)
Mutual labels:  pyspark
data processing course
Some class materials for a data processing course using PySpark
Stars: ✭ 50 (-48.98%)
Mutual labels:  pyspark
Scriptis
Scriptis is for interactive data analysis with script development (SQL, Pyspark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (+610.2%)
Mutual labels:  pyspark
Azure-Databricks-NYC-Taxi-Workshop
An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset
Stars: ✭ 71 (-27.55%)
Mutual labels:  pyspark
Spark Scala Maven Example
Example Maven configuration for a Spark, Scala project
Stars: ✭ 45 (-54.08%)
Mutual labels:  apache-spark
Wirbelsturm
Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (+238.78%)
Mutual labels:  apache-spark
SparkTwitterAnalysis
An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) for building the project.
Stars: ✭ 29 (-70.41%)
Mutual labels:  apache-spark
Dist Keras
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Stars: ✭ 613 (+525.51%)
Mutual labels:  apache-spark
dlsa
Distributed least squares approximation (dlsa) implemented with Apache Spark
Stars: ✭ 25 (-74.49%)
Mutual labels:  pyspark
Pysparkgeoanalysis
🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (-35.71%)
Mutual labels:  pyspark
lineage
Generate beautiful documentation for your data pipelines in markdown format
Stars: ✭ 16 (-83.67%)
Mutual labels:  pyspark
Streaming Readings
Papers and reading materials related to streaming systems
Stars: ✭ 554 (+465.31%)
Mutual labels:  apache-spark
kuwala
Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Stars: ✭ 474 (+383.67%)
Mutual labels:  pyspark
Spark Examples
Spark examples
Stars: ✭ 41 (-58.16%)
Mutual labels:  apache-spark
spark-operator
Operator for managing the Spark clusters on Kubernetes and OpenShift.
Stars: ✭ 129 (+31.63%)
Mutual labels:  apache-spark
Sparkle
Haskell on Apache Spark.
Stars: ✭ 419 (+327.55%)
Mutual labels:  apache-spark
sparklanes
A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-82.65%)
Mutual labels:  pyspark
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1265.31%)
Mutual labels:  pyspark
SparkProgrammingInScala
Apache Spark Course Material
Stars: ✭ 57 (-41.84%)
Mutual labels:  apache-spark
Spark Syntax
This is a repo documenting the best practices in PySpark.
Stars: ✭ 412 (+320.41%)
Mutual labels:  pyspark
pulsar-adapters
Apache Pulsar Adapters
Stars: ✭ 18 (-81.63%)
Mutual labels:  apache-spark
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+906.12%)
Mutual labels:  pyspark
Awesome Kafka
A list about Apache Kafka
Stars: ✭ 397 (+305.1%)
Mutual labels:  apache-spark
check-engine
Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-70.41%)
Mutual labels:  pyspark
Awesome Pulsar
A curated list of Pulsar tools, integrations and resources.
Stars: ✭ 57 (-41.84%)
Mutual labels:  apache-spark
Sparkmeasure
This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Stars: ✭ 368 (+275.51%)
Mutual labels:  apache-spark
hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
Stars: ✭ 31 (-68.37%)
Mutual labels:  apache-spark
parquet-dotnet
🐬 Apache Parquet for modern .Net
Stars: ✭ 199 (+103.06%)
Mutual labels:  apache-spark
Cloud Based Sql Engine Using Spark
Cloud-based SQL engine using SPARK where data is accessible as JDBC/ODBC data source via Spark ThriftServer.
Stars: ✭ 30 (-69.39%)
Mutual labels:  apache-spark
DaFlow
Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-75.51%)
Mutual labels:  apache-spark
Pulsar Spark
When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-43.88%)
Mutual labels:  apache-spark
Spark Flamegraph
Easy CPU Profiling for Apache Spark applications
Stars: ✭ 30 (-69.39%)
Mutual labels:  apache-spark
Coolplayspark
Coolplay Spark: Spark source code analysis, Spark libraries, and more
Stars: ✭ 3,318 (+3285.71%)
Mutual labels:  apache-spark
phrase-at-scale
Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English
Stars: ✭ 115 (+17.35%)
Mutual labels:  pyspark
Cuesheet
A framework for writing Spark 2.x applications in a pretty way
Stars: ✭ 86 (-12.24%)
Mutual labels:  apache-spark
pyspark-k8s-boilerplate
Boilerplate for PySpark on Cloud Kubernetes
Stars: ✭ 24 (-75.51%)
Mutual labels:  pyspark
Learningsparkv2
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Stars: ✭ 307 (+213.27%)
Mutual labels:  apache-spark
cloud-integration
Spark cloud integration: tests, cloud committers and more
Stars: ✭ 20 (-79.59%)
Mutual labels:  apache-spark
sparkucx
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Stars: ✭ 32 (-67.35%)
Mutual labels:  apache-spark
Datahacksummit 2017
Apache Zeppelin notebooks for Recommendation Engines using Keras and Machine Learning on Apache Spark
Stars: ✭ 30 (-69.39%)
Mutual labels:  apache-spark
Mist
Serverless proxy for Spark cluster
Stars: ✭ 309 (+215.31%)
Mutual labels:  apache-spark
databricks-notebooks
Collection of Databricks and Jupyter Notebooks
Stars: ✭ 19 (-80.61%)
Mutual labels:  pyspark
python mozetl
ETL jobs for Firefox Telemetry
Stars: ✭ 25 (-74.49%)
Mutual labels:  pyspark
spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (-31.63%)
Mutual labels:  apache-spark
ceja
PySpark phonetic and string matching algorithms
Stars: ✭ 24 (-75.51%)
Mutual labels:  pyspark
Sparkit Learn
PySpark + Scikit-learn = Sparkit-learn
Stars: ✭ 1,073 (+994.9%)
Mutual labels:  apache-spark
Sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+873.47%)
Mutual labels:  pyspark
Morpheus
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Stars: ✭ 303 (+209.18%)
Mutual labels:  apache-spark