9362 open source projects that are alternatives to or similar to Spark With Python

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+14598.67%)

Mutual labels: spark, big-data, hadoop

Azure Event Hubs Spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Stars: ✭ 140 (-6.67%)

Mutual labels: spark, apache-spark, apache

Databases workshop

RCS Intro to Databases workshop materials

Stars: ✭ 25 (-83.33%)

Mutual labels: jupyter-notebook, sql, database

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-62%)

Mutual labels: spark, big-data, hadoop

Docker Superset

Repository for Docker Image of Apache-Superset. [Docker Image: https://hub.docker.com/r/abhioncbr/docker-superset]

Stars: ✭ 86 (-42.67%)

Mutual labels: sql, analytics, apache

Pyspark Setup Demo

Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks

Stars: ✭ 24 (-84%)

Mutual labels: jupyter-notebook, big-data, pyspark

Hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Stars: ✭ 246 (+64%)

Mutual labels: spark, analytics, big-data

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (+44%)

Mutual labels: spark, big-data, pyspark

Datafusion

DataFusion has now been donated to the Apache Arrow project

Stars: ✭ 611 (+307.33%)

Mutual labels: dataframe, sql, spark

Duckdb

DuckDB is an in-process SQL OLAP Database Management System

Stars: ✭ 4,014 (+2576%)

Mutual labels: sql, analytics, database

Calcite

Apache Calcite

Stars: ✭ 2,816 (+1777.33%)

Mutual labels: sql, big-data, hadoop

Sciblog support

Support content for my blog

Stars: ✭ 694 (+362.67%)

Mutual labels: jupyter-notebook, analytics, big-data

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (+994.67%)

Mutual labels: spark, big-data, hadoop

Spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Stars: ✭ 1,721 (+1047.33%)

Mutual labels: spark, analytics, apache-spark

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+1929.33%)

Mutual labels: dataframe, spark, big-data

Sparta

Real Time Analytics and Data Pipelines based on Spark Streaming

Stars: ✭ 513 (+242%)

Mutual labels: spark, analytics, hdfs

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (+396.67%)

Mutual labels: jupyter-notebook, spark, big-data

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (-84.67%)

Mutual labels: jupyter-notebook, spark, pyspark

Calcite Avatica

Mirror of Apache Calcite - Avatica

Stars: ✭ 130 (-13.33%)

Mutual labels: sql, big-data, hadoop

SynapseML

Simple and Distributed Machine Learning

Stars: ✭ 3,355 (+2136.67%)

Mutual labels: big-data, apache-spark, pyspark

sparkucx

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer

Stars: ✭ 32 (-78.67%)

Mutual labels: big-data, apache-spark, hadoop

fastdata-cluster

Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)

Stars: ✭ 20 (-86.67%)

Mutual labels: spark, hadoop, hdfs

Spark On Lambda

Apache Spark on AWS Lambda

Stars: ✭ 137 (-8.67%)

Mutual labels: spark, big-data, apache-spark

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-83.33%)

Mutual labels: spark, hadoop, pyspark

hadoop-data-ingestion-tool

OLAP and ETL of Big Data

Stars: ✭ 17 (-88.67%)

Mutual labels: big-data, hadoop, apache

Yandex Big Data Engineering

Stars: ✭ 17 (-88.67%)

Mutual labels: jupyter-notebook, spark, hdfs

Interview Questions Collection

Interview questions organized by knowledge area, including C++, Java, Hadoop, machine learning, and more

Stars: ✭ 21 (-86%)

Mutual labels: spark, hadoop, database

Data Algorithms Book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Stars: ✭ 949 (+532.67%)

Mutual labels: spark, hadoop, distributed-computing

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+557.33%)

Mutual labels: jupyter-notebook, spark, pyspark

Bigdl

Building Large-Scale AI Applications for Distributed Big Data

Stars: ✭ 3,813 (+2442%)

Mutual labels: spark, big-data, hadoop

Sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Stars: ✭ 954 (+536%)

Mutual labels: jupyter-notebook, spark, pyspark

Moosefs

MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

Stars: ✭ 1,025 (+583.33%)

Mutual labels: big-data, hadoop, distributed-computing

Design Of Experiment Python

Design-of-experiment (DOE) generator for science, engineering, and statistics

Stars: ✭ 143 (-4.67%)

Mutual labels: dataframe, jupyter-notebook, analytics

Big data architect skills

Skills a big data architect should master

Stars: ✭ 400 (+166.67%)

Mutual labels: spark, analytics, hadoop

Eventql

Distributed "massively parallel" SQL query engine

Stars: ✭ 1,121 (+647.33%)

Mutual labels: sql, analytics, database

God Of Bigdata

Focused on big data learning and interview preparation; start your path to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...

Stars: ✭ 6,008 (+3905.33%)

Mutual labels: spark, hadoop, hdfs

Kyuubi

Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark

Stars: ✭ 363 (+142%)

Mutual labels: sql, spark, analytics

Data Science Best Resources

Carefully curated resource links for data science in one place

Stars: ✭ 1,104 (+636%)

Mutual labels: sql, analytics, database

Pysparkgeoanalysis

🌐 Interactive Workshop on GeoAnalysis using PySpark

Stars: ✭ 63 (-58%)

Mutual labels: jupyter-notebook, spark, pyspark

Scriptis

Scriptis is for interactive data analysis with script development (SQL, Pyspark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.

Stars: ✭ 696 (+364%)

Mutual labels: sql, spark, pyspark

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+3575.33%)

Mutual labels: spark, big-data, database

Beeva Best Practices

Best Practices and Style Guides in BEEVA

Stars: ✭ 335 (+123.33%)

Mutual labels: jupyter-notebook, analytics, big-data

Mobius

C# and F# language binding and extensions to Apache Spark