200 open source projects that are alternatives to or similar to Awesome Spark

Agile data code 2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Stars: ✭ 413 (-61.07%)

Mutual labels: apache-spark

ai-deployment

Focused on bringing AI models online and on model deployment

Stars: ✭ 149 (-85.96%)

Mutual labels: pyspark

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (-61.73%)

Mutual labels: pyspark

big data

A collection of tutorials on Hadoop, MapReduce, Spark, Docker

Stars: ✭ 34 (-96.8%)

Mutual labels: pyspark

Dblink

Distributed Bayesian Entity Resolution in Apache Spark

Stars: ✭ 38 (-96.42%)

Mutual labels: apache-spark

machine-learning-course

Machine Learning Course @ Santa Clara University

Stars: ✭ 17 (-98.4%)

Mutual labels: pyspark

Spark Structured Streaming Book

The Internals of Spark Structured Streaming

Stars: ✭ 371 (-65.03%)

Mutual labels: apache-spark

DataEngineering

This repo contains commands that data engineers use in day to day work.

Stars: ✭ 47 (-95.57%)

Mutual labels: pyspark

Mobius

C# and F# language binding and extensions to Apache Spark

Stars: ✭ 929 (-12.44%)

Mutual labels: apache-spark

spark-transformers

Spark-Transformers: Library for exporting Apache Spark MLLIB models to use them in any Java application with no other dependencies.

Stars: ✭ 39 (-96.32%)

Mutual labels: apache-spark

Wirbelsturm

Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.

Stars: ✭ 332 (-68.71%)

Mutual labels: apache-spark

Spark As Service Using Embedded Server

This application comes as Spark2.1-as-Service-Provider using an embedded, Reactive-Streams-based, fully asynchronous HTTP server

Stars: ✭ 46 (-95.66%)

Mutual labels: apache-spark

pyspark-ML-in-Colab

Pyspark in Google Colab: A simple machine learning (Linear Regression) model

Stars: ✭ 32 (-96.98%)

Mutual labels: pyspark

net.jgp.books.spark.ch01

Spark in Action, 2nd edition - chapter 1 - Introduction

Stars: ✭ 72 (-93.21%)

Mutual labels: apache-spark

Coolplayspark

CoolplaySpark: Spark source code analysis, Spark libraries, and more

Stars: ✭ 3,318 (+212.72%)

Mutual labels: apache-spark

spark-sql-internals

The Internals of Spark SQL

Stars: ✭ 331 (-68.8%)

Mutual labels: apache-spark

Spark Tdd Example

A simple Spark TDD example

Stars: ✭ 23 (-97.83%)

Mutual labels: pyspark

PysparkCheatsheet

PySpark Cheatsheet

Stars: ✭ 25 (-97.64%)

Mutual labels: apache-spark

Mist

Serverless proxy for Spark cluster

Stars: ✭ 309 (-70.88%)

Mutual labels: apache-spark

net.jgp.books.spark.ch07

Spark in Action, 2nd edition - chapter 7 - Ingestion from files

Stars: ✭ 13 (-98.77%)

Mutual labels: apache-spark

pyspark-for-data-processing

Code for my presentation: Using PySpark to Process Boat Loads of Data

Stars: ✭ 20 (-98.11%)

Mutual labels: pyspark

spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/

Stars: ✭ 609 (-42.6%)

Mutual labels: apache-spark

Morpheus

Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.

Stars: ✭ 303 (-71.44%)

Mutual labels: apache-spark

oshinko-s2i

This is a place to put s2i images and utilities for spark application builders for openshift

Stars: ✭ 16 (-98.49%)

Mutual labels: pyspark

Goodreads etl pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Stars: ✭ 793 (-25.26%)

Mutual labels: apache-spark

SparkTwitterAnalysis

An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) for building the project.

Stars: ✭ 29 (-97.27%)

Mutual labels: apache-spark

Sparkflow

Easy to use library to bring Tensorflow on Apache Spark

Stars: ✭ 282 (-73.42%)

Mutual labels: apache-spark

Spark Sklearn

(Deprecated) Scikit-learn integration package for Apache Spark

Stars: ✭ 1,055 (-0.57%)

Mutual labels: apache-spark

Datahacksummit 2017

Apache Zeppelin notebooks for Recommendation Engines using Keras and Machine Learning on Apache Spark

Stars: ✭ 30 (-97.17%)

Mutual labels: apache-spark

Dist Keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

Stars: ✭ 613 (-42.22%)

Mutual labels: apache-spark

spark-extension

A library that provides useful extensions to Apache Spark and PySpark.

Stars: ✭ 25 (-97.64%)

Mutual labels: pyspark

gan deeplearning4j

Automatic feature engineering with Generative Adversarial Networks, using Deeplearning4j and Apache Spark.

Stars: ✭ 19 (-98.21%)

Mutual labels: apache-spark

phrase-at-scale

Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary size. Can be used with languages other than English.

Stars: ✭ 115 (-89.16%)

Mutual labels: pyspark

Tdigest

t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark

Stars: ✭ 274 (-74.18%)

Mutual labels: pyspark

cloud-integration

Spark cloud integration: tests, cloud committers and more

Stars: ✭ 20 (-98.11%)

Mutual labels: apache-spark

Kafka Storm Starter

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.

Stars: ✭ 728 (-31.39%)

Mutual labels: apache-spark

databricks-notebooks

Collection of Databricks and Jupyter Notebooks

Stars: ✭ 19 (-98.21%)

Mutual labels: pyspark

spark-structured-streaming-examples

Spark Structured Streaming examples using version 3.0.0

Stars: ✭ 23 (-97.83%)

Mutual labels: apache-spark

spark-records

Bulletproof Apache Spark jobs with fast root cause analysis of failures.

Stars: ✭ 67 (-93.69%)

Mutual labels: apache-spark

Spark Flamegraph

Easy CPU Profiling for Apache Spark applications

Stars: ✭ 30 (-97.17%)

Mutual labels: apache-spark

BigCLAM-ApacheSpark

Overlapping community detection in large-scale networks using the BigCLAM model, built on Apache Spark

Stars: ✭ 40 (-96.23%)

Mutual labels: apache-spark

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-97.64%)

Mutual labels: pyspark

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (-40.34%)

Mutual labels: pyspark

data-algorithms-with-spark

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Stars: ✭ 34 (-96.8%)

Mutual labels: pyspark

Spark Tda

SparkTDA is a package for Apache Spark providing Topological Data Analysis Functionalities.

Stars: ✭ 45 (-95.76%)

Mutual labels: apache-spark

anovos

Anovos - An Open Source Library for Scalable feature engineering Using Apache-Spark

Stars: ✭ 77 (-92.74%)

Mutual labels: pyspark

flask-spark-docker

Just a boilerplate for PySpark and Flask

Stars: ✭ 32 (-96.98%)

Mutual labels: pyspark

Flintrock

A command-line tool for launching Apache Spark clusters.

Stars: ✭ 568 (-46.47%)

Mutual labels: apache-spark

OSCI

Open Source Contributor Index

Stars: ✭ 107 (-89.92%)

Mutual labels: pyspark

pyspark-algorithms

PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2

Stars: ✭ 72 (-93.21%)

Mutual labels: pyspark

geospark

bring sf to spark in production

Stars: ✭ 53 (-95%)

Mutual labels: apache-spark

spark-streaming-visualize

Simple demonstration of how to build a complex real time machine learning visualization tool.

Stars: ✭ 16 (-98.49%)

Mutual labels: apache-spark

kafka-twitter-spark-streaming

Counting Tweets Per User in Real-Time

Stars: ✭ 38 (-96.42%)

Mutual labels: pyspark

SANSA-Stack

Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/

Stars: ✭ 130 (-87.75%)

Mutual labels: apache-spark

Sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Stars: ✭ 954 (-10.08%)

Mutual labels: pyspark

Streaming Readings

Papers and reading materials related to streaming systems

Stars: ✭ 554 (-47.79%)

Mutual labels: apache-spark

incubator-linkis

Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,459 (+131.76%)

Mutual labels: pyspark

osm-parquetizer

A converter for the OSM PBFs to Parquet files

Stars: ✭ 71 (-93.31%)

Mutual labels: apache-spark

sparklygraphs

Old repo for R interface for GraphFrames

Stars: ✭ 13 (-98.77%)

Mutual labels: apache-spark

Openscoring

REST web service for the true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models

Stars: ✭ 536 (-49.48%)

Mutual labels: apache-spark

61-120 of 200 similar projects
