200 open-source projects that are similar to or alternatives to Awesome Spark

Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-61.07%)
Mutual labels:  apache-spark
ai-deployment
Focuses on bringing AI models into production and on model deployment
Stars: ✭ 149 (-85.96%)
Mutual labels:  pyspark
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (-61.73%)
Mutual labels:  pyspark
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-96.8%)
Mutual labels:  pyspark
Dblink
Distributed Bayesian Entity Resolution in Apache Spark
Stars: ✭ 38 (-96.42%)
Mutual labels:  apache-spark
machine-learning-course
Machine Learning Course @ Santa Clara University
Stars: ✭ 17 (-98.4%)
Mutual labels:  pyspark
Spark Structured Streaming Book
The Internals of Spark Structured Streaming
Stars: ✭ 371 (-65.03%)
Mutual labels:  apache-spark
DataEngineering
This repo contains commands that data engineers use in day-to-day work.
Stars: ✭ 47 (-95.57%)
Mutual labels:  pyspark
Mobius
C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (-12.44%)
Mutual labels:  apache-spark
spark-transformers
Spark-Transformers: Library for exporting Apache Spark MLlib models for use in any Java application with no other dependencies.
Stars: ✭ 39 (-96.32%)
Mutual labels:  apache-spark
Wirbelsturm
Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (-68.71%)
Mutual labels:  apache-spark
Spark As Service Using Embedded Server
This application provides Spark 2.1 as a service, using an embedded, Reactive Streams-based, fully asynchronous HTTP server.
Stars: ✭ 46 (-95.66%)
Mutual labels:  apache-spark
pyspark-ML-in-Colab
Pyspark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (-96.98%)
Mutual labels:  pyspark
net.jgp.books.spark.ch01
Spark in Action, 2nd edition - chapter 1 - Introduction
Stars: ✭ 72 (-93.21%)
Mutual labels:  apache-spark
Coolplayspark
Coolplay Spark (酷玩 Spark): Spark source code analysis, Spark libraries, and more
Stars: ✭ 3,318 (+212.72%)
Mutual labels:  apache-spark
spark-sql-internals
The Internals of Spark SQL
Stars: ✭ 331 (-68.8%)
Mutual labels:  apache-spark
Spark Tdd Example
A simple Spark TDD example
Stars: ✭ 23 (-97.83%)
Mutual labels:  pyspark
PysparkCheatsheet
PySpark Cheatsheet
Stars: ✭ 25 (-97.64%)
Mutual labels:  apache-spark
Mist
Serverless proxy for Spark cluster
Stars: ✭ 309 (-70.88%)
Mutual labels:  apache-spark
net.jgp.books.spark.ch07
Spark in Action, 2nd edition - chapter 7 - Ingestion from files
Stars: ✭ 13 (-98.77%)
Mutual labels:  apache-spark
pyspark-for-data-processing
Code for my presentation: Using PySpark to Process Boat Loads of Data
Stars: ✭ 20 (-98.11%)
Mutual labels:  pyspark
spark
Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being archived, as all new development for the Kubernetes scheduler back-end is now on https://github.com/apache/spark/
Stars: ✭ 609 (-42.6%)
Mutual labels:  apache-spark
Morpheus
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Stars: ✭ 303 (-71.44%)
Mutual labels:  apache-spark
oshinko-s2i
A place for S2I images and utilities for Spark application builders on OpenShift
Stars: ✭ 16 (-98.49%)
Mutual labels:  pyspark
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (-25.26%)
Mutual labels:  apache-spark
SparkTwitterAnalysis
An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) to build the project.
Stars: ✭ 29 (-97.27%)
Mutual labels:  apache-spark
Sparkflow
Easy to use library to bring Tensorflow on Apache Spark
Stars: ✭ 282 (-73.42%)
Mutual labels:  apache-spark
Spark Sklearn
(Deprecated) Scikit-learn integration package for Apache Spark
Stars: ✭ 1,055 (-0.57%)
Mutual labels:  apache-spark
Datahacksummit 2017
Apache Zeppelin notebooks for Recommendation Engines using Keras and Machine Learning on Apache Spark
Stars: ✭ 30 (-97.17%)
Mutual labels:  apache-spark
Dist Keras
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Stars: ✭ 613 (-42.22%)
Mutual labels:  apache-spark
spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-97.64%)
Mutual labels:  pyspark
gan deeplearning4j
Automatic feature engineering using Generative Adversarial Networks with Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-98.21%)
Mutual labels:  apache-spark
phrase-at-scale
Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary length, and the tool can be used for languages other than English.
Stars: ✭ 115 (-89.16%)
Mutual labels:  pyspark
Tdigest
t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark
Stars: ✭ 274 (-74.18%)
Mutual labels:  pyspark
cloud-integration
Spark cloud integration: tests, cloud committers and more
Stars: ✭ 20 (-98.11%)
Mutual labels:  apache-spark
Kafka Storm Starter
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, using Apache Avro as the data serialization format.
Stars: ✭ 728 (-31.39%)
Mutual labels:  apache-spark
databricks-notebooks
Collection of Databricks and Jupyter Notebooks
Stars: ✭ 19 (-98.21%)
Mutual labels:  pyspark
spark-structured-streaming-examples
Spark Structured Streaming examples using version 3.0.0
Stars: ✭ 23 (-97.83%)
Mutual labels:  apache-spark
spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (-93.69%)
Mutual labels:  apache-spark
Spark Flamegraph
Easy CPU Profiling for Apache Spark applications
Stars: ✭ 30 (-97.17%)
Mutual labels:  apache-spark
BigCLAM-ApacheSpark
Overlapping community detection in large-scale networks using the BigCLAM model, built on Apache Spark
Stars: ✭ 40 (-96.23%)
Mutual labels:  apache-spark
basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.
Stars: ✭ 25 (-97.64%)
Mutual labels:  pyspark
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (-40.34%)
Mutual labels:  pyspark
data-algorithms-with-spark
O'Reilly book: Data Algorithms with Spark, by Mahmoud Parsian
Stars: ✭ 34 (-96.8%)
Mutual labels:  pyspark
Spark Tda
SparkTDA is a package for Apache Spark that provides Topological Data Analysis functionality.
Stars: ✭ 45 (-95.76%)
Mutual labels:  apache-spark
anovos
Anovos - An Open Source Library for Scalable feature engineering Using Apache-Spark
Stars: ✭ 77 (-92.74%)
Mutual labels:  pyspark
flask-spark-docker
Just a boilerplate for PySpark and Flask
Stars: ✭ 32 (-96.98%)
Mutual labels:  pyspark
Flintrock
A command-line tool for launching Apache Spark clusters.
Stars: ✭ 568 (-46.47%)
Mutual labels:  apache-spark
OSCI
Open Source Contributor Index
Stars: ✭ 107 (-89.92%)
Mutual labels:  pyspark
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (-93.21%)
Mutual labels:  pyspark
geospark
bring sf to spark in production
Stars: ✭ 53 (-95%)
Mutual labels:  apache-spark
spark-streaming-visualize
Simple demonstration of how to build a complex real time machine learning visualization tool.
Stars: ✭ 16 (-98.49%)
Mutual labels:  apache-spark
kafka-twitter-spark-streaming
Counting Tweets Per User in Real-Time
Stars: ✭ 38 (-96.42%)
Mutual labels:  pyspark
SANSA-Stack
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
Stars: ✭ 130 (-87.75%)
Mutual labels:  apache-spark
Sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (-10.08%)
Mutual labels:  pyspark
Streaming Readings
A reading list of papers related to streaming systems
Stars: ✭ 554 (-47.79%)
Mutual labels:  apache-spark
incubator-linkis
Linkis makes it easy to connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+131.76%)
Mutual labels:  pyspark
osm-parquetizer
A converter from OSM PBF files to Parquet files
Stars: ✭ 71 (-93.31%)
Mutual labels:  apache-spark
sparklygraphs
Old repo for R interface for GraphFrames
Stars: ✭ 13 (-98.77%)
Mutual labels:  apache-spark
Openscoring
REST web service for the true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models
Stars: ✭ 536 (-49.48%)
Mutual labels:  apache-spark