spark: Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end is now at https://github.com/apache/spark/
Stars: ✭ 609 (+1930%)
Azure Event Hubs Spark: Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (+366.67%)
connected-component: MapReduce implementation of Connected Component on Apache Spark
Stars: ✭ 68 (+126.67%)
soda-spark: Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Stars: ✭ 58 (+93.33%)
SparkTwitterAnalysis: An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) to build the project.
Stars: ✭ 29 (-3.33%)
fink-broker: Astronomy broker based on Apache Spark
Stars: ✭ 18 (-40%)
DataEngineering: This repo contains commands that data engineers use in day-to-day work.
Stars: ✭ 47 (+56.67%)
oshinko-s2i: A place for s2i images and utilities for Spark application builders on OpenShift
Stars: ✭ 16 (-46.67%)
hyperdrive: Extensible streaming ingestion pipeline on top of Apache Spark
Stars: ✭ 31 (+3.33%)
Griffon Vm: Griffon Data Science Virtual Machine
Stars: ✭ 128 (+326.67%)
spark-connector: A connector for Apache Spark to access Exasol
Stars: ✭ 13 (-56.67%)
phrase-at-scale: Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary size. Can be used with languages other than English.
Stars: ✭ 115 (+283.33%)
spark-transformers: Library for exporting Apache Spark MLlib models for use in any Java application with no other dependencies.
Stars: ✭ 39 (+30%)
Data Accelerator: Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+723.33%)
cloud-integration: Spark cloud integration: tests, cloud committers and more
Stars: ✭ 20 (-33.33%)
Pysparkling: A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Stars: ✭ 231 (+670%)
gan deeplearning4j: Automatic feature engineering using Generative Adversarial Networks with Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-36.67%)
Spark On K8s Operator: Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Stars: ✭ 1,780 (+5833.33%)
Sparkrdma: RDMA-accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+616.67%)
Analytics Zoo: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Stars: ✭ 2,448 (+8060%)
spark-records: Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (+123.33%)
spark-streaming-visualize: Simple demonstration of how to build a complex real-time machine learning visualization tool.
Stars: ✭ 16 (-46.67%)
Whylogs Java: Profile and monitor your ML data pipeline end-to-end
Stars: ✭ 164 (+446.67%)
BigCLAM-ApacheSpark: Overlapping community detection in large-scale networks using the BigCLAM model, built on Apache Spark
Stars: ✭ 40 (+33.33%)
Albedo: A recommender system for discovering GitHub repos, built with Apache Spark
Stars: ✭ 149 (+396.67%)
Oryx: Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning
Stars: ✭ 1,785 (+5850%)
ODSC India 2018: My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-13.33%)
Scalable Data Science: Course sets in big data using Apache Spark on Databricks, and their mathematical, statistical and computational foundations using SageMath.
Stars: ✭ 142 (+373.33%)
Spark: .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+5636.67%)
Docker Spark: Apache Spark Docker image
Stars: ✭ 1,396 (+4553.33%)
Cuesheet: A framework for writing Spark 2.x applications in a pretty way
Stars: ✭ 86 (+186.67%)
Splash: A flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (+250%)
anovos: An open source library for scalable feature engineering using Apache Spark
Stars: ✭ 77 (+156.67%)
OSCI: Open Source Contributor Index
Stars: ✭ 107 (+256.67%)
Spark States: Custom state store providers for Apache Spark
Stars: ✭ 83 (+176.67%)
Mlflow: Open source platform for the machine learning lifecycle
Stars: ✭ 10,898 (+36226.67%)
Awesome Pulsar: A curated list of Pulsar tools, integrations and resources.
Stars: ✭ 57 (+90%)
big data: A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (+13.33%)
jobAnalytics and search: The JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-16.67%)
pyspark-algorithms: PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+140%)
Pulsar Spark: When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (+83.33%)
Sparkit Learn: PySpark + Scikit-learn = Sparkit-learn
Stars: ✭ 1,073 (+3476.67%)
geospark: Bring sf to Spark in production
Stars: ✭ 53 (+76.67%)