Mmlspark: Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+6342.22%)
Sparkrdma: RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+377.78%)
Spark States: Custom state store providers for Apache Spark
Stars: ✭ 83 (+84.44%)
Mlflow: Open source platform for the machine learning lifecycle
Stars: ✭ 10,898 (+24117.78%)
Awesome Pulsar: A curated list of Pulsar tools, integrations and resources.
Stars: ✭ 57 (+26.67%)
Azure Event Hubs Spark: Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (+211.11%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+233.33%)
Coolplayspark: Cool Play Spark: Spark source code analysis, Spark libraries, and more
Stars: ✭ 3,318 (+7273.33%)
Spark Flamegraph: Easy CPU Profiling for Apache Spark applications
Stars: ✭ 30 (-33.33%)
Sparklearning: Learning Apache Spark, including code and data. Most parts can run locally.
Stars: ✭ 558 (+1140%)
Pulsar Spark: When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (+22.22%)
Feast: Feature Store for Machine Learning
Stars: ✭ 2,576 (+5624.44%)
Spark: .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+3724.44%)
Transmogrifai: TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+4531.11%)
Cuesheet: A framework for writing Spark 2.x applications in a pretty way
Stars: ✭ 86 (+91.11%)
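leaflet heatmap: A simple visualization of Huzhou call data. Assuming the data set is too large for the browser to draw the heatmap directly, the heatmap rendering step is moved to offline computation. The data is first processed in parallel with Apache Spark, the heatmap is then rendered with Apache Spark, and leafletjs loads the OpenStreetMap layer and the heatmap layer to give good interactivity. The rendering currently runs on Apache Spark, but the parallel version is slower than a single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because the algorithm is poorly designed. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git.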
Stars: ✭ 13 (-71.11%)
Learningsparkv2: This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Stars: ✭ 307 (+582.22%)
Spark Jupyter Aws: A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Stars: ✭ 259 (+475.56%)
Featran: A Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (+833.33%)
Sparkle: Haskell on Apache Spark.
Stars: ✭ 419 (+831.11%)
Kafka Storm Starter: Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (+1517.78%)
Agile data code 2: Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+817.78%)
Sparkmeasure: This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Stars: ✭ 368 (+717.78%)
Mobius: C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+1964.44%)
Spark Nkp: Natural Korean Processor for Apache Spark
Stars: ✭ 50 (+11.11%)
Home: ApacheCN open source organization: announcements, introduction, members, events, and contact channels
Stars: ✭ 1,199 (+2564.44%)
Sk Dist: Distributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (+477.78%)
Live log analyzer spark: Spark application for analyzing Apache access logs and detecting anomalies, along with a Medium article.
Stars: ✭ 14 (-68.89%)
Spark On K8s Operator: Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Stars: ✭ 1,780 (+3855.56%)
Splash: Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (+133.33%)
Whylogs Java: Profile and monitor your ML data pipeline end-to-end
Stars: ✭ 164 (+264.44%)
Spark Notebook: Interactive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (+6746.67%)
Goodreads etl pipeline: An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+1662.22%)
awesome-AI-kubernetes: ❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc.
Stars: ✭ 95 (+111.11%)
SynapseML: Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+7355.56%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+146.67%)
Data Accelerator: Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy-to-use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+448.89%)
Spark Workshop: Apache Spark™ and Scala Workshops
Stars: ✭ 224 (+397.78%)
Wirbelsturm: Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (+637.78%)
Sparklyr: R interface for Apache Spark
Stars: ✭ 775 (+1622.22%)
Heracles: High performance HBase / Spark SQL engine
Stars: ✭ 27 (-40%)
Skynet: JavaScript implementation of simple multilayer perceptron (MLP)
Stars: ✭ 26 (-42.22%)
Spark: Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+70162.22%)
Machinelearningcourse: A collection of notebooks of my Machine Learning class written in Python 3
Stars: ✭ 35 (-22.22%)
Tribuo: Tribuo - A Java machine learning library
Stars: ✭ 882 (+1860%)