Spark Redis: A connector for Spark that allows reading and writing to/from a Redis cluster
Stars: ✭ 773 (-56.57%)
Iql: An ad hoc query service based on the Spark SQL engine.
Stars: ✭ 341 (-80.84%)
Ytk Learn: Ytk-learn is a distributed machine learning library that implements most of the popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (-81.07%)
Angel: A Flexible and Powerful Parameter Server for large-scale machine learning
Stars: ✭ 6,458 (+262.81%)
Fast Mrmr: An improved implementation of the classical feature selection method: minimum Redundancy and Maximum Relevance (mRMR).
Stars: ✭ 67 (-96.24%)
Datahacksummit 2017: Apache Zeppelin notebooks for Recommendation Engines using Keras and Machine Learning on Apache Spark
Stars: ✭ 30 (-98.31%)
Metering Operator: The Metering Operator is responsible for collecting metrics and other information about what's happening in a Kubernetes cluster, and providing a way to create reports on the collected data.
Stars: ✭ 320 (-82.02%)
tpch-spark: TPC-H queries in Apache Spark SQL using the native DataFrames API
Stars: ✭ 63 (-96.46%)
Spark Movie Lens: An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (-58.15%)
Kube Cleanup Operator: Kubernetes Operator to automatically delete completed Jobs and their Pods
Stars: ✭ 318 (-82.13%)
Sparkmagic: Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (-46.4%)
docker-spark: Apache Spark docker container image (Standalone mode)
Stars: ✭ 34 (-98.09%)
Frameless: Expressive types for Spark.
Stars: ✭ 717 (-59.72%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-95.56%)
konsumerator: Kafka Consumer Operator. Kubernetes operator to manage consumers of unbalanced Kafka topics with per-partition vertical autoscaling based on Prometheus metrics
Stars: ✭ 20 (-98.88%)
Utils4s: A collection of test cases and related reference material compiled while working with Scala and Spark
Stars: ✭ 1,070 (-39.89%)
shamash: Autoscaling for Google Cloud Dataproc
Stars: ✭ 31 (-98.26%)
Search Ads Web Service: Online search advertisement platform & Realtime Campaign Monitoring [Maybe Deprecated]
Stars: ✭ 30 (-98.31%)
Repository: A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-94.83%)
Freestyle: A cohesive & pragmatic framework of FP-centric Scala libraries
Stars: ✭ 627 (-64.78%)
spark-util: Low-level helpers for Apache Spark libraries and tests
Stars: ✭ 16 (-99.1%)
Awesome Spark: A curated list of awesome Apache Spark packages and resources.
Stars: ✭ 1,061 (-40.39%)
connected-component: MapReduce implementation of Connected Component on Apache Spark
Stars: ✭ 68 (-96.18%)
H2o 3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+217.75%)
siddhi-operator: An operator that allows you to run stream processing logic directly on a Kubernetes cluster
Stars: ✭ 16 (-99.1%)
Wlm Operator: Singularity implementation of a k8s operator for interacting with SLURM.
Stars: ✭ 78 (-95.62%)
spark-druid-olap: Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Stars: ✭ 286 (-83.93%)
Datafusion: DataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (-65.67%)
swordfish: Open-source distributed workflow scheduling tool that also supports streaming tasks.
Stars: ✭ 35 (-98.03%)
Spark-Ar: Resources for Spark AR
Stars: ✭ 43 (-97.58%)
Mongo Spark: The MongoDB Spark Connector
Stars: ✭ 588 (-66.97%)
Logisland: Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.
Stars: ✭ 97 (-94.55%)
Sparklint: A tool for monitoring and tuning Spark jobs for efficiency.
Stars: ✭ 316 (-82.25%)
Kontextfrei: Writing application logic for Spark jobs that can be unit-tested without a SparkContext
Stars: ✭ 67 (-96.24%)
Pucket: Bucketing and partitioning system for Parquet
Stars: ✭ 29 (-98.37%)
Cook: Fair job scheduler on Kubernetes and Mesos for batch workloads and Spark
Stars: ✭ 314 (-82.36%)
spark-stringmetric: Spark functions to run popular phonetic and string matching algorithms
Stars: ✭ 51 (-97.13%)
Flintrock: A command-line tool for launching Apache Spark clusters.
Stars: ✭ 568 (-68.09%)
samsahai: Dependency verification system with a Kubernetes Operator
Stars: ✭ 66 (-96.29%)
Spark Sklearn: (Deprecated) Scikit-learn integration package for Apache Spark
Stars: ✭ 1,055 (-40.73%)
controllerutil: Utilities for writing Kubernetes controllers & operators
Stars: ✭ 34 (-98.09%)
Data Algorithms Book: MapReduce, Spark, Java, and Scala for the Data Algorithms book
Stars: ✭ 949 (-46.69%)
Shell Operator: Shell-operator is a tool for running event-driven scripts in a Kubernetes cluster
Stars: ✭ 1,146 (-35.62%)
Heracles: High-performance HBase / Spark SQL engine
Stars: ✭ 27 (-98.48%)
Crayon: Simple framework-agnostic UI router for SPAs
Stars: ✭ 310 (-82.58%)
Delta: An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+119.27%)