H2o 3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+2459.28%)
Ballista: Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Stars: ✭ 2,274 (+928.96%)
Ytk Learn: Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (+52.49%)
Js Spark: Real-time distributed computation system, a.k.a. distributed lodash.
Stars: ✭ 187 (-15.38%)
Sparklyr: R interface for Apache Spark
Stars: ✭ 775 (+250.68%)
Xlearning Xdml: Extremely distributed machine learning
Stars: ✭ 113 (-48.87%)
Kotlin Spark Api: This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Stars: ✭ 183 (-17.19%)
Voik: ♒︎ [WIP] An experimental ~distributed~ commit-log
Stars: ✭ 200 (-9.5%)
Kraps Rpc: An RPC framework leveraging the Spark RPC module
Stars: ✭ 175 (-20.81%)
Azuredatabricksbestpractices: Version 1 of Technical Best Practices of Azure Databricks based on real-world Customer and Technical SME inputs
Stars: ✭ 186 (-15.84%)
Mmlspark: Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+1211.76%)
Bastion: Highly-available Distributed Fault-tolerant Runtime
Stars: ✭ 2,333 (+955.66%)
Gerapy: Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js
Stars: ✭ 2,601 (+1076.92%)
Xsql: Unified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (-20.36%)
Atomnas: Code for ICLR 2020 paper 'AtomNAS: Fine-Grained End-to-End Neural Architecture Search'
Stars: ✭ 197 (-10.86%)
Sparkrdma: RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-2.71%)
Bigben: BigBen - a generic, multi-tenant, time-based event scheduler and cron scheduling framework
Stars: ✭ 174 (-21.27%)
Example Spark: Spark, Spark Streaming and Spark SQL unit testing strategies
Stars: ✭ 205 (-7.24%)
Lingvo: Lingvo
Stars: ✭ 2,361 (+968.33%)
Spark Nlp: State of the Art Natural Language Processing
Stars: ✭ 2,518 (+1039.37%)
Lightgbm: A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Stars: ✭ 13,293 (+5914.93%)
Tfmesos: Tensorflow in Docker on Mesos #tfmesos #tensorflow #mesos
Stars: ✭ 194 (-12.22%)
Transmogrifai: TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+842.99%)
Bit: A tool for component-driven application development.
Stars: ✭ 14,443 (+6435.29%)
Plynx: PLynx is a domain agnostic platform for managing reproducible experiments and data-oriented workflows.
Stars: ✭ 192 (-13.12%)
Geopyspark: GeoTrellis for PySpark
Stars: ✭ 167 (-24.43%)
Improved Body Parts: "Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation"
Stars: ✭ 202 (-8.6%)
Diaspora: A privacy-aware, distributed, open source social network.
Stars: ✭ 12,937 (+5753.85%)
Pottery: Redis for humans. 🌎🌍🌏
Stars: ✭ 204 (-7.69%)
Roaringbitmap: A better compressed bitset in Java
Stars: ✭ 2,460 (+1013.12%)
Spark Practice: Apache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (-9.5%)
Dkeras: Distributed Keras engine. Make Keras faster with only one line of code.
Stars: ✭ 181 (-18.1%)
Gimel: Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-2.26%)
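Sparkstreaming: 💥 🚀 Wraps Spark Streaming to adjust the batch time dynamically (computation runs whenever data arrives); 🚀 supports adding and removing topics while running; 🚀 wraps Spark Streaming 1.6 with Kafka 0.10 to support SSL.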
Stars: ✭ 179 (-19%)
Cookim: Distributed web chat application based on WebSocket, built on Akka.
Stars: ✭ 198 (-10.41%)
Scannerl: The modular distributed fingerprinting engine
Stars: ✭ 208 (-5.88%)
Spark: Firely's open source FHIR server
Stars: ✭ 174 (-21.27%)
Dsock: Distributed WebSocket broker
Stars: ✭ 197 (-10.86%)
Spoon: 🥄 A package for building specific proxy pools for different sites.
Stars: ✭ 173 (-21.72%)
Vernemq: A distributed MQTT message broker based on Erlang/OTP. Built for high quality & Industrial use cases.
Stars: ✭ 2,628 (+1089.14%)
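Idworker: idworker is a distributed ID generation tool based on ZooKeeper and the Snowflake algorithm. It registers machines automatically through ZooKeeper (up to 1024 machines), so workerId and datacenterId do not have to be specified manually.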
Stars: ✭ 171 (-22.62%)
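The Idworker entry mentions the Snowflake algorithm for distributed ID generation. Below is a minimal Python sketch of the classic Twitter Snowflake layout. It assumes 41 timestamp bits, 5 datacenter bits, 5 worker bits and 12 sequence bits. The 5 + 5 machine bits give 32 × 32 = 1024 distinct machines, which is consistent with the 1024-machine limit in the description. Idworker's actual bit layout and epoch are not stated in the listing. The names `SnowflakeGenerator`, `decode` and `EPOCH_MS` are hypothetical. The sketch also leaves out the ZooKeeper-based registration that assigns workerId and datacenterId automatically, so here they are passed in by hand.

```python
import threading
import time

# Assumed bit layout (classic Snowflake; not confirmed for Idworker).
# The remaining 41 high bits hold milliseconds since EPOCH_MS.
DATACENTER_BITS = 5
WORKER_BITS = 5
SEQUENCE_BITS = 12

MAX_DATACENTER_ID = (1 << DATACENTER_BITS) - 1  # 31
MAX_WORKER_ID = (1 << WORKER_BITS) - 1          # 31
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1         # 4095

WORKER_SHIFT = SEQUENCE_BITS
DATACENTER_SHIFT = SEQUENCE_BITS + WORKER_BITS
TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS + DATACENTER_BITS

# Hypothetical custom epoch in milliseconds; any fixed past instant works.
EPOCH_MS = 1288834974657


class SnowflakeGenerator:
    """Hypothetical generator; in Idworker the two IDs come from ZooKeeper."""

    def __init__(self, datacenter_id: int, worker_id: int) -> None:
        if not 0 <= datacenter_id <= MAX_DATACENTER_ID:
            raise ValueError("datacenter_id out of range")
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # A clock that goes backwards could produce duplicate IDs.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            # Pack timestamp | datacenter | worker | sequence into one integer.
            return (
                ((now - EPOCH_MS) << TIMESTAMP_SHIFT)
                | (self.datacenter_id << DATACENTER_SHIFT)
                | (self.worker_id << WORKER_SHIFT)
                | self._sequence
            )


def decode(snowflake_id: int) -> dict:
    """Split an ID produced by SnowflakeGenerator back into its fields."""
    return {
        "timestamp_ms": (snowflake_id >> TIMESTAMP_SHIFT) + EPOCH_MS,
        "datacenter_id": (snowflake_id >> DATACENTER_SHIFT) & MAX_DATACENTER_ID,
        "worker_id": (snowflake_id >> WORKER_SHIFT) & MAX_WORKER_ID,
        "sequence": snowflake_id & MAX_SEQUENCE,
    }
```

Because the timestamp occupies the high bits, IDs from one generator increase over time. Uniqueness across machines depends only on no two machines sharing a (datacenterId, workerId) pair. That is the coordination problem Idworker delegates to ZooKeeper.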
Deeplearning4j: Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…
Stars: ✭ 12,277 (+5455.2%)
Spark Knn: k-Nearest Neighbors algorithm on Spark
Stars: ✭ 205 (-7.24%)
Onyx: Distributed, masterless, high performance, fault tolerant data processing
Stars: ✭ 2,019 (+813.57%)
Herddb: A JVM-embeddable Distributed Database
Stars: ✭ 192 (-13.12%)
Pysr: Simple, fast, and parallelized symbolic regression in Python/Julia via regularized evolution and simulated annealing
Stars: ✭ 213 (-3.62%)
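Zi5book: Crawls every Kindle e-book on book.zi5.me, sorted by author and book title; each book is available in both mobi and epub formats, and the whole site is crawled in a distributed fashion.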
Stars: ✭ 191 (-13.57%)
Big Whale: Scheduling of offline jobs and monitoring of real-time jobs for Spark, Flink, and similar engines
Stars: ✭ 163 (-26.24%)
Whylogs Java: Profile and monitor your ML data pipeline end-to-end
Stars: ✭ 164 (-25.79%)
Oneflow: OneFlow is a performance-centered and open-source deep learning framework.
Stars: ✭ 2,868 (+1197.74%)
Scanns: A scalable nearest neighbor search library in Apache Spark
Stars: ✭ 190 (-14.03%)
Dop: JavaScript implementation for Distributed Object Protocol
Stars: ✭ 163 (-26.24%)
Arewedistributedyet: Website + Community effort to unlock the peer-to-peer web at arewedistributedyet.com ⚡🌐🔑
Stars: ✭ 189 (-14.48%)