Fast MrmrAn improved implementation of the classical feature selection method: minimum Redundancy and Maximum Relevance (mRMR).
Stars: ✭ 67 (-64.74%)
Ytk LearnYtk-learn is a distributed machine learning library which implements most of popular machine learning algorithms(GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (+77.37%)
Benchm MlA minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (+865.79%)
SparklintA tool for monitoring and tuning Spark jobs for efficiency.
Stars: ✭ 316 (+66.32%)
ThingsboardOpen-source IoT Platform - Device management, data collection, processing and visualization.
Stars: ✭ 10,526 (+5440%)
Spark AlchemyCollection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-35.79%)
Learningsparkv2This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Stars: ✭ 307 (+61.58%)
RsparklingRSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-65.79%)
DeltaAn open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+1954.21%)
RoaringbitmapA better compressed bitset in Java
Stars: ✭ 2,460 (+1194.74%)
ZatZeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
Stars: ✭ 303 (+59.47%)
W2vWord2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-66.32%)
ElasticlusterCreate clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (+56.84%)
ZparkioBoiler plate framework to use Spark and ZIO together.
Stars: ✭ 121 (-36.32%)
Spark NotebookInteractive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (+1521.58%)
Pysparkgeoanalysis🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (-66.84%)
CloudflowCloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (+46.32%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-21.05%)
DatavecETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (+43.16%)
RoffildlibraryLibrary for MQL5 (MetaTrader) with Python, Java, Apache Spark, AWS
Stars: ✭ 63 (-66.84%)
Docker Spark ClusterA simple spark standalone cluster for your testing environment purposses
Stars: ✭ 261 (+37.37%)
Sk DistDistributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (+36.84%)
Silexsomething to help you spark
Stars: ✭ 61 (-67.89%)
SuccinctEnabling queries on compressed data.
Stars: ✭ 257 (+35.26%)
Big WhaleSpark、Flink等离线任务的调度以及实时任务的监控
Stars: ✭ 163 (-14.21%)
WaimakWaimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-68.42%)
Kinesis SqlKinesis Connector for Structured Streaming
Stars: ✭ 120 (-36.84%)
Zemberek Nlp ServerZemberek Türkçe NLP Java Kütüphanesi üzerine REST Docker Sunucu
Stars: ✭ 60 (-68.42%)
basinBasin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-86.84%)
dllibdllib is a distributed deep learning library running on Apache Spark
Stars: ✭ 32 (-83.16%)
Rumble⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-69.47%)
IbisA pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+757.89%)
prostoProsto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Stars: ✭ 54 (-71.58%)
confluent-spark-avroSpark UDFs to deserialize Avro messages with schemas stored in Schema Registry.
Stars: ✭ 18 (-90.53%)
Kraps RpcA RPC framework leveraging Spark RPC module
Stars: ✭ 175 (-7.89%)
blogblog entries
Stars: ✭ 39 (-79.47%)
Spark LucenerddSpark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (-40%)
trembitaModel complex data transformation pipelines easily
Stars: ✭ 44 (-76.84%)
Docker HadoopA Docker container with a full Hadoop cluster setup with Spark and Zeppelin
Stars: ✭ 54 (-71.58%)
pgvectorOpen-source vector similarity search for Postgres
Stars: ✭ 482 (+153.68%)
Utils4sscala、spark使用过程中,各种测试用例以及相关资料整理
Stars: ✭ 1,070 (+463.16%)
Js SparkRealtime calculation distributed system. AKA distributed lodash
Stars: ✭ 187 (-1.58%)
Kotlin Spark ApiThis projects gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Stars: ✭ 183 (-3.68%)
Sparkstreaming💥 🚀 封装sparkstreaming动态调节batch time(有数据就执行计算);🚀 支持运行过程中增删topic;🚀 封装sparkstreaming 1.6 - kafka 010 用以支持 SSL。
Stars: ✭ 179 (-5.79%)
TransmogrifaiTransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+996.84%)
Knn MattingSource Code for KNN Matting, CVPR 2012 / TPAMI 2013. MATLAB code ready to run. Simple and robust implementation under 40 lines.
Stars: ✭ 130 (-31.58%)
DatafusionDataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (+221.58%)