Hnswlib - Java library for approximate nearest neighbor search using Hierarchical Navigable Small World graphs
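Flink Learning - Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, internals, hands-on practice, performance tuning, and source-code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and more, plus write-ups of large production Flink projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Support for my column "Big Data Real-Time Compute Engine Flink: Practice and Performance Optimization" (《大数据实时计算引擎 Flink 实战与性能优化》) is welcome.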
Seldon Server - Machine Learning Platform and Recommendation Engine built on Kubernetes
Logigsk - A Linux-based software package to control LEDs on Logitech G910, G810, G610 and G410.
Spark On K8s Operator - Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Sparktutorial - Source code for James Lee's Apache Spark with Java course
Splash - Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Spark Ffm - FFM (Field-aware Factorization Machine) on Spark
Almond - A Scala kernel for Jupyter
Logisland - Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.
Schemer - Schema registry for CSV, TSV, JSON, AVRO and Parquet schemas. Supports schema inference and a GraphQL API.
Spark Py Notebooks - Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Big Data - 🔧 Use dplyr to analyze Big Data 🐘
Cuesheet - A framework for writing Spark 2.x applications in a pretty way
Flint - Webex Bot SDK for Node.js (deprecated in favor of https://github.com/webex/webex-bot-node-framework)
Hops Examples - Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Mleap - MLeap: Deploy ML Pipelines to Production
Lehar - Visualize data using relative ordering
Spark Gbtlr - Hybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark
Setl - A simple Spark-powered ETL framework that just works 🍺
Home - ApacheCN open-source organization: announcements, introduction, members, events, and contact channels
Cleanframes - Type-class based data cleansing library for Apache Spark SQL
Datasphere Studio - DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Lpa Detector - Optimize and improve the label propagation algorithm
Labs - Research on distributed systems
Kamu Cli - Next generation tool for decentralized exchange and transformation of semi-structured data
Fast Mrmr - An improved implementation of the classical feature selection method: minimum Redundancy and Maximum Relevance (mRMR).
Kontextfrei - Writing application logic for Spark jobs that can be unit-tested without a SparkContext
Rsparkling - RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Spark Bigquery - Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
W2v - Word2Vec models with Twitter data using Spark. Blog:
Roffildlibrary - Library for MQL5 (MetaTrader) with Python, Java, Apache Spark, AWS