Spark States: Custom state store providers for Apache Spark
Stars: ✭ 83 (-25.23%)
Home: ApacheCN open-source organization (announcements, introduction, members, activities, contact channels)
Stars: ✭ 1,199 (+980.18%)
Flint: Webex Bot SDK for Node.js (deprecated in favor of https://github.com/webex/webex-bot-node-framework)
Stars: ✭ 85 (-23.42%)
Lehar: Visualize data using relative ordering
Stars: ✭ 81 (-27.03%)
Seldon Server: Machine Learning Platform and Recommendation Engine built on Kubernetes
Stars: ✭ 1,435 (+1192.79%)
Ds Cheatsheets: List of Data Science Cheatsheets to rule the world
Stars: ✭ 9,452 (+8415.32%)
Big Data: 🔧 Use dplyr to analyze Big Data 🐘
Stars: ✭ 93 (-16.22%)
Spark Ffm: FFM (Field-aware Factorization Machine) on Spark
Stars: ✭ 101 (-9.01%)
Conifer: Collect and revisit web pages.
Stars: ✭ 1,259 (+1034.23%)
Hnswlib: Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (-2.7%)
Logisland: Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.
Stars: ✭ 97 (-12.61%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-28.83%)
Bigdataclass: Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-0.9%)
Cleanframes: Type-class based data cleansing library for Apache Spark SQL
Stars: ✭ 75 (-32.43%)
Repository: Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-17.12%)
Apache Spark Hands On: Educational notes and hands-on problems with solutions for the Hadoop ecosystem
Stars: ✭ 74 (-33.33%)
Spark On K8s Operator: Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Stars: ✭ 1,780 (+1503.6%)
Kamu Cli: Next-generation tool for decentralized exchange and transformation of semi-structured data
Stars: ✭ 69 (-37.84%)
Cuesheet: A framework for writing Spark 2.x applications in a pretty way
Stars: ✭ 86 (-22.52%)
Pyspark Cheatsheet: 🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-2.7%)
Replayweb.page: Serverless Web Archive Replay directly in the browser
Stars: ✭ 84 (-24.32%)
Hops Examples: Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-24.32%)
Hadoop cookbook: Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (-26.13%)
Almond: A Scala kernel for Jupyter
Stars: ✭ 1,354 (+1119.82%)
Mleap: Deploy ML Pipelines to Production
Stars: ✭ 1,232 (+1009.91%)
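Flink Learning: Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, internals, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, plus case studies of large production Flink projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Readers are welcome to support the author's column "Big Data Real-Time Computing Engine: Flink in Practice and Performance Optimization".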
Stars: ✭ 11,378 (+10150.45%)
Spark Gbtlr: Hybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark
Stars: ✭ 81 (-27.03%)
Schemer: Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (-12.61%)
Docker Spark: 🚢 Docker image for Apache Spark
Stars: ✭ 78 (-29.73%)
Lambda Arch: Applying Lambda Architecture with Spark, Kafka, and Cassandra.
Stars: ✭ 111 (+0%)
Spark Py Notebooks: Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1105.41%)
Archivebox: 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Stars: ✭ 12,383 (+11055.86%)
Logigsk: A Linux-based software package to control LEDs on Logitech G910, G810, G610 and G410.
Stars: ✭ 107 (-3.6%)
Dataspherestudio: DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+976.58%)
Lpa Detector: Optimize and improve the label propagation algorithm
Stars: ✭ 75 (-32.43%)
Parquet Index: Spark SQL index for Parquet tables
Stars: ✭ 109 (-1.8%)
Labs: Research on distributed systems
Stars: ✭ 73 (-34.23%)
Luigi Warehouse: A Luigi-powered analytics / warehouse stack
Stars: ✭ 72 (-35.14%)
Sparktutorial: Source code for James Lee's Apache Spark with Java course
Stars: ✭ 105 (-5.41%)
Ammonite Spark: Run Spark calculations from Ammonite
Stars: ✭ 88 (-20.72%)
Elephas: Distributed Deep learning with Keras & Spark
Stars: ✭ 1,521 (+1270.27%)
Waterdrop: Production Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+1572.07%)
Splash: A flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (-5.41%)
Spark Nlp Models: Models and Pipelines for the Spark NLP library
Stars: ✭ 88 (-20.72%)