LiftThe LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large scale machine learning workflows.
Stars: ✭ 127 (-54.32%)
spark-extensionA library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-91.01%)
Scala SamplesPieces of Scala code that explain Scala syntax and related topics, such as what you can do with it all
Stars: ✭ 125 (-55.04%)
Spark AlchemyCollection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-56.12%)
yuzhouwanCode Library for My Blog
Stars: ✭ 39 (-85.97%)
ZparkioBoilerplate framework to use Spark and ZIO together.
Stars: ✭ 121 (-56.47%)
sliceboxMicroservice for safe sharing and easy access to medical images
Stars: ✭ 18 (-93.53%)
AlchemyA web system built for Flink. Supports defining UDFs in the web UI and submitting SQL and JAR jobs; supports managing sources, sinks, and jobs; can manage Flink clusters on OpenShift.
Stars: ✭ 264 (-5.04%)
Kinesis SqlKinesis Connector for Structured Streaming
Stars: ✭ 120 (-56.83%)
Cube.js📊 Cube — Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+4210.43%)
spark-utillow-level helpers for Apache Spark libraries and tests
Stars: ✭ 16 (-94.24%)
Spring Shiro SparkSpring-Shiro-Spark is an attempt at integrating Spring-Boot Hibernate Spark Spark-SQL Shiro iView VueJs... ...
Stars: ✭ 114 (-58.99%)
akka-contextual-actorA really small library (just a few classes) that lets you trace your actors' messages by transparently propagating a common context together with your messages and adding the specified values to the MDC of the underlying logging framework.
Stars: ✭ 17 (-93.88%)
Xlearning Xdmlextremely distributed machine learning
Stars: ✭ 113 (-59.35%)
bigdata-funA complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-94.96%)
ArchivesparkAn Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.
Stars: ✭ 111 (-60.07%)
MmlsparkSimple and Distributed Machine Learning
Stars: ✭ 2,899 (+942.81%)
Spark PracticeApache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (-28.06%)
twitter-stream-api🐤 Another Twitter stream PHP library to retrieve filtered tweets on hot.
Stars: ✭ 11 (-96.04%)
Parquet IndexSpark SQL index for Parquet tables
Stars: ✭ 109 (-60.79%)
Pyspark Cheatsheet🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-61.15%)
makinageStream Processing Made Easy
Stars: ✭ 31 (-88.85%)
awesome-AI-kubernetes❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (-65.83%)
LogigskA Linux-based software package to control LEDs on Logitech G910, G810, G610 and G410.
Stars: ✭ 107 (-61.51%)
lila-wsLichess' websocket server
Stars: ✭ 99 (-64.39%)
SparktutorialSource code for James Lee's Apache Spark with Java course
Stars: ✭ 105 (-62.23%)
typebusFramework for building distributed microservices in Scala with akka-streams and Kafka
Stars: ✭ 14 (-94.96%)
BallistaDistributed compute platform implemented in Rust, and powered by Apache Arrow.
Stars: ✭ 2,274 (+717.99%)
ODSC India 2018My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-90.65%)
AlmondA Scala kernel for Jupyter
Stars: ✭ 1,354 (+387.05%)
SchemerSchema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (-65.11%)
SuccinctEnabling queries on compressed data.
Stars: ✭ 257 (-7.55%)
dlinkDinky is an out of the box one-stop real-time computing platform dedicated to the construction and practice of Unified Streaming & Batch and Unified Data Lake & Data Warehouse. Based on Apache Flink, Dinky provides the ability to connect many big data frameworks including OLAP and Data Lake.
Stars: ✭ 1,535 (+452.16%)
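leaflet heatmapA simple visualization of Huzhou phone-call data. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap rendering step is moved to offline computation. The data is first processed in parallel with Apache Spark, the heatmap is then drawn with Apache Spark, and leafletjs loads the OpenStreetMap layer and the heatmap layer for good interactivity. The rendering is currently implemented with Apache Spark; perhaps because Apache Spark is not well suited to this kind of computation, or because I did not design the algorithm well, the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .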
Stars: ✭ 13 (-95.32%)
pigletA compiler for Pig Latin to Spark and Flink.
Stars: ✭ 23 (-91.73%)
ScannsA scalable nearest neighbor search library in Apache Spark
Stars: ✭ 190 (-31.65%)
swordfishOpen-source distributed workflow scheduling tool that also supports streaming tasks.
Stars: ✭ 35 (-87.41%)
flink-connector-kuduA flink-connector-kudu based on Apache-bahir-kudu-connector; supports Flink 1.11.x DynamicTableSource/Sink, Range partitioning, and more.
Stars: ✭ 40 (-85.61%)
Ammonite SparkRun spark calculations from Ammonite
Stars: ✭ 88 (-68.35%)
smolderHL7 Apache Spark Datasource
Stars: ✭ 33 (-88.13%)
Js SparkRealtime calculation distributed system. AKA distributed lodash
Stars: ✭ 187 (-32.73%)
AzuredatabricksbestpracticesVersion 1 of Technical Best Practices of Azure Databricks based on real world Customer and Technical SME inputs
Stars: ✭ 186 (-33.09%)
Kotlin Spark ApiThis project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Stars: ✭ 183 (-34.17%)
confluent-spark-avroSpark UDFs to deserialize Avro messages with schemas stored in Schema Registry.
Stars: ✭ 18 (-93.53%)
kafka-compose🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-88.49%)
emmaA quotation-based Scala DSL for scalable data analysis.
Stars: ✭ 61 (-78.06%)
RoaringbitmapA better compressed bitset in Java
Stars: ✭ 2,460 (+784.89%)