Airflow PipelineAn Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (-32.63%)
Spark.jlJulia binding for Apache Spark
Stars: ✭ 153 (-19.47%)
Spring Boot Quick🌿 基于springboot的快速学习示例,整合自己遇到的开源框架,如:rabbitmq(延迟队列)、Kafka、jpa、redies、oauth2、swagger、jsp、docker、spring-batch、异常处理、日志输出、多模块开发、多环境打包、缓存cache、爬虫、jwt、GraphQL、dubbo、zookeeper和Async等等📌
Stars: ✭ 1,819 (+857.37%)
XsqlUnified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (-7.37%)
LiftThe LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large scale machine learning workflows.
Stars: ✭ 127 (-33.16%)
Spark TsneDistributed t-SNE via Apache Spark
Stars: ✭ 151 (-20.53%)
HadoopcryptoledgerHadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-33.68%)
GeopysparkGeoTrellis for PySpark
Stars: ✭ 167 (-12.11%)
Scala SamplesThere are pieces of scala code that explain Scala syntax and related things - like what you can do with all this
Stars: ✭ 125 (-34.21%)
Benchm MlA minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (+865.79%)
Spark AlchemyCollection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-35.79%)
RoaringbitmapA better compressed bitset in Java
Stars: ✭ 2,460 (+1194.74%)
ZparkioBoiler plate framework to use Spark and ZIO together.
Stars: ✭ 121 (-36.32%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-21.05%)
Big WhaleSpark、Flink等离线任务的调度以及实时任务的监控
Stars: ✭ 163 (-14.21%)
Kinesis SqlKinesis Connector for Structured Streaming
Stars: ✭ 120 (-36.84%)
IbisA pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+757.89%)
Kraps RpcA RPC framework leveraging Spark RPC module
Stars: ✭ 175 (-7.89%)
Spark LucenerddSpark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (-40%)
NeighborNearest neighbor search for Rails and Postgres
Stars: ✭ 114 (-40%)
Xlearning Xdmlextremely distributed machine learning
Stars: ✭ 113 (-40.53%)
NanopqPure python implementation of product quantization for nearest neighbor search
Stars: ✭ 145 (-23.68%)
ArchivesparkAn Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.
Stars: ✭ 111 (-41.58%)
AzuredatabricksbestpracticesVersion 1 of Technical Best Practices of Azure Databricks based on real world Customer and Technical SME inputs
Stars: ✭ 186 (-2.11%)
Lambda ArchApplying Lambda Architecture with Spark, Kafka, and Cassandra.
Stars: ✭ 111 (-41.58%)
Spark AuthorizerA Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Stars: ✭ 141 (-25.79%)
Vue Info CardSimple and beautiful card component with an elegant spark line, for VueJS.
Stars: ✭ 159 (-16.32%)
Parquet IndexSpark SQL index for Parquet tables
Stars: ✭ 109 (-42.63%)
Data science blogsA repository to keep track of all the code that I end up writing for my blog posts.
Stars: ✭ 139 (-26.84%)
Pyspark Cheatsheet🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-43.16%)
Spark NlpState of the Art Natural Language Processing
Stars: ✭ 2,518 (+1225.26%)
Flink Learningflink learning blog. http://www.54tianzhisheng.cn/ 含 Flink 入门、概念、原理、实战、性能调优、源码解析等内容。涉及 Flink Connector、Metrics、Library、DataStream API、Table API & SQL 等内容的学习案例,还有 Flink 落地应用的大型项目案例(PVUV、日志存储、百亿数据实时去重、监控告警)分享。欢迎大家支持我的专栏《大数据实时计算引擎 Flink 实战与性能优化》
Stars: ✭ 11,378 (+5888.42%)
LogigskA Linux based software package to control led's on Logitech G910, G810, G610 and G410.
Stars: ✭ 107 (-43.68%)
GlowAn open-source toolkit for large-scale genomic analysis
Stars: ✭ 159 (-16.32%)
SparktutorialSource code for James Lee's Aparch Spark with Java course
Stars: ✭ 105 (-44.74%)
Isolation ForestA Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Stars: ✭ 139 (-26.84%)
TarsoslshA Java library implementing practical nearest neighbour search algorithm for multidimensional vectors that operates in sublinear time. It implements Locality-sensitive Hashing (LSH) and multi index hashing for hamming space.
Stars: ✭ 179 (-5.79%)
QuicksqlA Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (+858.42%)
AlmondA Scala kernel for Jupyter
Stars: ✭ 1,354 (+612.63%)
HandysparkHandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (-16.84%)
SchemerSchema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (-48.95%)
Apache Spark NodeNode.js bindings for Apache Spark DataFrame APIs
Stars: ✭ 136 (-28.42%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+604.21%)
Deeplearning4jSuite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…
Stars: ✭ 12,277 (+6361.58%)
AbrisAvro SerDe for Apache Spark structured APIs.
Stars: ✭ 130 (-31.58%)
Js SparkRealtime calculation distributed system. AKA distributed lodash
Stars: ✭ 187 (-1.58%)
Kotlin Spark ApiThis projects gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Stars: ✭ 183 (-3.68%)
Sparkstreaming💥 🚀 封装sparkstreaming动态调节batch time(有数据就执行计算);🚀 支持运行过程中增删topic;🚀 封装sparkstreaming 1.6 - kafka 010 用以支持 SSL。
Stars: ✭ 179 (-5.79%)
TransmogrifaiTransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+996.84%)