Spark Workshop: Apache Spark™ and Scala Workshops
Stars: ✭ 224 (+37.42%)
Ruby Spark: Ruby wrapper for Apache Spark
Stars: ✭ 221 (+35.58%)
Sagemaker Spark: A Spark library for Amazon SageMaker.
Stars: ✭ 219 (+34.36%)
Spark Excel: A Spark plugin for reading Excel files via Apache POI
Stars: ✭ 216 (+32.52%)
Gimel: Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+32.52%)
Example Spark: Spark, Spark Streaming and Spark SQL unit testing strategies
Stars: ✭ 205 (+25.77%)
Spark Knn: k-Nearest Neighbors algorithm on Spark
Stars: ✭ 205 (+25.77%)
Mmlspark: Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+1678.53%)
Spark Practice: Apache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (+22.7%)
Ballista: Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Stars: ✭ 2,274 (+1295.09%)
Scanns: A scalable nearest neighbor search library in Apache Spark
Stars: ✭ 190 (+16.56%)
Js Spark: Realtime calculation distributed system. AKA distributed lodash
Stars: ✭ 187 (+14.72%)
Azuredatabricksbestpractices: Version 1 of Technical Best Practices of Azure Databricks based on real world Customer and Technical SME inputs
Stars: ✭ 186 (+14.11%)
Kotlin Spark Api: This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Stars: ✭ 183 (+12.27%)
Roaringbitmap: A better compressed bitset in Java
Stars: ✭ 2,460 (+1409.2%)
Xsql: Unified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (+7.98%)
Kraps Rpc: An RPC framework leveraging Spark RPC module
Stars: ✭ 175 (+7.36%)
Spark: Firely's open source FHIR server
Stars: ✭ 174 (+6.75%)
Spark Nlp: State of the Art Natural Language Processing
Stars: ✭ 2,518 (+1444.79%)
Transmogrifai: TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+1178.53%)
Geopyspark: GeoTrellis for PySpark
Stars: ✭ 167 (+2.45%)
Devops Bash Tools: 550+ DevOps Bash Scripts - AWS, GCP, Kubernetes, Kafka, Docker, APIs, Hadoop, SQL, PostgreSQL, MySQL, Hive, Impala, Travis CI, Jenkins, Concourse, GitHub, GitLab, BitBucket, Azure DevOps, TeamCity, Spotify, MP3, LDAP, Code/Build Linting, pkg mgmt for Linux, Mac, Python, Perl, Ruby, NodeJS, Golang, Advanced dotfiles: .bashrc, .vimrc, .gitconfig, .screenrc, .tmux.conf, .psqlrc ...
Stars: ✭ 226 (+38.65%)
Hadoop Attack Library: A collection of pentest tools and resources targeting Hadoop environments
Stars: ✭ 228 (+39.88%)
Luigi: Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Stars: ✭ 15,226 (+9241.1%)
Hadoop Connectors: Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
Stars: ✭ 218 (+33.74%)
Calcite: Apache Calcite
Stars: ✭ 2,816 (+1627.61%)
Shifu: An end-to-end machine learning and data mining framework on Hadoop
Stars: ✭ 207 (+26.99%)
Awesome Learning: Practice source code repository: https://github.com/jast90/bigdata. Search for Jast on WeChat and follow the official account to get the latest technical posts 😯.
Stars: ✭ 197 (+20.86%)
Nutch: Apache Nutch is an extensible and scalable web crawler
Stars: ✭ 2,277 (+1296.93%)
Hive Jdbc Uber Jar: Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Stars: ✭ 188 (+15.34%)
Bigdata Playground: A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+8.59%)
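Flink Boot: The Lazy Squirrel Flink-Boot scaffold lets Flink fully embrace the Spring ecosystem, so developers can build distributed stream-processing programs in the style of Java web development; Lazy Squirrel makes crossing between the two domains simpler. It aims to lower the cost of getting started (no need to understand distributed-computing theory or the details of the Flink framework) so that developers can quickly write business code. To further improve agility when building large projects with the scaffold, it integrates the Spring framework by default for bean management, and also bundles frameworks commonly used in microservice and web development to speed up development further, such as the MyBatis ORM framework, the Hibernate Validator validation framework, and the Spring Retry framework. See the scaffold features below for details.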
Stars: ✭ 209 (+28.22%)
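Flink Recommandsystem Demo: 🚁🚀 A real-time product recommendation system built on Flink. Flink computes product popularity and stores it in a Redis cache, analyzes log data, and writes profile tags and real-time records to HBase. When a user requests recommendations, the system re-ranks the popularity list according to the user profile, uses the collaborative-filtering and tag-based recommendation modules to attach related products to each product in the newly generated list, and finally returns the new list for the user.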
Stars: ✭ 3,115 (+1811.04%)
Flink Sql Cookbook: The Apache Flink SQL Cookbook is a curated collection of examples, patterns, and use cases of Apache Flink SQL. Many of the recipes are completely self-contained and can be run in Ververica Platform as is.
Stars: ✭ 189 (+15.95%)
Flink Spector: Framework for Apache Flink unit tests
Stars: ✭ 190 (+16.56%)
Registry: Schema Registry
Stars: ✭ 184 (+12.88%)
Nussknacker: Process authoring tool for Apache Flink
Stars: ✭ 182 (+11.66%)
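Flink Commodity Recommendation System: 🐳 A real-time product recommendation system based on Flink. It uses Redis to cache hot data. When a user rates a product, the data is sent through Kafka to Flink, and real-time and offline recommendations are made from the user's rating history. Real-time recommendations cover behavior-based and currently trending items; offline recommendations cover historically popular items, historically high-quality products, and itemcf.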
Stars: ✭ 167 (+2.45%)
Flinkx: Based on Apache Flink. Supports data synchronization/integration and streaming SQL computation.
Stars: ✭ 2,651 (+1526.38%)
Alink: Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
Stars: ✭ 2,936 (+1701.23%)