Opaque: An encrypted data analytics platform
Stars: ✭ 129 (-29.51%)
Spark Tsne: Distributed t-SNE via Apache Spark
Stars: ✭ 151 (-17.49%)
Airflow Pipeline: An Airflow Docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (-30.05%)
Spark Nlp: State of the Art Natural Language Processing
Stars: ✭ 2,518 (+1275.96%)
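Spring Boot Quick: 🌿 Quick-start learning examples based on Spring Boot, integrating open-source frameworks the author has encountered, such as RabbitMQ (delay queues), Kafka, JPA, Redis, OAuth2, Swagger, JSP, Docker, Spring Batch, exception handling, log output, multi-module development, multi-environment packaging, caching, web crawlers, JWT, GraphQL, Dubbo, ZooKeeper, Async, and more 📌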
Stars: ✭ 1,819 (+893.99%)
Benchm Ml: A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (+902.73%)
Lift: The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large scale machine learning workflows.
Stars: ✭ 127 (-30.6%)
Whylogs Java: Profile and monitor your ML data pipeline end-to-end
Stars: ✭ 164 (-10.38%)
Volcano: A Cloud Native Batch System (Project under CNCF)
Stars: ✭ 2,114 (+1055.19%)
Aztk: AZTK powered by Azure Batch: On-demand, Dockerized, Spark Jobs on Azure
Stars: ✭ 152 (-16.94%)
Spring Shiro Spark: Spring-Shiro-Spark is an integration attempt combining Spring-Boot, Hibernate, Spark, Spark-SQL, Shiro, iView, VueJs... ...
Stars: ✭ 114 (-37.7%)
Xsql: Unified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (-3.83%)
Scala Samples: Pieces of Scala code that explain Scala syntax and related topics, such as what you can do with it all
Stars: ✭ 125 (-31.69%)
Athenacli: AthenaCLI is a CLI tool for the AWS Athena service that can do auto-completion and syntax highlighting.
Stars: ✭ 151 (-17.49%)
Spark Alchemy: Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-33.33%)
Linkis: Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+1169.4%)
Zparkio: Boilerplate framework to use Spark and ZIO together.
Stars: ✭ 121 (-33.88%)
Avro: Apache Avro is a data serialization system.
Stars: ✭ 2,005 (+995.63%)
Transmogrifai: TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+1038.8%)
Kinesis Sql: Kinesis Connector for Structured Streaming
Stars: ✭ 120 (-34.43%)
Datacompy: Pandas and Spark DataFrame comparison for humans
Stars: ✭ 147 (-19.67%)
Ibis: A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+790.71%)
Java Notes: ☕️ Java fundamentals 👫 object-oriented thinking ✏️ algorithms 📝 operating systems ☁️ networking 💾 databases 🙊 Spring 💡 system architecture 🐘 big data
Stars: ✭ 160 (-12.57%)
Spark Lucenerdd: Spark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (-37.7%)
Poli: An easy-to-use BI server built for SQL lovers. Power data analysis in SQL and gain faster business insights.
Stars: ✭ 1,850 (+910.93%)
Glow: An open-source toolkit for large-scale genomic analysis
Stars: ✭ 159 (-13.11%)
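Technology Talk: A roundup of commonly used technical frameworks in the Java ecosystem, open-source middleware, system architecture, databases, architecture case studies from large companies, commonly used third-party libraries, project management, production troubleshooting, personal growth, reflections, and more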
Stars: ✭ 12,136 (+6531.69%)
Xlearning Xdml: Extremely distributed machine learning
Stars: ✭ 113 (-38.25%)
Liteflow: liteflow is a distributed task-flow scheduling system built on task versions
Stars: ✭ 112 (-38.8%)
Nd4j: Fast, Scientific and Numerical Computing for the JVM (NDArrays)
Stars: ✭ 1,742 (+851.91%)
Python Bigdata: Data science and Big Data with Python
Stars: ✭ 112 (-38.8%)
Genie: Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (+743.72%)
Spark Authorizer: A Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Stars: ✭ 141 (-22.95%)
Archivespark: An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.
Stars: ✭ 111 (-39.34%)
Rasterframes: Geospatial Raster support for Spark DataFrames
Stars: ✭ 142 (-22.4%)
Elephas: Distributed Deep learning with Keras & Spark
Stars: ✭ 1,521 (+731.15%)
Waterdrop: Production Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+914.21%)
Handyspark: HandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (-13.66%)
Bigdataclass: Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-39.89%)
Data science blogs: A repository to keep track of all the code that I end up writing for my blog posts.
Stars: ✭ 139 (-24.04%)
Books: Technical books and more
Stars: ✭ 110 (-39.89%)
Geni: A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-16.94%)
Flinkstreamsql: Extends the real-time SQL of open-source Flink; mainly implements joins between streams and dimension tables, and supports all syntax of native Flink SQL
Stars: ✭ 1,682 (+819.13%)
Spark R Notebooks: R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-40.44%)
Parquet Index: Spark SQL index for Parquet tables
Stars: ✭ 109 (-40.44%)
Daudit: 🌲 Configuration flaws detector for Hadoop, MongoDB, MySQL, and more!
Stars: ✭ 108 (-40.98%)
Kraps Rpc: An RPC framework leveraging the Spark RPC module
Stars: ✭ 175 (-4.37%)
Geopyspark: GeoTrellis for PySpark
Stars: ✭ 167 (-8.74%)