Isolation ForestA Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Stars: ✭ 139 (-20.11%)
OpaqueAn encrypted data analytics platform
Stars: ✭ 129 (-25.86%)
Data science blogsA repository to keep track of all the code that I end up writing for my blog posts.
Stars: ✭ 139 (-20.11%)
LiftThe LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large scale machine learning workflows.
Stars: ✭ 127 (-27.01%)
QuillCompile-time Language Integrated Queries for Scala
Stars: ✭ 1,998 (+1048.28%)
HorovodDistributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Stars: ✭ 11,943 (+6763.79%)
LinkisLinkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+1235.06%)
Airflow PipelineAn Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (-26.44%)
Cc PysparkProcess Common Crawl data with Python and Spark
Stars: ✭ 147 (-15.52%)
Spark AuthorizerA Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Stars: ✭ 141 (-18.97%)
Scala SamplesThere are pieces of scala code that explain Scala syntax and related things - like what you can do with all this
Stars: ✭ 125 (-28.16%)
Whylogs JavaProfile and monitor your ML data pipeline end-to-end
Stars: ✭ 164 (-5.75%)
PowderkegLive-coding the cluster!
Stars: ✭ 152 (-12.64%)
Spark.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+889.08%)
AztkAZTK powered by Azure Batch: On-demand, Dockerized, Spark Jobs on Azure
Stars: ✭ 152 (-12.64%)
Spring Boot Quick🌿 基于springboot的快速学习示例,整合自己遇到的开源框架,如:rabbitmq(延迟队列)、Kafka、jpa、redies、oauth2、swagger、jsp、docker、spring-batch、异常处理、日志输出、多模块开发、多环境打包、缓存cache、爬虫、jwt、GraphQL、dubbo、zookeeper和Async等等📌
Stars: ✭ 1,819 (+945.4%)
GlowAn open-source toolkit for large-scale genomic analysis
Stars: ✭ 159 (-8.62%)
HadoopcryptoledgerHadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-27.59%)
DatacompyPandas and Spark DataFrame comparison for humans
Stars: ✭ 147 (-15.52%)
Nd4jFast, Scientific and Numerical Computing for the JVM (NDArrays)
Stars: ✭ 1,742 (+901.15%)
Spark Infotheoretic Feature SelectionThis package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.
Stars: ✭ 123 (-29.31%)
GeniA Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-12.64%)
RasterframesGeospatial Raster support for Spark DataFrames
Stars: ✭ 142 (-18.39%)
Big WhaleSpark、Flink等离线任务的调度以及实时任务的监控
Stars: ✭ 163 (-6.32%)
Azure Event Hubs SparkEnabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-19.54%)
SparkmonitorMonitor Apache Spark from Jupyter Notebook
Stars: ✭ 154 (-11.49%)
Sparkling GraphSparklingGraph provides easy to use set of features that will give you ability to proces large scala graphs using Spark and GraphX.
Stars: ✭ 139 (-20.11%)
QuicksqlA Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (+946.55%)
Spark.jlJulia binding for Apache Spark
Stars: ✭ 153 (-12.07%)
Apache Spark NodeNode.js bindings for Apache Spark DataFrame APIs
Stars: ✭ 136 (-21.84%)
Spark TsneDistributed t-SNE via Apache Spark
Stars: ✭ 151 (-13.22%)
AbrisAvro SerDe for Apache Spark structured APIs.
Stars: ✭ 130 (-25.29%)
Deeplearning4jSuite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…
Stars: ✭ 12,277 (+6955.75%)
Spylon KernelJupyter kernel for scala and spark
Stars: ✭ 129 (-25.86%)
Benchm MlA minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (+954.6%)
GafferA large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+843.68%)
Vue Info CardSimple and beautiful card component with an elegant spark line, for VueJS.
Stars: ✭ 159 (-8.62%)
FeastFeature Store for Machine Learning
Stars: ✭ 2,576 (+1380.46%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-13.79%)
OpenubaA robust, and flexible open source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security Industry. [PRE-ALPHA]
Stars: ✭ 127 (-27.01%)
GeopysparkGeoTrellis for PySpark
Stars: ✭ 167 (-4.02%)
Cape PythonCollaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark
Stars: ✭ 125 (-28.16%)
Spark Bigquery ConnectorBigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Stars: ✭ 126 (-27.59%)
Spark NlpState of the Art Natural Language Processing
Stars: ✭ 2,518 (+1347.13%)
TransmogrifaiTransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+1097.7%)
HandysparkHandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (-9.2%)
Technology Talk汇总java生态圈常用技术框架、开源中间件,系统架构、数据库、大公司架构案例、常用三方类库、项目管理、线上问题排查、个人成长、思考等知识
Stars: ✭ 12,136 (+6874.71%)