Datacompy: Pandas and Spark DataFrame comparison for humans
Stars: ✭ 147 (-40%)
Kraps Rpc: An RPC framework leveraging the Spark RPC module
Stars: ✭ 175 (-28.57%)
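Technology Talk: A roundup of knowledge on commonly used technical frameworks in the Java ecosystem, open-source middleware, system architecture, databases, architecture case studies from large companies, common third-party libraries, project management, production troubleshooting, personal growth, and reflections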
Stars: ✭ 12,136 (+4853.47%)
Mydatascienceportfolio: Applying Data Science and Machine Learning to Solve Real World Business Problems
Stars: ✭ 227 (-7.35%)
Spark Authorizer: A Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Stars: ✭ 141 (-42.45%)
Spark Nlp: State of the Art Natural Language Processing
Stars: ✭ 2,518 (+927.76%)
Data science blogs: A repository to keep track of all the code that I end up writing for my blog posts.
Stars: ✭ 139 (-43.27%)
Spark Knn: k-Nearest Neighbors algorithm on Spark
Stars: ✭ 205 (-16.33%)
Transmogrifai: TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+750.61%)
Isolation Forest: A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Stars: ✭ 139 (-43.27%)
Hadoop Docker: A Docker-based Hadoop development and test environment that includes Hadoop, Hive, HBase, and Spark
Stars: ✭ 238 (-2.86%)
Horovod: Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Stars: ✭ 11,943 (+4774.69%)
Mmlspark: Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+1083.27%)
Geopyspark: GeoTrellis for PySpark
Stars: ✭ 167 (-31.84%)
Opaque: An encrypted data analytics platform
Stars: ✭ 129 (-47.35%)
Spark: .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+602.45%)
Neo4j Etl: Data import from relational databases to Neo4j.
Stars: ✭ 165 (-32.65%)
Airflow Pipeline: An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (-47.76%)
Neocons: A feature rich idiomatic Clojure client for the Neo4J REST API
Stars: ✭ 198 (-19.18%)
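Spring Boot Quick: 🌿 Quick-start learning examples based on springboot, integrating open-source frameworks the author has encountered, such as rabbitmq (delayed queues), Kafka, jpa, redies, oauth2, swagger, jsp, docker, spring-batch, exception handling, log output, multi-module development, multi-environment packaging, cache, web crawlers, jwt, GraphQL, dubbo, zookeeper, Async, and more 📌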
Stars: ✭ 1,819 (+642.45%)
Whylogs Java: Profile and monitor your ML data pipeline end-to-end
Stars: ✭ 164 (-33.06%)
Lift: The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large scale machine learning workflows.
Stars: ✭ 127 (-48.16%)
Recommendationsystem: Book recommender system using collaborative filtering based on Spark
Stars: ✭ 244 (-0.41%)
Hadoopcryptoledger: Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-48.57%)
Linkis: Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+848.16%)
Scala Samples: Pieces of Scala code that explain Scala syntax and related things, such as what you can do with all this
Stars: ✭ 125 (-48.98%)
Ballista: Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Stars: ✭ 2,274 (+828.16%)
Spark Infotheoretic Feature Selection: This package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.
Stars: ✭ 123 (-49.8%)
Neo4j 3d Force Graph: Experiments with Neo4j & 3d-force-graph https://github.com/vasturiano/3d-force-graph
Stars: ✭ 159 (-35.1%)
Deequ: Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Stars: ✭ 2,020 (+724.49%)
Sagemaker Spark: A Spark library for Amazon SageMaker.
Stars: ✭ 219 (-10.61%)
Teddy: Spark Streaming monitoring platform that supports job deployment, alerting, and automatic startup
Stars: ✭ 120 (-51.02%)
Js Spark: Realtime calculation distributed system. AKA distributed lodash
Stars: ✭ 187 (-23.67%)
Elassandra: Elassandra = Elasticsearch + Apache Cassandra
Stars: ✭ 1,610 (+557.14%)
Geni: A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-37.96%)
Cypher Dsl: A Java DSL for the Cypher Query Language
Stars: ✭ 116 (-52.65%)
Spark Lucenerdd: Spark RDD with Lucene's query and entity linkage capabilities
Stars: ✭ 114 (-53.47%)
Sparkmonitor: Monitor Apache Spark from Jupyter Notebook
Stars: ✭ 154 (-37.14%)
Kotlin Spark Api: This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Stars: ✭ 183 (-25.31%)
Python Bigdata: Data science and Big Data with Python
Stars: ✭ 112 (-54.29%)
Spark.jl: Julia binding for Apache Spark
Stars: ✭ 153 (-37.55%)
Elephas: Distributed Deep learning with Keras & Spark
Stars: ✭ 1,521 (+520.82%)
Gimel: Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-11.84%)
Spark Tsne: Distributed t-SNE via Apache Spark
Stars: ✭ 151 (-38.37%)
Dpark: Python clone of Spark, a MapReduce-like framework in Python
Stars: ✭ 2,668 (+988.98%)
Sparkrdma: RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-12.24%)
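Sparkstreaming: 💥 🚀 Wraps sparkstreaming to dynamically adjust the batch time (computation runs whenever data is available); 🚀 supports adding and removing topics at runtime; 🚀 wraps sparkstreaming 1.6 - kafka 010 to support SSL.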
Stars: ✭ 179 (-26.94%)
Aztk: AZTK powered by Azure Batch: On-demand, Dockerized, Spark Jobs on Azure
Stars: ✭ 152 (-37.96%)