TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning.

Stars: ✭ 2,084 (+750.61%)

Mutual labels: spark

Isolation Forest

A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.

Stars: ✭ 139 (-43.27%)

Mutual labels: spark

Hadoop Docker

A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark

Stars: ✭ 238 (-2.86%)

Mutual labels: spark

Spark On Lambda

Apache Spark on AWS Lambda

Stars: ✭ 137 (-44.08%)

Mutual labels: spark

Spark Structured Streaming Examples

Spark Structured Streaming / Kafka / Cassandra / Elastic

Stars: ✭ 168 (-31.43%)

Mutual labels: spark

Horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Stars: ✭ 11,943 (+4774.69%)

Mutual labels: spark

Mmlspark

Simple and Distributed Machine Learning

Stars: ✭ 2,899 (+1083.27%)

Mutual labels: spark

Iot Traffic Monitor

Stars: ✭ 131 (-46.53%)

Mutual labels: spark

Geopyspark

GeoTrellis for PySpark

Stars: ✭ 167 (-31.84%)

Mutual labels: spark

Opaque

An encrypted data analytics platform

Stars: ✭ 129 (-47.35%)

Mutual labels: spark

Spark Workshop

Apache Spark™ and Scala Workshops

Stars: ✭ 224 (-8.57%)

Mutual labels: spark

Spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Stars: ✭ 1,721 (+602.45%)

Mutual labels: spark

Neo4j Etl

Data import from relational databases to Neo4j.

Stars: ✭ 165 (-32.65%)

Mutual labels: cypher

Airflow Pipeline

An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR

Stars: ✭ 128 (-47.76%)

Mutual labels: spark

Neocons

A feature-rich, idiomatic Clojure client for the Neo4j REST API

Stars: ✭ 198 (-19.18%)

Mutual labels: cypher

Spring Boot Quick

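🌿 Quick-start learning examples based on Spring Boot, integrating open-source frameworks the author has come across, such as RabbitMQ (delayed queues), Kafka, JPA, Redis, OAuth2, Swagger, JSP, Docker, Spring Batch, exception handling, log output, multi-module development, multi-environment packaging, caching, web crawlers, JWT, GraphQL, Dubbo, ZooKeeper, Async, and more 📌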

Stars: ✭ 1,819 (+642.45%)

Mutual labels: spark

Whylogs Java

Profile and monitor your ML data pipeline end-to-end

Stars: ✭ 164 (-33.06%)

Mutual labels: spark

Lift

The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large-scale machine learning workflows.

Stars: ✭ 127 (-48.16%)

Mutual labels: spark

Recommendationsystem

Book recommender system using collaborative filtering based on Spark

Stars: ✭ 244 (-0.41%)

Mutual labels: spark

Hadoopcryptoledger

Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive

Stars: ✭ 126 (-48.57%)

Mutual labels: spark

Linkis

Linkis helps connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,323 (+848.16%)

Mutual labels: spark

Scala Samples

Pieces of Scala code that explain Scala syntax and related topics, such as what you can do with the language

Stars: ✭ 125 (-48.98%)

Mutual labels: spark

Ballista

Distributed compute platform implemented in Rust, and powered by Apache Arrow.

Stars: ✭ 2,274 (+828.16%)

Mutual labels: spark

Spark Infotheoretic Feature Selection

This package contains a generic implementation of greedy Information Theoretic Feature Selection (FS) methods. The implementation is based on the common theoretic framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.

Stars: ✭ 123 (-49.8%)

Mutual labels: spark

Neo4j 3d Force Graph

Experiments with Neo4j & 3d-force-graph https://github.com/vasturiano/3d-force-graph

Stars: ✭ 159 (-35.1%)

Mutual labels: cypher

Deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Stars: ✭ 2,020 (+724.49%)

Mutual labels: spark

Sagemaker Spark

A Spark library for Amazon SageMaker.

Stars: ✭ 219 (-10.61%)

Mutual labels: spark

Eat pyspark in 10 days

pyspark🍒🥭 is delicious, just eat it!😋😋

Stars: ✭ 116 (-52.65%)

Mutual labels: spark

Scalable Data Science Platform

Content for architecting a data science platform for products using Luigi, Spark & Flask.

Stars: ✭ 158 (-35.51%)

Mutual labels: spark

Teddy

A Spark Streaming monitoring platform that supports job deployment, alerting, and automatic startup

Stars: ✭ 120 (-51.02%)

Mutual labels: spark

Js Spark

Distributed real-time computation system, a.k.a. distributed lodash

Stars: ✭ 187 (-23.67%)

Mutual labels: spark

Elassandra

Elassandra = Elasticsearch + Apache Cassandra

Stars: ✭ 1,610 (+557.14%)

Mutual labels: spark

Geni

A Clojure dataframe library that runs on Spark

Stars: ✭ 152 (-37.96%)

Mutual labels: spark

Cypher Dsl

A Java DSL for the Cypher Query Language

Stars: ✭ 116 (-52.65%)

Mutual labels: cypher

Mastering Spark Sql Book

The Internals of Spark SQL

Stars: ✭ 234 (-4.49%)

Mutual labels: spark

Spark Lucenerdd

Spark RDD with Lucene's query and entity linkage capabilities

Stars: ✭ 114 (-53.47%)

Mutual labels: spark

Sparkmonitor

Monitor Apache Spark from Jupyter Notebook

Stars: ✭ 154 (-37.14%)

Mutual labels: spark

Spark Mllib Twitter Sentiment Analysis

🌟 ✨ Analyze and visualize Twitter Sentiment on a world map using Spark MLlib

Stars: ✭ 113 (-53.88%)

Mutual labels: spark

Kotlin Spark Api

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x.

Stars: ✭ 183 (-25.31%)

Mutual labels: spark

Python Bigdata

Data science and Big Data with Python

Stars: ✭ 112 (-54.29%)

Mutual labels: spark

Spark.jl

Julia binding for Apache Spark

Stars: ✭ 153 (-37.55%)

Mutual labels: spark

Elephas

Distributed Deep learning with Keras & Spark

Stars: ✭ 1,521 (+520.82%)

Mutual labels: spark

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (-11.84%)

Mutual labels: spark

Spark Tsne

Distributed t-SNE via Apache Spark

Stars: ✭ 151 (-38.37%)

Mutual labels: spark

Dpark

Python clone of Spark, a MapReduce-like framework in Python

Stars: ✭ 2,668 (+988.98%)

Mutual labels: spark

Video Stream Analytics

Stars: ✭ 240 (-2.04%)

Mutual labels: spark

Installations mac ubuntu windows

Installations for Data Science. Anaconda, RStudio, Spark, TensorFlow, AWS (Amazon Web Services).

Stars: ✭ 231 (-5.71%)

Mutual labels: spark

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (-12.24%)

Mutual labels: spark

Sparkstreaming

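💥 🚀 Wraps Spark Streaming to adjust batch time dynamically (computation runs as soon as data is available); 🚀 supports adding and removing topics at runtime; 🚀 wraps Spark Streaming 1.6 with Kafka 0.10 to support SSL.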

Stars: ✭ 179 (-26.94%)

Mutual labels: spark

Aztk

AZTK powered by Azure Batch: On-demand, Dockerized, Spark Jobs on Azure

Stars: ✭ 152 (-37.96%)

Mutual labels: spark

61-120 of 455 similar projects
