TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning

Stars: ✭ 2,084 (+13793.33%)

Mutual labels: spark

Data Accelerator

Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (+1546.67%)

Mutual labels: spark

Spark Iforest

Isolation Forest on Spark

Stars: ✭ 166 (+1006.67%)

Mutual labels: spark

frovedis

Framework of vectorized and distributed data analytics

Stars: ✭ 59 (+293.33%)

Mutual labels: spark

Azure Cosmosdb Spark

Apache Spark Connector for Azure Cosmos DB

Stars: ✭ 165 (+1000%)

Mutual labels: spark

Neo4j Spark Connector

Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs

Stars: ✭ 245 (+1533.33%)

Mutual labels: spark

Whylogs Java

Profile and monitor your ML data pipeline end-to-end

Stars: ✭ 164 (+993.33%)

Mutual labels: spark

spark-util

low-level helpers for Apache Spark libraries and tests

Stars: ✭ 16 (+6.67%)

Mutual labels: spark

Linkis

Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.

Stars: ✭ 2,323 (+15386.67%)

Mutual labels: spark

Recommendationsystem

Book recommender system using collaborative filtering based on Spark

Stars: ✭ 244 (+1526.67%)

Mutual labels: spark

Glow

An open-source toolkit for large-scale genomic analysis

Stars: ✭ 159 (+960%)

Mutual labels: spark

sentry-spark

Apache Spark Sentry Integration

Stars: ✭ 14 (-6.67%)

Mutual labels: spark

Handyspark

HandySpark - bringing pandas-like capabilities to Spark dataframes

Stars: ✭ 158 (+953.33%)

Mutual labels: spark

Hadoop Docker

A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark

Stars: ✭ 238 (+1486.67%)

Mutual labels: spark

Learningapachespark

LearningApacheSpark

Stars: ✭ 155 (+933.33%)

Mutual labels: spark

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (+233.33%)

Mutual labels: spark

Quill

Compile-time Language Integrated Queries for Scala

Stars: ✭ 1,998 (+13220%)

Mutual labels: spark

Mastering Spark Sql Book

The Internals of Spark SQL

Stars: ✭ 234 (+1460%)

Mutual labels: spark

Powderkeg

Live-coding the cluster!

Stars: ✭ 152 (+913.33%)

Mutual labels: spark

tpch-spark

TPC-H queries in Apache Spark SQL using native DataFrames API

Stars: ✭ 63 (+320%)

Mutual labels: spark

Spark Ml Source Analysis

An analysis of the principles behind Spark ML algorithms and of their concrete source-code implementations

Stars: ✭ 1,873 (+12386.67%)

Mutual labels: spark

Mydatascienceportfolio

Applying Data Science and Machine Learning to Solve Real World Business Problems

Stars: ✭ 227 (+1413.33%)

Mutual labels: spark

Aztk

AZTK powered by Azure Batch: On-demand, Dockerized, Spark Jobs on Azure

Stars: ✭ 152 (+913.33%)

Mutual labels: spark

spark-druid-olap

Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.

Stars: ✭ 286 (+1806.67%)

Mutual labels: spark

Cc Pyspark

Process Common Crawl data with Python and Spark

Stars: ✭ 147 (+880%)

Mutual labels: spark

Spark Workshop

Apache Spark™ and Scala Workshops

Stars: ✭ 224 (+1393.33%)

Mutual labels: spark

Datacompy

Pandas and Spark DataFrame comparison for humans

Stars: ✭ 147 (+880%)

Mutual labels: spark

spark-acid

ACID Data Source for Apache Spark based on Hive ACID

Stars: ✭ 91 (+506.67%)

Mutual labels: spark

Technology Talk

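A collection of knowledge on the Java ecosystem: common technical frameworks, open-source middleware, system architecture, databases, architecture case studies from large companies, common third-party libraries, project management, production troubleshooting, personal growth, and reflections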

Stars: ✭ 12,136 (+80806.67%)

Mutual labels: spark

Sagemaker Spark

A Spark library for Amazon SageMaker.

Stars: ✭ 219 (+1360%)

Mutual labels: spark

Spark Authorizer

A Spark SQL extension which provides SQL Standard Authorization for Apache Spark

Stars: ✭ 141 (+840%)

Mutual labels: spark

swordfish

Open-source distributed workflow scheduling tool that also supports streaming tasks.

Stars: ✭ 35 (+133.33%)

Mutual labels: spark

Data science blogs

A repository to keep track of all the code that I end up writing for my blog posts.

Stars: ✭ 139 (+826.67%)

Mutual labels: spark

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (+1340%)

Mutual labels: spark

Ecommercerecommendsystem

Real-time big-data product recommendation system. Front end: Vue + TypeScript + ElementUI; back end: Spring + Spark

Stars: ✭ 139 (+826.67%)

Mutual labels: spark

BigData-News

A real-time big-data system project for a news website, based on Spark 2.2

Stars: ✭ 36 (+140%)

Mutual labels: spark

Isolation Forest

A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.

Stars: ✭ 139 (+826.67%)

Mutual labels: spark

Hydro Serving

MLOps Platform

Stars: ✭ 213 (+1320%)

Mutual labels: spark

Spark On Lambda

Apache Spark on AWS Lambda

Stars: ✭ 137 (+813.33%)

Mutual labels: spark

Spark-Ar

Resources for Spark AR

Stars: ✭ 43 (+186.67%)

Mutual labels: spark

Horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Stars: ✭ 11,943 (+79520%)

Mutual labels: spark

Spark Knn

k-Nearest Neighbors algorithm on Spark

Stars: ✭ 205 (+1266.67%)

Mutual labels: spark

Iot Traffic Monitor

Stars: ✭ 131 (+773.33%)

Mutual labels: spark

spark-word2vec

A parallel implementation of word2vec based on Spark

Stars: ✭ 24 (+60%)

Mutual labels: spark

Mmlspark

Simple and Distributed Machine Learning

Stars: ✭ 2,899 (+19226.67%)

Mutual labels: spark

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (+640%)

Mutual labels: spark

incubator-linkis

Stars: ✭ 2,459 (+16293.33%)

Mutual labels: spark

leaflet heatmap

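A simple visualization of call data from Huzhou. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved to offline computation. After the data is computed in parallel with Apache Spark, the heatmap is also rendered with Apache Spark, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to provide good interactivity. The rendering is currently implemented in Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .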

Stars: ✭ 13 (-13.33%)

Mutual labels: spark

spark-kubernetes

Spark on Kubernetes

Stars: ✭ 80 (+433.33%)

Mutual labels: spark

splink

Implementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters

Stars: ✭ 181 (+1106.67%)

Mutual labels: spark

Scanns

A scalable nearest neighbor search library in Apache Spark

Stars: ✭ 190 (+1166.67%)

Mutual labels: spark

61-120 of 397 similar projects
