A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).

Stars: ✭ 1,835 (+10094.44%)

Mutual labels: spark

avro-schema-generator

Library for generating avro schema files (.avsc) based on DB tables structure

Stars: ✭ 38 (+111.11%)

Mutual labels: avro

Mlfeature

Feature engineering toolkit for Spark MLlib.

Stars: ✭ 12 (-33.33%)

Mutual labels: spark

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+733.33%)

Mutual labels: spark

Sparkjni

A heterogeneous Apache Spark framework.

Stars: ✭ 11 (-38.89%)

Mutual labels: spark

avrow

Avrow is a pure Rust implementation of the Avro specification https://avro.apache.org/docs/current/spec.html with Serde support.

Stars: ✭ 27 (+50%)

Mutual labels: avro

Dockerfiles

50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu

Stars: ✭ 847 (+4605.56%)

Mutual labels: spark

Pyspark Learning

Updated repository

Stars: ✭ 147 (+716.67%)

Mutual labels: spark

awesome-AI-kubernetes

❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc.

Stars: ✭ 95 (+427.78%)

Mutual labels: spark

Mleap

MLeap: Deploy ML Pipelines to Production

Stars: ✭ 1,232 (+6744.44%)

Mutual labels: spark

Chronicler

Scala toolchain for InfluxDB

Stars: ✭ 24 (+33.33%)

Mutual labels: spark

Spark Cassandra Connector

DataStax Spark Cassandra Connector

Stars: ✭ 1,816 (+9988.89%)

Mutual labels: spark

Digitrecognizer

Java Convolutional Neural Network example for Handwritten Digit Recognition

Stars: ✭ 23 (+27.78%)

Mutual labels: spark

parquet-extra

A collection of Apache Parquet add-on modules

Stars: ✭ 30 (+66.67%)

Mutual labels: avro

Spark Scala Tutorial

A free tutorial for Apache Spark.

Stars: ✭ 907 (+4938.89%)

Mutual labels: spark

Nd4j

Fast, Scientific and Numerical Computing for the JVM (NDArrays)

Stars: ✭ 1,742 (+9577.78%)

Mutual labels: spark

Parquet Generator

Parquet file generator

Stars: ✭ 16 (-11.11%)

Mutual labels: spark

BigData-News

A real-time big data system project for a news website, based on Spark 2.2

Stars: ✭ 36 (+100%)

Mutual labels: spark

Sparkling Water

Sparkling Water provides H2O functionality inside a Spark cluster

Stars: ✭ 887 (+4827.78%)

Mutual labels: spark

Rasterframes

Geospatial Raster support for Spark DataFrames

Stars: ✭ 142 (+688.89%)

Mutual labels: spark

Bigdataguide

Big data learning: learn big data from scratch, with learning videos for every stage and interview materials

Stars: ✭ 817 (+4438.89%)

Mutual labels: spark

Scanns

A scalable nearest neighbor search library in Apache Spark

Stars: ✭ 190 (+955.56%)

Mutual labels: spark

Lehar

Visualize data using relative ordering

Stars: ✭ 81 (+350%)

Mutual labels: spark

Spark Redis

A connector for Spark that allows reading from and writing to a Redis cluster

Stars: ✭ 773 (+4194.44%)

Mutual labels: spark

Azure Event Hubs Spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Stars: ✭ 140 (+677.78%)

Mutual labels: spark

Angel

A Flexible and Powerful Parameter Server for large-scale machine learning

Stars: ✭ 6,458 (+35777.78%)

Mutual labels: spark

Insulator

A client UI to inspect Kafka topics, consume, produce and much more

Stars: ✭ 53 (+194.44%)

Mutual labels: avro

Spark Movie Lens

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Stars: ✭ 745 (+4038.89%)

Mutual labels: spark

Sparkling Graph

SparklingGraph provides an easy-to-use set of features that give you the ability to process large-scale graphs using Spark and GraphX.

Stars: ✭ 139 (+672.22%)

Mutual labels: spark

ksql-jdbc-driver

JDBC driver for Apache Kafka

Stars: ✭ 85 (+372.22%)

Mutual labels: confluent

Frameless

Expressive types for Spark.

Stars: ✭ 717 (+3883.33%)

Mutual labels: spark

Quicksql

A Flexible, Fast, Federated (3F) SQL Analysis Middleware for Multiple Data Sources

Stars: ✭ 1,821 (+10016.67%)

Mutual labels: spark

Elasticsearch Spark Recommender

Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch

Stars: ✭ 707 (+3827.78%)

Mutual labels: spark

smolder

HL7 Apache Spark Datasource

Stars: ✭ 33 (+83.33%)

Mutual labels: spark

Useractionanalyzeplatform

Big data platform for e-commerce user behavior analysis

Stars: ✭ 645 (+3483.33%)

Mutual labels: spark

Apache Spark Node

Node.js bindings for Apache Spark DataFrame APIs

Stars: ✭ 136 (+655.56%)

Mutual labels: spark

Freestyle

A cohesive & pragmatic framework of FP-centric Scala libraries

Stars: ✭ 627 (+3383.33%)

Mutual labels: spark

parquet-flinktacular

How to use Parquet in Flink

Stars: ✭ 29 (+61.11%)

Mutual labels: avro

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+31322.22%)

Mutual labels: spark

Aliyun Emapreduce Datasources

Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.

Stars: ✭ 132 (+633.33%)

Mutual labels: spark

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+30527.78%)

Mutual labels: spark

kafka-connect-iot-mqtt-connector-example

Internet of Things Integration Example => Apache Kafka + Kafka Connect + MQTT Connector + Sensor Data