This package contains a generic implementation of greedy information-theoretic feature selection (FS) methods. The implementation is based on the common theoretical framework presented by Gavin Brown. Implementations of mRMR, InfoGain, JMI and other commonly used FS filters are provided.

Stars: ✭ 123 (-18.54%)

Mutual labels: spark

Spark Py Notebooks

Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+786.09%)

Mutual labels: spark

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (-0.66%)

Mutual labels: spark

Spark Summit 2017 Sanfrancisco

Spark Summit 2017 San Francisco

Stars: ✭ 93 (-38.41%)

Mutual labels: spark

Deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Stars: ✭ 2,020 (+1237.75%)

Mutual labels: spark

Spark On Kubernetes Helm

Spark on Kubernetes infrastructure Helm charts repo

Stars: ✭ 92 (-39.07%)

Mutual labels: spark

Apache Spark Node

Node.js bindings for Apache Spark DataFrame APIs

Stars: ✭ 136 (-9.93%)

Mutual labels: spark

Ammonite Spark

Run Spark calculations from Ammonite

Stars: ✭ 88 (-41.72%)

Mutual labels: spark

Eat pyspark in 10 days

pyspark🍒🥭 is delicious, just eat it!😋😋

Stars: ✭ 116 (-23.18%)

Mutual labels: spark

Spark python ml examples

Spark 2.0 Python Machine Learning examples

Stars: ✭ 87 (-42.38%)

Mutual labels: spark

Nd4j

Fast, Scientific and Numerical Computing for the JVM (NDArrays)

Stars: ✭ 1,742 (+1053.64%)

Mutual labels: spark

Cuesheet

A framework for writing Spark 2.x applications in a pretty way

Stars: ✭ 86 (-43.05%)

Mutual labels: spark

Teddy

Spark Streaming monitoring platform with support for job deployment, alerting, and automatic start

Stars: ✭ 120 (-20.53%)

Mutual labels: spark

Hops Examples

Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops

Stars: ✭ 84 (-44.37%)

Mutual labels: spark

Aliyun Emapreduce Datasources

Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.

Stars: ✭ 132 (-12.58%)

Mutual labels: spark

Hadoop cookbook

Cookbook to install Hadoop 2.0+ using Chef

Stars: ✭ 82 (-45.7%)

Mutual labels: spark

Elassandra

Elassandra = Elasticsearch + Apache Cassandra

Stars: ✭ 1,610 (+966.23%)

Mutual labels: spark

Mleap

MLeap: Deploy ML Pipelines to Production

Stars: ✭ 1,232 (+715.89%)

Mutual labels: spark

Benchm Ml

A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).

Stars: ✭ 1,835 (+1115.23%)

Mutual labels: spark

Spark Gbtlr

Hybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark

Stars: ✭ 81 (-46.36%)

Mutual labels: spark

Cube.js

📊 Cube — Open-Source Analytics API for Building Data Apps

Stars: ✭ 11,983 (+7835.76%)

Mutual labels: spark

Docker Spark

🚢 Docker image for Apache Spark

Stars: ✭ 78 (-48.34%)

Mutual labels: spark

Abris

Avro SerDe for Apache Spark structured APIs.

Stars: ✭ 130 (-13.91%)

Mutual labels: spark

Spark Website

Apache Spark Website

Stars: ✭ 75 (-50.33%)

Mutual labels: spark

Spring Shiro Spark

Spring-Shiro-Spark is an attempt to integrate Spring-Boot, Hibernate, Spark, Spark-SQL, Shiro, iView, VueJs... ...

Stars: ✭ 114 (-24.5%)

Mutual labels: spark

Ds Cheatsheets

List of Data Science Cheatsheets to rule the world

Stars: ✭ 9,452 (+6159.6%)

Mutual labels: spark

Rasterframes

Geospatial Raster support for Spark DataFrames

Stars: ✭ 142 (-5.96%)

Mutual labels: spark

Apache Spark Hands On

Educational notes and hands-on problems with solutions for the Hadoop ecosystem

Stars: ✭ 74 (-50.99%)

Mutual labels: spark

Xlearning Xdml

Extremely distributed machine learning

Stars: ✭ 113 (-25.17%)

Mutual labels: spark

Spark Twitter Stream Example

"Sentiment analysis" on a live Twitter feed with Apache Spark and Apache Bahir

Stars: ✭ 73 (-51.66%)

Mutual labels: spark

Spylon Kernel

Jupyter kernel for Scala and Spark

Stars: ✭ 129 (-14.57%)

Mutual labels: spark

Kamu Cli

Next generation tool for decentralized exchange and transformation of semi-structured data

Stars: ✭ 69 (-54.3%)

Mutual labels: spark

Archivespark

An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.

Stars: ✭ 111 (-26.49%)

Mutual labels: spark

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-52.98%)

Mutual labels: spark

Pyspark Learning

Updated repository

Stars: ✭ 147 (-2.65%)

Mutual labels: spark

Fast Mrmr

An improved implementation of the classical feature selection method: minimum Redundancy and Maximum Relevance (mRMR).

Stars: ✭ 67 (-55.63%)

Mutual labels: spark

Lambda Arch

Applying Lambda Architecture with Spark, Kafka, and Cassandra.

Stars: ✭ 111 (-26.49%)

Mutual labels: spark

Thingsboard

Open-source IoT Platform - Device management, data collection, processing and visualization.

Stars: ✭ 10,526 (+6870.86%)

Mutual labels: spark

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (+987.42%)

Mutual labels: spark

Spark Bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.

Stars: ✭ 65 (-56.95%)

Mutual labels: spark

Java learning practice

The road to advanced Java: frequently asked interview algorithms, akka, multithreading, NIO, Netty, SpringBoot, Spark && Flink, and more

Stars: ✭ 110 (-27.15%)

Mutual labels: spark

Pyspark Twitter Stream Mining

Real-time Machine Learning with Apache Spark on Twitter Public Stream

Stars: ✭ 64 (-57.62%)

Mutual labels: spark

Azure Event Hubs Spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Stars: ✭ 140 (-7.28%)

Mutual labels: spark

Parquet Index

Spark SQL index for Parquet tables

Stars: ✭ 109 (-27.81%)

Mutual labels: spark

Spark Ml Source Analysis

Analysis of the principles behind Spark ML algorithms and of their concrete source-code implementations

Stars: ✭ 1,873 (+1140.4%)

Mutual labels: spark

Aztk

AZTK powered by Azure Batch: On-demand, Dockerized, Spark Jobs on Azure

Stars: ✭ 152 (+0.66%)

Mutual labels: spark

Datacompy

Pandas and Spark DataFrame comparison for humans

Stars: ✭ 147 (-2.65%)

Mutual labels: spark

Ecommercerecommendsystem

Real-time big data product recommendation system. Front end: Vue + TypeScript + ElementUI; back end: Spring + Spark

Stars: ✭ 139 (-7.95%)

Mutual labels: spark

Spring Boot Quick

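🌿 Quick-start learning examples based on springboot, integrating open-source frameworks the author has encountered, such as rabbitmq (delay queues), Kafka, jpa, redis, oauth2, swagger, jsp, docker, spring-batch, exception handling, log output, multi-module development, multi-environment packaging, caching, web crawlers, jwt, GraphQL, dubbo, zookeeper, Async, and more 📌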

Stars: ✭ 1,819 (+1104.64%)

Mutual labels: spark

Hnswlib

Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs

Stars: ✭ 108 (-28.48%)

Mutual labels: spark

