❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++, etc.

Stars: ✭ 95 (+239.29%)

Mutual labels: spark

Validate-DCB

Validator for RDMA Configuration and Best Practices

Stars: ✭ 34 (+21.43%)

Mutual labels: rdma

shamash

Autoscaling for Google Cloud Dataproc

Stars: ✭ 31 (+10.71%)

Mutual labels: spark

splink

Implementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including an EM algorithm to estimate parameters

Stars: ✭ 181 (+546.43%)

Mutual labels: spark

Neo4j Spark Connector

Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs

Stars: ✭ 245 (+775%)

Mutual labels: spark

Hadoop Docker

A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark

Stars: ✭ 238 (+750%)

Mutual labels: spark

sparkucx

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer

Stars: ✭ 32 (+14.29%)

Mutual labels: rdma

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (+78.57%)

Mutual labels: spark

k8s-rdma-sriov-dev-plugin

Kubernetes RDMA SR-IOV device plugin

Stars: ✭ 92 (+228.57%)

Mutual labels: rdma

spark-word2vec

A parallel implementation of word2vec based on Spark

Stars: ✭ 24 (-14.29%)

Mutual labels: spark

ksmbd

ksmbd kernel server (SMB/CIFS server)

Stars: ✭ 181 (+546.43%)

Mutual labels: rdma

ODSC India 2018

My presentation at ODSC India 2018 about Deep Learning with Apache Spark

Stars: ✭ 26 (-7.14%)

Mutual labels: spark

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+10771.43%)

Mutual labels: spark

sentry-spark

Apache Spark Sentry Integration

Stars: ✭ 14 (-50%)

Mutual labels: spark

Data Accelerator

Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (+782.14%)

Mutual labels: spark

experiments

Code examples for my blog posts

Stars: ✭ 21 (-25%)

Mutual labels: spark

Recommendationsystem

Book recommender system using collaborative filtering based on Spark

Stars: ✭ 244 (+771.43%)

Mutual labels: spark

yuzhouwan

Code Library for My Blog

Stars: ✭ 39 (+39.29%)

Mutual labels: spark

visualize-data-with-python

A Jupyter notebook using some standard techniques for data science and data engineering to analyze data for the 2017 flooding in Houston, TX.

Stars: ✭ 60 (+114.29%)

Mutual labels: spark

Mastering Spark Sql Book

The Internals of Spark SQL

Stars: ✭ 234 (+735.71%)

Mutual labels: spark

Mydatascienceportfolio

Applying Data Science and Machine Learning to Solve Real World Business Problems

Stars: ✭ 227 (+710.71%)

Mutual labels: spark

Innofactor.Crm.CI

DevOps tools for Microsoft Dynamics 365

Stars: ✭ 23 (-17.86%)

Mutual labels: shuffle

openverse-catalog

Identifies and collects data on CC-licensed content across web crawl data and public APIs.

Stars: ✭ 27 (-3.57%)

Mutual labels: spark

BigComputeLabs

Big Compute Learning Labs

Stars: ✭ 19 (-32.14%)

Mutual labels: rdma

spark-sql-flow-plugin

Visualize column-level data lineage in Spark SQL

Stars: ✭ 20 (-28.57%)

Mutual labels: spark

darpc

DaRPC: Data Center Remote Procedure Call

Stars: ✭ 49 (+75%)

Mutual labels: rdma

ksmbd

ksmbd kernel server (SMB/CIFS server)

Stars: ✭ 98 (+250%)

Mutual labels: rdma

pDPM

Passive Disaggregated Persistent Memory at USENIX ATC 2020.

Stars: ✭ 38 (+35.71%)

Mutual labels: rdma

docker-spark

Apache Spark docker container image (Standalone mode)

Stars: ✭ 34 (+21.43%)

Mutual labels: spark

ashuffle

Automatic library-wide shuffle for mpd.

Stars: ✭ 64 (+128.57%)

Mutual labels: shuffle

spark-druid-olap

Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.

Stars: ✭ 286 (+921.43%)

Mutual labels: spark

Coyote

Framework providing operating system abstractions and a range of shared networking (RDMA, TCP/IP) and memory services to common modern heterogeneous platforms.

Stars: ✭ 80 (+185.71%)

Mutual labels: rdma

spark-kubernetes

Spark on Kubernetes

Stars: ✭ 80 (+185.71%)

Mutual labels: spark

Turbo-Transpose

Transpose: SIMD Integer+Floating Point Compression Filter

Stars: ✭ 50 (+78.57%)

Mutual labels: shuffle

swordfish

Open-source distributed workflow scheduling tool that also supports streaming tasks.

Stars: ✭ 35 (+25%)

Mutual labels: spark

Spark Jobserver

REST job server for Apache Spark

Stars: ✭ 2,748 (+9714.29%)

Mutual labels: spark

leaflet heatmap

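A simple visualization of phone-call data from Huzhou. Assuming the data volume is too large for a browser to draw the heatmap directly, the heatmap rendering step is moved to offline computation. The data is first processed in parallel with Apache Spark, the heatmap is then rendered with Apache Spark, and leafletjs loads an OpenStreetMap layer and the heatmap layer to give good interactivity. With the current Apache Spark implementation, the parallel computation is slower than a single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .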

Stars: ✭ 13 (-53.57%)

Mutual labels: spark

Spark Fast Tests

Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)

Stars: ✭ 249 (+789.29%)

Mutual labels: spark

Spark-Ar

Resources for Spark AR

Stars: ✭ 43 (+53.57%)

Mutual labels: spark

Hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.