Scalable Data Science: course sets in big data using Apache Spark on Databricks, and their mathematical, statistical, and computational foundations using SageMath.

Stars: ✭ 142 (+255%)

Mutual labels: apache-spark

isarn-sketches-spark

Routines and data structures for using isarn-sketches idiomatically in Apache Spark

Stars: ✭ 28 (-30%)

Mutual labels: apache-spark

Spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Stars: ✭ 1,721 (+4202.5%)

Mutual labels: apache-spark

Pysparkling

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

Stars: ✭ 231 (+477.5%)

Mutual labels: apache-spark

SimP-GCN

Implementation of the WSDM 2021 paper "Node Similarity Preserving Graph Convolutional Networks"

Stars: ✭ 43 (+7.5%)

Mutual labels: graph-mining

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+437.5%)

Mutual labels: apache-spark

vue-large-scale-folder-structure

Vue.js 2 (vue-cli) large-scale folder structure with Vuex, vue-router, and axios

Stars: ✭ 29 (-27.5%)

Mutual labels: large-scale

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (+342.5%)

Mutual labels: apache-spark

Location-based-Restaurants-Recommendation-System

Big Data Management and Analysis Final Project

Stars: ✭ 44 (+10%)

Mutual labels: apache-spark

Cheatsheets.pdf

📚 Various cheatsheets in PDF

Stars: ✭ 159 (+297.5%)

Mutual labels: apache-spark

LabelPropagation

A NetworkX implementation of Label Propagation from "Near Linear Time Algorithm to Detect Community Structures in Large-Scale Networks" (Physical Review E 2008).

Stars: ✭ 101 (+152.5%)

Mutual labels: community-detection

Oryx

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

Stars: ✭ 1,785 (+4362.5%)

Mutual labels: apache-spark

fink-broker

Astronomy Broker based on Apache Spark

Stars: ✭ 18 (-55%)

Mutual labels: apache-spark

Spark On Lambda

Apache Spark on AWS Lambda

Stars: ✭ 137 (+242.5%)

Mutual labels: apache-spark

sparklygraphs

Old repo for R interface for GraphFrames

Stars: ✭ 13 (-67.5%)

Mutual labels: apache-spark

Scala Spark Tutorial

Project for James' Apache Spark with Scala course

Stars: ✭ 121 (+202.5%)

Mutual labels: apache-spark

spark-connector

A connector for Apache Spark to access Exasol

Stars: ✭ 13 (-67.5%)

Mutual labels: apache-spark

Splash

Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange

Stars: ✭ 105 (+162.5%)

Mutual labels: apache-spark

Mastering Spark Sql Book

The Internals of Spark SQL

Stars: ✭ 234 (+485%)

Mutual labels: apache-spark

awesome-tools

Curated list of awesome tools and libraries for specific domains

Stars: ✭ 31 (-22.5%)

Mutual labels: apache-spark

Awesome Ai Infrastructures

Infrastructures™ for Machine Learning Training/Inference in Production.

Stars: ✭ 223 (+457.5%)

Mutual labels: apache-spark

PLSC

Paddle Large Scale Classification Tools, supports ArcFace, CosFace, PartialFC, Data Parallel + Model Parallel. Models include ResNet, ViT, DeiT, FaceViT.

Stars: ✭ 113 (+182.5%)

Mutual labels: large-scale

Quinn

PySpark methods to enhance developer productivity 📣 👯 🎉

Stars: ✭ 217 (+442.5%)

Mutual labels: apache-spark

spark-twitter-sentiment-analysis

Sentiment Analysis of a Twitter Topic with Spark Structured Streaming

Stars: ✭ 55 (+37.5%)

Mutual labels: apache-spark

Learning Apache Spark

Notes on Apache Spark (pyspark)

Stars: ✭ 211 (+427.5%)

Mutual labels: apache-spark

Sparkora

Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟

Stars: ✭ 51 (+27.5%)

Mutual labels: apache-spark

Sparktorch

Train and run PyTorch models on Apache Spark.

Stars: ✭ 195 (+387.5%)

Mutual labels: apache-spark

learn-by-examples

Real-world Spark pipeline examples

Stars: ✭ 84 (+110%)

Mutual labels: apache-spark

Azure Cosmosdb Spark

Apache Spark Connector for Azure Cosmos DB

Stars: ✭ 165 (+312.5%)

Mutual labels: apache-spark

geospark

Bring sf to Spark in production

Stars: ✭ 53 (+32.5%)

Mutual labels: apache-spark

Spark Atlas Connector

A Spark Atlas connector to track data lineage in Apache Atlas

Stars: ✭ 160 (+300%)

Mutual labels: apache-spark

Pro-GNN

Implementation of the KDD 2020 paper "Graph Structure Learning for Robust Graph Neural Networks"

Stars: ✭ 202 (+405%)

Mutual labels: graph-mining

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+275%)

Mutual labels: apache-spark

net.jgp.books.spark.ch07

Spark in Action, 2nd edition - chapter 7 - Ingestion from files

Stars: ✭ 13 (-67.5%)

Mutual labels: apache-spark

Parquetviewer

Simple Windows desktop application for viewing & querying Apache Parquet files

Stars: ✭ 145 (+262.5%)

Mutual labels: apache-spark

libai

LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training

Stars: ✭ 284 (+610%)

Mutual labels: large-scale

Hydrograph

A visual ETL development and debugging tool for big data

Stars: ✭ 144 (+260%)

Mutual labels: apache-spark

osm-parquetizer

A converter from OSM PBF files to Parquet files

Stars: ✭ 71 (+77.5%)

Mutual labels: apache-spark

Azure Event Hubs Spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Stars: ✭ 140 (+250%)

Mutual labels: apache-spark

EgoSplitting

A NetworkX implementation of "Ego-splitting Framework: from Non-Overlapping to Overlapping Clusters" (KDD 2017).

Stars: ✭ 78 (+95%)

Mutual labels: community-detection

Spark Tpc Ds Performance Test

Use the TPC-DS benchmark to test Spark SQL performance

Stars: ✭ 133 (+232.5%)

Mutual labels: apache-spark

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Stars: ✭ 39 (-2.5%)

Mutual labels: apache-spark

Griffon Vm

Griffon Data Science Virtual Machine

Stars: ✭ 128 (+220%)

Mutual labels: apache-spark

spark3D

Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …

Stars: ✭ 23 (-42.5%)

Mutual labels: apache-spark

Spark On K8s Operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.