connected-componentMap Reduce Implementation of Connected Component on Apache Spark
Stars: ✭ 68 (+70%)
Analytics ZooDistributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
Stars: ✭ 2,448 (+6020%)
mmtf-sparkMethods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Stars: ✭ 20 (-50%)
AlbedoA recommender system for discovering GitHub repos, built with Apache Spark
Stars: ✭ 149 (+272.5%)
streamsx.kafkaRepository for integration with Apache Kafka
Stars: ✭ 13 (-67.5%)
Whylogs JavaProfile and monitor your ML data pipeline end-to-end
Stars: ✭ 164 (+310%)
SANSA-StackBig Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
Stars: ✭ 130 (+225%)
Scalable Data ScienceScalable Data Science, course sets in big data Using Apache Spark over databricks and their mathematical, statistical and computational foundations using SageMath.
Stars: ✭ 142 (+255%)
isarn-sketches-sparkRoutines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-30%)
Spark.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+4202.5%)
PysparklingA pure Python implementation of Apache Spark's RDD and DStream interfaces.
Stars: ✭ 231 (+477.5%)
SimP-GCNImplementation of the WSDM 2021 paper "Node Similarity Preserving Graph Convolutional Networks"
Stars: ✭ 43 (+7.5%)
SparkrdmaRDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+437.5%)
Bigdata PlaygroundA complete example of a big data application using : Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+342.5%)
LabelPropagationA NetworkX implementation of Label Propagation from a "Near Linear Time Algorithm to Detect Community Structures in Large-Scale Networks" (Physical Review E 2008).
Stars: ✭ 101 (+152.5%)
OryxOryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
Stars: ✭ 1,785 (+4362.5%)
fink-brokerAstronomy Broker based on Apache Spark
Stars: ✭ 18 (-55%)
sparklygraphsOld repo for R interface for GraphFrames
Stars: ✭ 13 (-67.5%)
spark-connectorA connector for Apache Spark to access Exasol
Stars: ✭ 13 (-67.5%)
SplashSplash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (+162.5%)
awesome-toolscurated list of awesome tools and libraries for specific domains
Stars: ✭ 31 (-22.5%)
PLSCPaddle Large Scale Classification Tools,supports ArcFace, CosFace, PartialFC, Data Parallel + Model Parallel. Model includes ResNet, ViT, DeiT, FaceViT.
Stars: ✭ 113 (+182.5%)
Quinnpyspark methods to enhance developer productivity 📣 👯 🎉
Stars: ✭ 217 (+442.5%)
SparkoraPowerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟
Stars: ✭ 51 (+27.5%)
SparktorchTrain and run Pytorch models on Apache Spark.
Stars: ✭ 195 (+387.5%)
geosparkbring sf to spark in production
Stars: ✭ 53 (+32.5%)
Pro-GNNImplementation of the KDD 2020 paper "Graph Structure Learning for Robust Graph Neural Networks"
Stars: ✭ 202 (+405%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+275%)
ParquetviewerSimple windows desktop application for viewing & querying Apache Parquet files
Stars: ✭ 145 (+262.5%)
libaiLiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
Stars: ✭ 284 (+610%)
HydrographA visual ETL development and debugging tool for big data
Stars: ✭ 144 (+260%)
osm-parquetizerA converter for the OSM PBFs to Parquet files
Stars: ✭ 71 (+77.5%)
Azure Event Hubs SparkEnabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (+250%)
EgoSplittingA NetworkX implementation of "Ego-splitting Framework: from Non-Overlapping to Overlapping Clusters" (KDD 2017).
Stars: ✭ 78 (+95%)
datalake-etl-pipelineSimplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-2.5%)
Griffon VmGriffon Data Science Virtual Machine
Stars: ✭ 128 (+220%)
spark3DSpark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Stars: ✭ 23 (-42.5%)
Spark On K8s OperatorKubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Stars: ✭ 1,780 (+4350%)
Docker SparkApache Spark docker image
Stars: ✭ 1,396 (+3390%)
M-NMFAn implementation of "Community Preserving Network Embedding" (AAAI 2017)
Stars: ✭ 119 (+197.5%)
gan deeplearning4jAutomatic feature engineering using Generative Adversarial Networks using Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-52.5%)