datalake-etl-pipelineSimplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+30%)
Pyspark StubsApache (Py)Spark type annotations (stub files).
Stars: ✭ 98 (+226.67%)
Spark GotchasSpark Gotchas. A subjective compilation of the Apache Spark tips and tricks
Stars: ✭ 308 (+926.67%)
pyspark-cheatsheetPySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (+283.33%)
Quinnpyspark methods to enhance developer productivity 📣 👯 🎉
Stars: ✭ 217 (+623.33%)
MmlsparkSimple and Distributed Machine Learning
Stars: ✭ 2,899 (+9563.33%)
mmtf-workshop-2018Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (+66.67%)
Live log analyzer sparkSpark Application for analysis of Apache Access logs and detect anamolies! Along with Medium Article.
Stars: ✭ 14 (-53.33%)
isarn-sketches-sparkRoutines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-6.67%)
spark3DSpark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Stars: ✭ 23 (-23.33%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+400%)
jupyterlab-sparkmonitorJupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
Stars: ✭ 78 (+160%)
SynapseMLSimple and Distributed Machine Learning
Stars: ✭ 3,355 (+11083.33%)
SparkoraPowerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟
Stars: ✭ 51 (+70%)
Awesome SparkA curated list of awesome Apache Spark packages and resources.
Stars: ✭ 1,061 (+3436.67%)
autThe Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+270%)
spark-utilsBasic framework utilities to quickly start writing production ready Apache Spark applications
Stars: ✭ 25 (-16.67%)
lineageGenerate beautiful documentation for your data pipelines in markdown format
Stars: ✭ 16 (-46.67%)
parquet-dotnet🐬 Apache Parquet for modern .Net
Stars: ✭ 199 (+563.33%)
DaFlowApache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple categories of transformation rules.
Stars: ✭ 24 (-20%)
kuwalaKuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Stars: ✭ 474 (+1480%)
hyperdriveExtensible streaming ingestion pipeline on top of Apache Spark
Stars: ✭ 31 (+3.33%)
sparkApache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
Stars: ✭ 609 (+1930%)
connected-componentMap Reduce Implementation of Connected Component on Apache Spark
Stars: ✭ 68 (+126.67%)
SparkTwitterAnalysisAn Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool(SBT) for building the project.
Stars: ✭ 29 (-3.33%)
DataEngineeringThis repo contains commands that data engineers use in day to day work.
Stars: ✭ 47 (+56.67%)
leaflet heatmap简单的可视化湖州通话数据 假设数据量很大,没法用浏览器直接绘制热力图,把绘制热力图这一步骤放到线下计算分析。使用Apache Spark并行计算数据之后,再使用Apache Spark绘制热力图,然后用leafletjs加载OpenStreetMap图层和热力图图层,以达到良好的交互效果。现在使用Apache Spark实现绘制,可能是Apache Spark不擅长这方面的计算或者是我没有设计好算法,并行计算的速度比不上单机计算。Apache Spark绘制热力图和计算代码在这 https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (-56.67%)
spark-transformersSpark-Transformers: Library for exporting Apache Spark MLLIB models to use them in any Java application with no other dependencies.
Stars: ✭ 39 (+30%)
phrase-at-scaleDetect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English
Stars: ✭ 115 (+283.33%)
cloud-integrationSpark cloud integration: tests, cloud committers and more
Stars: ✭ 20 (-33.33%)
spark-operatorOperator for managing the Spark clusters on Kubernetes and OpenShift.
Stars: ✭ 129 (+330%)
sparkucxA high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Stars: ✭ 32 (+6.67%)
python mozetlETL jobs for Firefox Telemetry
Stars: ✭ 25 (-16.67%)
spark-recordsBulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (+123.33%)
spark-streaming-visualizeSimple demonstration of how to build a complex real time machine learning visualization tool.
Stars: ✭ 16 (-46.67%)
kafka-compose🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (+6.67%)
sparklanesA lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-43.33%)
cejaPySpark phonetic and string matching algorithms
Stars: ✭ 24 (-20%)
BigCLAM-ApacheSparkOverlapping community detection in Large-Scale Networks using BigCLAM model build on Apache Spark
Stars: ✭ 40 (+33.33%)
pyspark-ML-in-ColabPyspark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (+6.67%)
ODSC India 2018My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-13.33%)
oshinko-s2iThis is a place to put s2i images and utilities for spark application builders for openshift
Stars: ✭ 16 (-46.67%)