Quinn: PySpark methods to enhance developer productivity 📣 👯 🎉
Stars: ✭ 217 (+886.36%)
Live log analyzer spark: Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.
Stars: ✭ 14 (-36.36%)
MMLSpark: Simple and distributed machine learning
Stars: ✭ 2,899 (+13077.27%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+404.55%)
Sparkora: Powerful, rapid, automatic EDA and feature engineering library with an easy-to-use API 🌟
Stars: ✭ 51 (+131.82%)
isarn-sketches-spark: Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (+27.27%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), with code examples
Stars: ✭ 150 (+581.82%)
mmtf-workshop-2018: Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (+127.27%)
spark3D: Spark extension for processing large-scale 3D data sets (astrophysics, high energy physics, meteorology, …)
Stars: ✭ 23 (+4.55%)
Awesome Spark: A curated list of awesome Apache Spark packages and resources.
Stars: ✭ 1,061 (+4722.73%)
Pyspark Stubs: Apache (Py)Spark type annotations (stub files).
Stars: ✭ 98 (+345.45%)
jupyterlab-sparkmonitor: JupyterLab extension for monitoring Apache Spark jobs launched from within a notebook
Stars: ✭ 78 (+254.55%)
SynapseML: Simple and distributed machine learning
Stars: ✭ 3,355 (+15150%)
pyspark-cheatsheet: PySpark cheat sheet with example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (+422.73%)
Spark Gotchas: A subjective compilation of Apache Spark tips and tricks
Stars: ✭ 308 (+1300%)
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark, with a complete ETL pipeline for a data lake. Includes SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+77.27%)
GCModeller: Genomics CAD (Computer Assistant Design) modeller system in .NET
Stars: ✭ 25 (+13.64%)
OSCI: Open Source Contributor Index
Stars: ✭ 107 (+386.36%)
geospark: Bring sf to Spark in production
Stars: ✭ 53 (+140.91%)
hyperdrive: Extensible streaming ingestion pipeline on top of Apache Spark
Stars: ✭ 31 (+40.91%)
r-exasol: The EXASOL package for R provides an interface to the EXASOL database.
Stars: ✭ 22 (+0%)
SANSA-Stack: Big Data RDF processing and analytics stack built on Apache Spark and Apache Jena (http://sansa-stack.github.io/SANSA-Stack/)
Stars: ✭ 130 (+490.91%)
anovos: An open source library for scalable feature engineering using Apache Spark
Stars: ✭ 77 (+250%)
pyspark-algorithms: PySpark Algorithms book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+227.27%)
python mozetl: ETL jobs for Firefox Telemetry
Stars: ✭ 25 (+13.64%)
osm-parquetizer: A converter from OSM PBF files to Parquet files
Stars: ✭ 71 (+222.73%)
spark-records: Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (+204.55%)
sparklygraphs: Old repository for the R interface to GraphFrames
Stars: ✭ 13 (-40.91%)
learning-hadoop-and-spark: Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Stars: ✭ 146 (+563.64%)
ceja: PySpark phonetic and string matching algorithms
Stars: ✭ 24 (+9.09%)
BigCLAM-ApacheSpark: Overlapping community detection in large-scale networks using the BigCLAM model, built on Apache Spark
Stars: ✭ 40 (+81.82%)
awesome-tools: Curated list of awesome tools and libraries for specific domains
Stars: ✭ 31 (+40.91%)
streamsx.kafka: Repository for integration with Apache Kafka
Stars: ✭ 13 (-40.91%)
spark: Apache Spark enhanced with a native Kubernetes scheduler back-end. Note: this repository is being archived, as all new development for the Kubernetes scheduler back-end now happens at https://github.com/apache/spark/
Stars: ✭ 609 (+2668.18%)
parquet-dotnet: 🐬 Apache Parquet for modern .NET
Stars: ✭ 199 (+804.55%)
phrase-at-scale: Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary length, and the tool works with languages other than English.
Stars: ✭ 115 (+422.73%)
Kaggle: Kaggle Kernels (Python, R, Jupyter Notebooks)
Stars: ✭ 26 (+18.18%)
microarray-analysis: Materials on the analysis of microarray expression data, focusing on re-analysis of public data (http://tinyurl.com/cruk-microarray)
Stars: ✭ 44 (+100%)
pyspark-ML-in-Colab: PySpark in Google Colab with a simple machine learning (linear regression) model
Stars: ✭ 32 (+45.45%)
fuseml: FuseML aims to provide an MLOps framework that dynamically integrates the AI/ML tools of your choice. It is an extensible tool built through collaboration, where data engineers and DevOps engineers can come together and contribute reusable integration code.
Stars: ✭ 73 (+231.82%)
soda-spark: Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Stars: ✭ 58 (+163.64%)
mmtf-spark: Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Stars: ✭ 20 (-9.09%)
fink-broker: Astronomy broker based on Apache Spark
Stars: ✭ 18 (-18.18%)
SparkTwitterAnalysis: An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) to build the project.
Stars: ✭ 29 (+31.82%)