216 open source projects that are alternatives or similar to Spark-for-data-engineers

Quinn
pyspark methods to enhance developer productivity 📣 👯 🎉
Stars: ✭ 217 (+886.36%)
Mutual labels:  apache-spark, pyspark
Live log analyzer spark
Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.
Stars: ✭ 14 (-36.36%)
Mutual labels:  apache-spark, pyspark
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+13077.27%)
Mutual labels:  apache-spark, pyspark
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+404.55%)
Mutual labels:  apache-spark, pyspark
Sparkora
Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟
Stars: ✭ 51 (+131.82%)
Mutual labels:  apache-spark, pyspark
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (+27.27%)
Mutual labels:  apache-spark, pyspark
learn-by-examples
Real-world Spark pipeline examples
Stars: ✭ 84 (+281.82%)
Mutual labels:  apache-spark, pyspark
spark-twitter-sentiment-analysis
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Stars: ✭ 55 (+150%)
Mutual labels:  apache-spark, pyspark
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+581.82%)
Mutual labels:  apache-spark, pyspark
mmtf-workshop-2018
Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (+127.27%)
Mutual labels:  apache-spark, pyspark
pyspark-asyncactions
Asynchronous actions for PySpark
Stars: ✭ 30 (+36.36%)
Mutual labels:  apache-spark, pyspark
spark3D
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Stars: ✭ 23 (+4.55%)
Mutual labels:  apache-spark, pyspark
Awesome Spark
A curated list of awesome Apache Spark packages and resources.
Stars: ✭ 1,061 (+4722.73%)
Mutual labels:  apache-spark, pyspark
Pyspark Stubs
Apache (Py)Spark type annotations (stub files).
Stars: ✭ 98 (+345.45%)
Mutual labels:  apache-spark, pyspark
Pyspark Boilerplate
A boilerplate for writing PySpark Jobs
Stars: ✭ 318 (+1345.45%)
Mutual labels:  apache-spark, pyspark
jupyterlab-sparkmonitor
JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
Stars: ✭ 78 (+254.55%)
Mutual labels:  apache-spark, pyspark
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+15150%)
Mutual labels:  apache-spark, pyspark
pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (+422.73%)
Mutual labels:  apache-spark, pyspark
Spark Gotchas
Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks
Stars: ✭ 308 (+1300%)
Mutual labels:  apache-spark, pyspark
Azure Cosmosdb Spark
Apache Spark Connector for Azure Cosmos DB
Stars: ✭ 165 (+650%)
Mutual labels:  apache-spark, pyspark
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+77.27%)
Mutual labels:  apache-spark, pyspark
flask-spark-docker
Just a boilerplate for PySpark and Flask
Stars: ✭ 32 (+45.45%)
Mutual labels:  pyspark
GCModeller
GCModeller: genomics CAD(Computer Assistant Design) Modeller system in .NET language
Stars: ✭ 25 (+13.64%)
Mutual labels:  r-language
OSCI
Open Source Contributor Index
Stars: ✭ 107 (+386.36%)
Mutual labels:  pyspark
geospark
bring sf to spark in production
Stars: ✭ 53 (+140.91%)
Mutual labels:  apache-spark
hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
Stars: ✭ 31 (+40.91%)
Mutual labels:  apache-spark
r-exasol
The EXASOL package for R provides an interface to the EXASOL database.
Stars: ✭ 22 (+0%)
Mutual labels:  r-language
SANSA-Stack
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
Stars: ✭ 130 (+490.91%)
Mutual labels:  apache-spark
anovos
Anovos - An Open Source Library for Scalable feature engineering Using Apache-Spark
Stars: ✭ 77 (+250%)
Mutual labels:  pyspark
databricks-notebooks
Collection of Databricks and Jupyter Notebooks
Stars: ✭ 19 (-13.64%)
Mutual labels:  pyspark
introducao-analise-de-dados
Short course introducing data analysis
Stars: ✭ 20 (-9.09%)
Mutual labels:  r-language
bayesopt-tutorial-r
Tutorial on Bayesian optimization in R
Stars: ✭ 15 (-31.82%)
Mutual labels:  r-language
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+227.27%)
Mutual labels:  pyspark
python mozetl
ETL jobs for Firefox Telemetry
Stars: ✭ 25 (+13.64%)
Mutual labels:  pyspark
kafka-twitter-spark-streaming
Counting Tweets Per User in Real-Time
Stars: ✭ 38 (+72.73%)
Mutual labels:  pyspark
osm-parquetizer
A converter for the OSM PBFs to Parquet files
Stars: ✭ 71 (+222.73%)
Mutual labels:  apache-spark
spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (+204.55%)
Mutual labels:  apache-spark
sparklygraphs
Old repo for R interface for GraphFrames
Stars: ✭ 13 (-40.91%)
Mutual labels:  apache-spark
learning-hadoop-and-spark
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Stars: ✭ 146 (+563.64%)
Mutual labels:  apache-spark
ceja
PySpark phonetic and string matching algorithms
Stars: ✭ 24 (+9.09%)
Mutual labels:  pyspark
proxima-platform
The Proxima platform.
Stars: ✭ 17 (-22.73%)
Mutual labels:  apache-spark
BigCLAM-ApacheSpark
Overlapping community detection in large-scale networks using the BigCLAM model, built on Apache Spark
Stars: ✭ 40 (+81.82%)
Mutual labels:  apache-spark
awesome-tools
A curated list of awesome tools and libraries for specific domains
Stars: ✭ 31 (+40.91%)
Mutual labels:  apache-spark
streamsx.kafka
Repository for integration with Apache Kafka
Stars: ✭ 13 (-40.91%)
Mutual labels:  apache-spark
spark
Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
Stars: ✭ 609 (+2668.18%)
Mutual labels:  apache-spark
parquet-dotnet
🐬 Apache Parquet for modern .Net
Stars: ✭ 199 (+804.55%)
Mutual labels:  apache-spark
phrase-at-scale
Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English
Stars: ✭ 115 (+422.73%)
Mutual labels:  pyspark
Kaggle
Kaggle Kernels (Python, R, Jupyter Notebooks)
Stars: ✭ 26 (+18.18%)
Mutual labels:  r-language
microarray-analysis
Materials on the analysis of microarray expression data; focus on re-analysis of public data (http://tinyurl.com/cruk-microarray)
Stars: ✭ 44 (+100%)
Mutual labels:  r-language
pyspark-ML-in-Colab
Pyspark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (+45.45%)
Mutual labels:  pyspark
fuseml
FuseML aims to provide an MLOps framework as the medium dynamically integrating together the AI/ML tools of your choice. It's an extensible tool built through collaboration, where Data Engineers and DevOps Engineers can come together and contribute with reusable integration code.
Stars: ✭ 73 (+231.82%)
Mutual labels:  data-engineers
Location-based-Restaurants-Recommendation-System
Big Data Management and Analysis Final Project
Stars: ✭ 44 (+100%)
Mutual labels:  apache-spark
pyspark-k8s-boilerplate
Boilerplate for PySpark on Cloud Kubernetes
Stars: ✭ 24 (+9.09%)
Mutual labels:  pyspark
net.jgp.books.spark.ch07
Spark in Action, 2nd edition - chapter 7 - Ingestion from files
Stars: ✭ 13 (-40.91%)
Mutual labels:  apache-spark
soda-spark
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Stars: ✭ 58 (+163.64%)
Mutual labels:  pyspark
mmtf-spark
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Stars: ✭ 20 (-9.09%)
Mutual labels:  apache-spark
pyspark-for-data-processing
Code for my presentation: Using PySpark to Process Boat Loads of Data
Stars: ✭ 20 (-9.09%)
Mutual labels:  pyspark
fink-broker
Astronomy Broker based on Apache Spark
Stars: ✭ 18 (-18.18%)
Mutual labels:  apache-spark
SparkTwitterAnalysis
An Apache Spark standalone application using the Spark API in Scala. The application is built with Simple Build Tool (SBT).
Stars: ✭ 29 (+31.82%)
Mutual labels:  apache-spark
Useless R functions
Useless R Functions. That's it
Stars: ✭ 77 (+250%)
Mutual labels:  r-language