mahmoudparsian / data-algorithms-with-spark

Licence: other
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Programming Languages

python (139335 projects, #7 most used programming language)
scala (5932 projects)
shell (77523 projects)

Projects that are alternatives to or similar to data-algorithms-with-spark

pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+111.76%)
Mutual labels:  transformations, pyspark, monoid, mapreduce, data-abstractions
machine-learning-course
Machine Learning Course @ Santa Clara University
Stars: ✭ 17 (-50%)
Mutual labels:  pyspark, spark-ml, data-algorithms
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+7305.88%)
Mutual labels:  spark, pyspark, spark-ml
spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-26.47%)
Mutual labels:  spark, pyspark
data sciences campaign
[Data Scientist Course Series]
Stars: ✭ 91 (+167.65%)
Mutual labels:  machine-learning-algorithms, design-patterns
pyspark-ML-in-Colab
Pyspark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (-5.88%)
Mutual labels:  machine-learning-algorithms, pyspark
dlsa
Distributed least squares approximation (dlsa) implemented with Apache Spark
Stars: ✭ 25 (-26.47%)
Mutual labels:  pyspark, spark-ml
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (+0%)
Mutual labels:  pyspark, mapreduce
ODSC India 2018
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-23.53%)
Mutual labels:  spark, pyspark
data processing course
Some class materials for a data processing course using PySpark
Stars: ✭ 50 (+47.06%)
Mutual labels:  spark, pyspark
ai-deployment
Focuses on taking AI models live and on model deployment
Stars: ✭ 149 (+338.24%)
Mutual labels:  pyspark, spark-ml
kafka-compose
🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-5.88%)
Mutual labels:  spark, pyspark
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-17.65%)
Mutual labels:  pyspark, spark-ml
optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+3873.53%)
Mutual labels:  data-transformation, pyspark
System Design Primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Stars: ✭ 154,659 (+454779.41%)
Mutual labels:  design, design-patterns
Awesome Design Systems
A curated list of bookmarks, resources and articles about design systems focused on developers.
Stars: ✭ 222 (+552.94%)
Mutual labels:  design, design-patterns
incubator-linkis
Linkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+7132.35%)
Mutual labels:  spark, pyspark
Tachyons
Functional css for humans
Stars: ✭ 11,057 (+32420.59%)
Mutual labels:  design, design-patterns
Design Patterns
Modern view on classic design patterns implementation in Java
Stars: ✭ 157 (+361.76%)
Mutual labels:  design, design-patterns
awesome-AI-kubernetes
❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (+179.41%)
Mutual labels:  spark, spark-ml

O'Reilly book: Data Algorithms with Spark

Goal of this book: to enable readers to write efficient, simpler code for data algorithms using Spark



GitHub Chapter Solutions


Software:

Spark: Apache Spark 3.2.0
Python: Python 3.7.2
Scala: Scala 2.13
Java: Java 8

Table of Contents

Bonus Chapters: TF-IDF, Correlation, K-mers, anagrams, ...
Chapter 1: Introduction to Data Algorithms
Chapter 2: Transformations in Action
Chapter 3: Mapper Transformations
Chapter 4: Reductions in Spark
Chapter 5: Partitioning Data
Chapter 6: Graph Algorithms
Chapter 7: Interacting with External Data Sources
Chapter 8: Ranking Algorithms
Chapter 9: Fundamental Data Design Patterns
Chapter 10: Common Data Design Patterns
Chapter 11: Join Design Patterns
Chapter 12: Feature Engineering in PySpark
