mahmoudparsian / pyspark-algorithms

Licence: other
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2

Programming Languages

python
shell

Projects that are alternatives to or similar to pyspark-algorithms

data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Stars: ✭ 34 (-52.78%)
Mutual labels:  transformations, pyspark, monoid, mapreduce, data-abstractions
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+108.33%)
Mutual labels:  big-data, distributed-computing, pyspark, dataframe
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (+111.11%)
Mutual labels:  big-data, distributed-computing, dataframe
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-52.78%)
Mutual labels:  big-data, pyspark, mapreduce
Tdigest
t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark
Stars: ✭ 274 (+280.56%)
Mutual labels:  distributed-computing, pyspark, mapreduce
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+54.17%)
Mutual labels:  big-data, pyspark, dataframe
Bitcoin Value Predictor
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (+26.39%)
Mutual labels:  big-data, pyspark
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1758.33%)
Mutual labels:  big-data, pyspark
Asakusafw
Asakusa Framework
Stars: ✭ 114 (+58.33%)
Mutual labels:  big-data, mapreduce
Moosefs
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+1323.61%)
Mutual labels:  big-data, distributed-computing
Helicalinsight
Helical Insight software is the world’s first Open Source Business Intelligence framework, which helps you make sense of your data and make well-informed decisions.
Stars: ✭ 214 (+197.22%)
Mutual labels:  big-data, nosql
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+200%)
Mutual labels:  big-data, pyspark
javaer-mind
Mind maps for Java programmers' advanced learning
Stars: ✭ 66 (-8.33%)
Mutual labels:  big-data, nosql
Iotdb
Apache IoTDB
Stars: ✭ 1,221 (+1595.83%)
Mutual labels:  big-data, nosql
Bigdata Notes
A Beginner's Guide to Big Data ⭐
Stars: ✭ 10,991 (+15165.28%)
Mutual labels:  big-data, mapreduce
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (-1.39%)
Mutual labels:  big-data, mapreduce
Selinon
An advanced distributed task flow management on top of Celery
Stars: ✭ 237 (+229.17%)
Mutual labels:  big-data, distributed-computing
Eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (+226.39%)
Mutual labels:  big-data, dataframe
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+4127.78%)
Mutual labels:  big-data, dataframe
dislib
The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.
Stars: ✭ 39 (-45.83%)
Mutual labels:  big-data, distributed-computing

Source Code for PySpark Algorithms Book

Unlock the power of big data with the PySpark Algorithms book

Buy PySpark Algorithms Book → PDF Version (.pdf)

Buy PySpark Algorithms Book → Kindle Version (.kpf)


PySpark Algorithms Book:

Author: Mahmoud Parsian

Purchase PySpark Algorithms Book from amazon.com

Publication date: August 2019


About PySpark Algorithms Book

  • This book is about PySpark (the Python API for Spark)
  • An introductory book on how to solve data problems using PySpark
  • Learn how to use mappers, filters, and reducers
  • Learn how to partition data for fast queries
  • Learn how to use the mapPartitions() transformation
  • Learn how to use reduceByKey(), groupByKey(), and combineByKey() transformations
  • Learn how to use Spark's transformations and actions for solving real problems
  • Learn how to use RDDs and DataFrames
  • Learn how to read/write data from many data sources
  • Learn how to use logistic regression
  • Learn how to use Spark's reduction transformations
  • Learn how to use GraphFrames
  • Learn how to use motifs in GraphFrames
  • Learn how to use monoids in MapReduce algorithms
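
To make the monoid idea in the list above concrete, here is a minimal pure-Python sketch. It is not taken from the book and does not use Spark. It simulates partitioned (key, value) records and computes a per-key mean by carrying (sum, count) pairs, which form a monoid under element-wise addition. The sample data and the names `partitions`, `combine`, `local_aggregate`, and `merge_partitions` are all hypothetical, chosen only for illustration.

```python
from functools import reduce

# Hypothetical sample data: (key, value) records split across two "partitions".
partitions = [
    [("a", 2.0), ("b", 4.0), ("a", 4.0)],
    [("a", 6.0), ("b", 8.0)],
]

ZERO = (0.0, 0)  # identity element of the (sum, count) monoid


def combine(x, y):
    """Associative merge of two (sum, count) pairs."""
    return (x[0] + y[0], x[1] + y[1])


def local_aggregate(records):
    """Per-partition step: fold each key's values into a (sum, count) pair."""
    acc = {}
    for key, value in records:
        acc[key] = combine(acc.get(key, ZERO), (value, 1))
    return acc


def merge_partitions(left, right):
    """Cross-partition step: merge two per-key dictionaries with combine()."""
    merged = dict(left)
    for key, pair in right.items():
        merged[key] = combine(merged.get(key, ZERO), pair)
    return merged


# Aggregate locally, then merge the partial results in any grouping.
totals = reduce(merge_partitions, map(local_aggregate, partitions), {})
means = {key: s / n for key, (s, n) in totals.items()}
```

Averaging the per-partition means directly would give 4.5 for key "a" (the mean of 3.0 and 6.0), while the true mean is 4.0. Carrying (sum, count) pairs avoids this because the merge is associative and has an identity, so partial results can be combined in any order. In PySpark, a similar shape is usually expressed with combineByKey() or reduceByKey() over (sum, count) pairs.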


Software


Table of Contents

chap01: Introduction to PySpark
chap02: Hello World
chap03: Data Abstractions
chap04: Getting Started -- Sample Chapter
chap05: Transformations in Spark
chap06: Reductions in Spark
chap07: DataFrames and SQL
chap08: Spark DataSources
chap09: Logistic Regression
chap10: Movie Recommendations
chap11: Graph Algorithms
chap12: Design Patterns and Monoids

Appendix A: How To Install Spark
Appendix B: How to Use Lambda Expressions
Appendix C: Questions And Answers (50+ QA)


Future chapters:

chap13: FP-Growth
chap14: LDA
chap15: Linear Regression

