
PacktPublishing / Learning Pyspark

License: MIT
Code repository for Learning PySpark by Packt


Learning PySpark

This is the code repository for Learning PySpark, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.

About the book

Apache Spark is an open source framework for efficient cluster computing with a strong interface for data parallelism and fault tolerance. This book will show you how to leverage the power of Python and put it to use in the Spark ecosystem. You will start by getting a firm understanding of the Spark 2.0 architecture and how to set up a Python environment for Spark.

You will get familiar with the modules available in PySpark. You will learn how to abstract data with RDDs and DataFrames, and understand the streaming capabilities of PySpark. You will also get a thorough overview of the machine learning capabilities of PySpark using ML and MLlib, graph processing using GraphFrames, and polyglot persistence using Blaze. Finally, you will learn how to deploy your applications to the cloud using the spark-submit command.
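Cloud or cluster deployment via spark-submit, mentioned above, is a single command. The script name and master setting below are illustrative placeholders, not files from the book:

```shell
# Hypothetical example: my_app.py stands in for your own PySpark script.
# local[4] runs Spark locally with 4 worker threads; on a cluster you
# would pass a cluster manager URL (e.g. yarn) instead.
spark-submit --master local[4] my_app.py
```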

By the end of this book, you will have established a firm understanding of the Spark Python API and how it can be used to build data-intensive applications.

Instructions and Navigation

All of the code is organized into folders. Each folder starts with a number followed by the application name. For example, Chapter 03.

The code will look like the following:

    data_key = sc.parallelize(
        [('a', 4), ('b', 3), ('c', 2), ('a', 8), ('d', 2), ('b', 1),
         ('d', 3)], 4)
    data_key.reduceByKey(lambda x, y: x + y).collect()
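To see what this snippet computes without a running Spark instance, here is a plain-Python model of reduceByKey semantics, using the same data and reducer. This is an illustration, not the book's code:

```python
from functools import reduce

def reduce_by_key(pairs, fn):
    """Group (key, value) pairs by key, then fold each group's values
    with fn, mimicking the semantics of RDD.reduceByKey."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return {key: reduce(fn, values) for key, values in groups.items()}

data = [('a', 4), ('b', 3), ('c', 2), ('a', 8), ('d', 2), ('b', 1), ('d', 3)]
print(reduce_by_key(data, lambda x, y: x + y))
# {'a': 12, 'b': 4, 'c': 2, 'd': 5}
```

Unlike this dictionary sketch, the real reduceByKey runs the fold in parallel across the RDD's partitions (four of them in the snippet above) and collect() returns a list of key/value tuples in no guaranteed order.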

Software requirements:

For this book you need a personal computer running Windows, macOS, or Linux. To run Apache Spark, you will need Java 7+ and an installed and configured Python 2.6+ or 3.4+ environment; we use the Anaconda distribution of Python 3.5, which can be downloaded from https://www.continuum.io/downloads.
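A quick way to confirm your interpreter meets the Python requirement above (a convenience sketch, not from the book):

```python
import sys

# The book targets Python 2.6+ or 3.4+; Anaconda with Python 3.5
# satisfies this.
meets_requirement = sys.version_info >= (3, 4) or (
    sys.version_info[0] == 2 and sys.version_info >= (2, 6))
print(meets_requirement)
```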

The Python modules we use throughout the book come preinstalled with Anaconda. We also use GraphFrames and TensorFrames, which can be loaded dynamically when starting a Spark instance; to load them you only need an Internet connection. It is fine if some of these modules are not currently installed on your machine — we will guide you through the installation process.
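Dynamic loading of such packages at startup goes through Spark's --packages flag; the exact coordinates below are illustrative and must match your Spark and Scala versions:

```shell
# Illustrative Maven coordinates for GraphFrames; pick the artifact
# matching your Spark/Scala build. Spark downloads it on first use,
# which is why an Internet connection is needed.
pyspark --packages graphframes:graphframes:0.8.2-spark3.2-s_2.12
```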

Note:

Chapter 11 and Bonus Chapter 02 do not contain code files.


Suggestions and Feedback

Click here if you have any feedback or suggestions.
