pydata / Parallel Tutorial

Parallel computing in Python tutorial materials

Parallel Python: Analyzing Large Datasets

Join the chat at https://gitter.im/pydata/parallel-tutorial

Student Goals

Students will walk away with a high-level understanding of both parallel problems and how to reason about parallel computing frameworks. They will also walk away with hands-on experience using a variety of frameworks easily accessible from Python.

Student Level

Knowledge of Python and general familiarity with the Jupyter notebook are assumed. This tutorial is aimed at a beginner-to-intermediate audience.

Outline

For the first half we cover basic ideas and common patterns in parallel computing, including embarrassingly parallel map, unstructured asynchronous submit, and large collections.

For the second half we cover complications arising from distributed memory computing and exercise the lessons learned in the first section by running informative examples on provided clusters.

  • Part one
    • Parallel Map
    • Asynchronous Futures
    • High Level Datasets
  • Part two
    • Scaling cross validation parameter search
    • Tabular data with map/submit
    • Tabular data with dataframes
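The two core patterns from part one, parallel map and asynchronous submit, can be sketched with the standard library's concurrent.futures module. This is a minimal illustration, not the tutorial's notebook code: the `slow_square` function and the worker count are assumptions made up for the example, and a thread pool is used so the snippet runs anywhere (`ProcessPoolExecutor` offers the same interface for CPU-bound work).

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(x):
    """Hypothetical stand-in for an expensive, independent task."""
    time.sleep(0.01)
    return x * x


def parallel_map(values, max_workers=4):
    """Embarrassingly parallel map: same function, independent inputs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(slow_square, values))


def async_submit(values, max_workers=4):
    """Unstructured submit: one future per task, gathered as each finishes."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the input that produced it.
        futures = {pool.submit(slow_square, v): v for v in values}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    print(parallel_map(range(5)))
    print(async_submit(range(5)))
```

The map form suits a fixed batch of identical tasks, while submit gives one handle per task, which is useful when tasks differ or when results should be consumed as they arrive.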

Installation

  1. Download this repository:

    git clone https://github.com/pydata/parallel-tutorial
    

    or download as a zip file.

  2. Install Anaconda (large) or Miniconda (small)

  3. Create a new conda environment:

     conda env create -f environment.yml
     source activate parallel  # Linux, OS X
     activate parallel         # Windows
    
  4. If you want to use Spark (this is a large download):

     conda install -c conda-forge pyspark
    

Test your installation:

python -c "import concurrent.futures, dask, jupyter"
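If the one-line check fails, it can help to see which package is missing. The following is a small sketch, not part of the repository: the `check_imports` helper is hypothetical and relies only on the standard library's importlib.

```python
import importlib


def check_imports(names=("concurrent.futures", "dask", "jupyter")):
    """Return {module name: True/False} for whether each module imports."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_imports().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```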

Dataset Preparation

We will generate a dataset for local use. This takes up about 1 GB of space in a new local directory, data/.

python prep.py

Part 1: Local Notebooks

Part one of this tutorial takes place on your laptop, using multiple cores. Run Jupyter Notebook locally and navigate to the notebooks/ directory.

jupyter notebook

The notebooks are numbered in order, so you can start with 01-map.ipynb.

Part 2: Remote Clusters

Part two of this tutorial takes place on a remote cluster.

Visit the following page to start an eight-node cluster: https://pydata-parallel.jovyan.org/

If at any point your cluster fails you can always start a new one by re-visiting this page.

Warning: your cluster will be deleted when you close your session. If you want to save your work, you will need to download your notebooks explicitly.

Slides

Brief, high-level slides are available at http://pydata.github.io/parallel-tutorial/.

Sponsored Cloud Provider

We thank Google for generously providing compute credits on Google Compute Engine.
