
garystafford / Pyspark Setup Demo

Licence: MIT
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks

Programming Languages

python

Projects that are alternatives to or similar to Pyspark Setup Demo

Sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+3875%)
Mutual labels:  jupyter-notebook, jupyter, pyspark
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (+354.17%)
Mutual labels:  jupyter-notebook, jupyter, big-data
Bitcoin Value Predictor
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets on Bitcoin
Stars: ✭ 91 (+279.17%)
Mutual labels:  jupyter-notebook, big-data, pyspark
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+525%)
Mutual labels:  jupyter-notebook, big-data, pyspark
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+5475%)
Mutual labels:  jupyter-notebook, big-data, pyspark
mmtf-workshop-2018
Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (+108.33%)
Mutual labels:  big-data, jupyter, pyspark
Sciblog support
Support content for my blog
Stars: ✭ 694 (+2791.67%)
Mutual labels:  jupyter-notebook, big-data
Cookbook 2nd
IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018
Stars: ✭ 704 (+2833.33%)
Mutual labels:  jupyter-notebook, jupyter
Nbstripout
strip output from Jupyter and IPython notebooks
Stars: ✭ 738 (+2975%)
Mutual labels:  jupyter-notebook, jupyter
Jupyter nbextensions configurator
A jupyter notebook serverextension providing config interfaces for nbextensions.
Stars: ✭ 814 (+3291.67%)
Mutual labels:  jupyter-notebook, jupyter
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+23466.67%)
Mutual labels:  jupyter-notebook, big-data
Spark Movie Lens
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+3004.17%)
Mutual labels:  jupyter-notebook, big-data
Cocalc
CoCalc: Collaborative Calculation in the Cloud
Stars: ✭ 888 (+3600%)
Mutual labels:  jupyter-notebook, jupyter
Spark Tdd Example
A simple Spark TDD example
Stars: ✭ 23 (-4.17%)
Mutual labels:  jupyter-notebook, pyspark
Nteract
📘 The interactive computing suite for you! ✨
Stars: ✭ 5,713 (+23704.17%)
Mutual labels:  jupyter-notebook, jupyter
Elasticsearch Spark Recommender
Use Jupyter Notebooks to demonstrate how to build a Recommender with Apache Spark & Elasticsearch
Stars: ✭ 707 (+2845.83%)
Mutual labels:  jupyter-notebook, jupyter
Fastai2
Temporary home for fastai v2 while it's being developed
Stars: ✭ 630 (+2525%)
Mutual labels:  jupyter-notebook, jupyter
Jupyterlab Lsp
Coding assistance for JupyterLab (code navigation + hover suggestions + linters + autocompletion + rename) using Language Server Protocol
Stars: ✭ 796 (+3216.67%)
Mutual labels:  jupyter-notebook, jupyter
Spark Scala Tutorial
A free tutorial for Apache Spark.
Stars: ✭ 907 (+3679.17%)
Mutual labels:  jupyter-notebook, jupyter
Ansible Jupyter.dockerfile
Building the Docker image with Ansible and Jupyter.
Stars: ✭ 17 (-29.17%)
Mutual labels:  jupyter-notebook, jupyter

Jupyter Notebook PySpark Demo

Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks. Complete information for this project can be found in the related blog post, "Getting Started with PySpark for Big Data Analytics, using Jupyter Notebooks and Docker".

Architecture

Set-up

  1. Clone this project from GitHub:

    git clone \
        --branch v2 --single-branch --depth 1 --no-tags \
        https://github.com/garystafford/pyspark-setup-demo.git
    
  2. Create the $HOME/data/postgres directory for the PostgreSQL data files:

    mkdir -p ~/data/postgres

  3. Optional: for local development, install the Python packages:

    python3 -m pip install -r requirements.txt

  4. Optional: pull the Docker images first:

    docker pull jupyter/all-spark-notebook:latest
    docker pull postgres:12-alpine
    docker pull adminer:latest

  5. Deploy the Docker stack. This requires Docker Swarm mode, so run docker swarm init first if this host is not already a Swarm manager:

    docker stack deploy -c stack.yml jupyter

  6. Retrieve the token for logging in to Jupyter:

    docker logs $(docker ps | grep jupyter_spark | awk '{print $NF}')

  7. From the Jupyter terminal, run the install script:

    sh bootstrap_jupyter.sh

  8. Add your Plotly username and API key to the .env file:

    echo "PLOTLY_USERNAME=your-username" >> .env
    echo "PLOTLY_API_KEY=your-api-key" >> .env
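
Step 8 appends KEY=value lines to .env. How the demo scripts actually read that file is not shown here, so the following is only a minimal, standard-library sketch of one way to do it, not the project's code. The function names parse_env and load_env are hypothetical; python-dotenv's load_dotenv is a common off-the-shelf alternative.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines; skip blanks, # comments, and lines without '='."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. KEY="value".
        env[key.strip()] = value.strip().strip("\"'")
    return env


def load_env(path: str | Path = ".env") -> dict[str, str]:
    """Read a .env file and copy its values into os.environ.

    Variables already present in the environment are left untouched.
    """
    values = parse_env(Path(path).read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


if __name__ == "__main__":
    # Hypothetical usage inside a demo script or notebook cell.
    env_file = Path(".env")
    if env_file.exists():
        load_env(env_file)
    print("PLOTLY_USERNAME set:", "PLOTLY_USERNAME" in os.environ)
```

Using setdefault means a value exported in the shell wins over the file, which keeps real credentials out of .env when you prefer to set them another way.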
    

Demo

From a Jupyter terminal window:

  1. Sample Python script: run python3 01_simple_script.py
  2. Sample PySpark job: run $SPARK_HOME/bin/spark-submit 02_pyspark_job.py
  3. Load the PostgreSQL sample data: run python3 03_load_sql.py
  4. Sample Jupyter notebook: open 04_notebook.ipynb in Jupyter
  5. Sample Jupyter notebook: open 05_notebook.ipynb in Jupyter
  6. Try the alternate Jupyter stack with nbextensions pre-installed. First, build the new image:

    cd docker_nbextensions/
    docker build -t garystafford/all-spark-notebook-nbext:latest .

  7. Then delete the previous stack, return to the previous directory, and create the new stack:

    docker stack rm jupyter
    cd -
    docker stack deploy -c stack-nbext.yml jupyter
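
Step 3 loads sample data into the PostgreSQL service that runs alongside Jupyter in the stack. As a rough sketch of how a PySpark job might address that database over JDBC, the snippet below assembles a connection URL and properties in the shape that PySpark's DataFrameReader.jdbc accepts. The service name postgres, the database name demo, the credentials, and the table name are hypothetical placeholders, not values taken from this project's stack.yml or scripts; reading over JDBC also assumes the PostgreSQL JDBC driver is on Spark's classpath.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PostgresSettings:
    # All defaults are hypothetical placeholders, not the project's values.
    host: str = "postgres"        # assumed Docker service name
    port: int = 5432              # PostgreSQL default port
    database: str = "demo"        # hypothetical database name
    user: str = "postgres"        # hypothetical user
    password: str = "change-me"   # hypothetical; never hard-code real secrets


def jdbc_options(s: PostgresSettings) -> tuple[str, dict[str, str]]:
    """Return (url, properties) for a PostgreSQL JDBC connection."""
    url = f"jdbc:postgresql://{s.host}:{s.port}/{s.database}"
    props = {
        "user": s.user,
        "password": s.password,
        "driver": "org.postgresql.Driver",  # PostgreSQL JDBC driver class
    }
    return url, props


if __name__ == "__main__":
    url, props = jdbc_options(PostgresSettings())
    print(url)
    # Inside the Jupyter container, with an active SparkSession named spark,
    # a table could then be read like this (table name is hypothetical):
    #   df = spark.read.jdbc(url=url, table="public.sample", properties=props)
```

Keeping the settings in one frozen dataclass makes it easy to point the same job at a different host or port, for example when running outside the Docker network.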
