drabastomek / Learningpyspark

Licence: gpl-3.0
Code base for the Learning PySpark book (in preparation)

Projects that are alternatives to or similar to Learningpyspark

Imageprocessing Python
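This repository supports the author's Python image processing articles on CSDN. It mainly contains Python implementations of image processing, image recognition, and image classification algorithms. I hope you find it helpful; let's keep at it together.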
Stars: ✭ 483 (-3.21%)
Mutual labels:  jupyter-notebook
Bios8366
Advanced Statistical Computing at Vanderbilt University Medical Center's Department of Biostatistics
Stars: ✭ 490 (-1.8%)
Mutual labels:  jupyter-notebook
Tinderautomation
Stars: ✭ 495 (-0.8%)
Mutual labels:  jupyter-notebook
Dl4g
Example code for the Siggraph Asia Tutorial CreativeAI
Stars: ✭ 485 (-2.81%)
Mutual labels:  jupyter-notebook
Tiepvupsu.github.io
My Machine Learning blog
Stars: ✭ 490 (-1.8%)
Mutual labels:  jupyter-notebook
Tf Rnn
Practical Examples for RNNs in Tensorflow
Stars: ✭ 492 (-1.4%)
Mutual labels:  jupyter-notebook
Ml Mipt
Open Machine Learning course at MIPT
Stars: ✭ 480 (-3.81%)
Mutual labels:  jupyter-notebook
Deep Learning
A few notebooks about deep learning in pytorch
Stars: ✭ 496 (-0.6%)
Mutual labels:  jupyter-notebook
Tutorials
Code for some of my tutorials
Stars: ✭ 491 (-1.6%)
Mutual labels:  jupyter-notebook
Docproduct
Medical Q&A with Deep Language Models
Stars: ✭ 495 (-0.8%)
Mutual labels:  jupyter-notebook
Pythondatamining
📔 Found a book on the college bookshelf that you can follow without even thinking: 《Python数据挖掘与实战》 (Python Data Mining and Practice)
Stars: ✭ 489 (-2%)
Mutual labels:  jupyter-notebook
Stat453 Deep Learning Ss20
STAT 453: Intro to Deep Learning @ UW-Madison (Spring 2020)
Stars: ✭ 489 (-2%)
Mutual labels:  jupyter-notebook
Or Pandas
[运筹OR帷幄 | Data Science] pandas tutorial series e-book
Stars: ✭ 492 (-1.4%)
Mutual labels:  jupyter-notebook
Airflow Tutorial
Apache Airflow tutorial
Stars: ✭ 485 (-2.81%)
Mutual labels:  jupyter-notebook
Self Driving Toy Car
A self driving toy car using end-to-end learning
Stars: ✭ 494 (-1%)
Mutual labels:  jupyter-notebook
Fpn
Feature Pyramid Networks for Object Detection
Stars: ✭ 485 (-2.81%)
Mutual labels:  jupyter-notebook
Team Learning Data Mining
Mainly stores materials for the "data mining / machine learning" track of Datawhale team learning.
Stars: ✭ 485 (-2.81%)
Mutual labels:  jupyter-notebook
Pangeo
Pangeo website + discussion of general issues related to the project.
Stars: ✭ 500 (+0.2%)
Mutual labels:  jupyter-notebook
Vl Bert
Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
Stars: ✭ 493 (-1.2%)
Mutual labels:  jupyter-notebook
Yet Another Efficientdet Pytorch
The pytorch re-implement of the official efficientdet with SOTA performance in real time and pretrained weights.
Stars: ✭ 4,945 (+890.98%)
Mutual labels:  jupyter-notebook

Learning PySpark

Code base for the Learning PySpark book by Tomasz Drabas and Denny Lee.

Available from Packt and Amazon.

Introduction

It is estimated that in 2013 the whole world produced around 4.4 zettabytes of data; that is, 4.4 billion terabytes! By 2020, we (as the human race) are expected to produce ten times that. With data getting larger literally by the second, there is a growing appetite for making sense of it.

In this book, we will guide you through the latest incarnation of Apache Spark using Python. We will show you how to read structured and unstructured data, how to use some fundamental data types available in PySpark, how to build machine learning models, operate on graphs, read streaming data, and deploy your models in the cloud. Each chapter will tackle a different problem, and by the end of the book we hope you will be knowledgeable enough to solve other problems we did not have space to cover here.

Table of contents:

  1. Understanding Spark
  2. Resilient Distributed Dataset
  3. DataFrames
  4. Preparing Data for Modeling
  5. Introducing MLlib
  6. Introducing the ML Package
  7. GraphFrames
  8. TensorFrames
  9. Polyglot Persistence with Blaze
  10. Structured Streaming
  11. Packaging Spark Applications

About the authors

Tomasz Drabas is a Data Scientist working for Microsoft and currently residing in the Seattle area. He has over 13 years of experience in data analytics and data science in numerous fields: advanced technology, airlines, telecommunications, finance, and consulting, gained while working on three continents: Europe, Australia, and North America. While in Australia, Tomasz worked on his PhD in Operations Research, with a focus on choice modeling and revenue management applications in the airline industry.

At Microsoft, Tomasz works with big data on a daily basis, solving machine learning problems such as anomaly detection, churn prediction, and pattern recognition using Spark.

Tomasz has also authored the Practical Data Analysis Cookbook published by Packt Publishing in 2016; you can purchase that book on Amazon, Packt and O’Reilly.

Denny Lee is a Principal Program Manager at Microsoft for the Azure DocumentDB team – Microsoft’s blazing fast, planet-scale managed document store service. He is a hands-on distributed systems and data sciences engineer with more than 18 years of experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premise and cloud environments.

He has extensive experience in building greenfield teams as well as serving as a turnaround / change catalyst. Prior to joining the Azure DocumentDB team, Denny worked as a Technology Evangelist at Databricks; he has been working with Apache Spark since 0.5. He was also the Senior Director of Data Sciences Engineering at Concur, and was on the incubation team that built Microsoft’s Hadoop on Windows and Azure service (currently known as HDInsight). Denny also has a Masters of Biomedical Informatics from Oregon Health and Sciences University and has architected and implemented powerful data solutions for enterprise Healthcare customers for the last fifteen years.
