
datamindedbe / Python And Spark For Data Analysis

A four-day course on Python, the Scientific Python stack and PySpark, adapted from a training course given by Patrick Varilly to one of our clients in December 2015

Projects that are alternatives of or similar to Python And Spark For Data Analysis

Machinelearningtutorial
Short Machine Learning Tutorial
Stars: ✭ 9 (-10%)
Mutual labels:  jupyter-notebook
Keras Tutorial
3-hour tutorial on building deep learning models with Keras.
Stars: ✭ 10 (+0%)
Mutual labels:  jupyter-notebook
Variational Autoencoders Summerschool 2016
Exercises for the semi-supervised summer school https://semisupervised-learning.compute.dtu.dk.
Stars: ✭ 10 (+0%)
Mutual labels:  jupyter-notebook
Sphere Challenge
SPHERE Challenge: Activity Recognition with Multimodal Sensor Data
Stars: ✭ 9 (-10%)
Mutual labels:  jupyter-notebook
Algorithmic Trading Python
The repository for freeCodeCamp's YouTube course, Algorithmic Trading in Python
Stars: ✭ 846 (+8360%)
Mutual labels:  jupyter-notebook
Resnet
TensorFlow ResNet implementation on CIFAR-10
Stars: ✭ 10 (+0%)
Mutual labels:  jupyter-notebook
Novel Twitter Anomalies Pydatalondon2016
Detect novel anomalies on Twitter
Stars: ✭ 9 (-10%)
Mutual labels:  jupyter-notebook
Changepoint Detection
Online Change-point Detection Algorithm for Multi-Variate Data: Applications on Human/Robot Demonstrations.
Stars: ✭ 10 (+0%)
Mutual labels:  jupyter-notebook
Awesome Ai Books
A collection of awesome AI-related books and PDFs for learning and downloading, along with some playground models for learning
Stars: ✭ 855 (+8450%)
Mutual labels:  jupyter-notebook
Meetupcityfinder
Stars: ✭ 10 (+0%)
Mutual labels:  jupyter-notebook
Lyrics Lab
CS109 Final Project
Stars: ✭ 9 (-10%)
Mutual labels:  jupyter-notebook
Notes Lsju Machine Learning
Machine learning notes
Stars: ✭ 852 (+8420%)
Mutual labels:  jupyter-notebook
Ml
Various ML mini-projects
Stars: ✭ 10 (+0%)
Mutual labels:  jupyter-notebook
Scipyecosystem
An Introduction to the SciPy Ecosystem presentation
Stars: ✭ 9 (-10%)
Mutual labels:  jupyter-notebook
Movielens Recommender
Course project for a Programming Machine Learning Applications class
Stars: ✭ 10 (+0%)
Mutual labels:  jupyter-notebook
Headline analysis
Analyzing news headlines for fun and profit
Stars: ✭ 9 (-10%)
Mutual labels:  jupyter-notebook
Group meetings
Notes and ideas for MARL group meetings
Stars: ✭ 10 (+0%)
Mutual labels:  jupyter-notebook
Tensorflow Tutorial
Basics of TensorFlow
Stars: ✭ 10 (+0%)
Mutual labels:  jupyter-notebook
Deeplearningcameraapp
Deep Learning Capstone Project. Live camera app that can interpret number strings in real-world images.
Stars: ✭ 10 (+0%)
Mutual labels:  jupyter-notebook
Cds Ta Meetup
Competitive Data Science @ Tel Aviv Meetup
Stars: ✭ 10 (+0%)
Mutual labels:  jupyter-notebook

Python and Spark for Data Analysis

These are the IPython notebooks I used for a 4-day training course on Python and Spark for data science, given in December 2015 to a Data Minded client. The audience consisted of experienced data analysts, familiar with technologies like R and SPSS, but who had never used Python and had never worked on a Hadoop cluster.

The content is mildly redacted to remove all references to the actual client, but is otherwise unchanged.

Each day consisted of working through a series of IPython notebooks. Exercises are interspersed throughout. The last notebook of each day contains solutions to that day's exercises.

Objectives

The objectives of the training were to:

  • Learn the fundamentals of Python
  • Learn the fundamentals of its statistical and machine learning packages
  • Learn Apache Spark using Python
  • Learn how to apply these technologies in a live Hadoop cluster

Pre-requisites

Before the start of the course, we required the following software to be installed on students' laptops:

  • Anaconda 2.4.1 64-bit for Windows. The packages in this version of Anaconda included:
    • Python 2.7.11
    • IPython 4.0.1
    • NumPy 1.9.3
    • SciPy 0.16.0
    • Matplotlib 1.5.0
    • Pandas 0.17.1
    • Seaborn 0.6.0
    • Scikit-learn 0.17
  • Apache Spark 1.2.0. This version was chosen to match the client's production cluster, even though the latest release at the time of the course was 1.5.2.
  • JDK 7u79.
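
A quick way to verify that the expected stack is available is to print the installed versions. The sketch below assumes the Anaconda packages listed above are installed; it is not part of the original course material:

```python
# Print the versions of the scientific Python packages listed above
# (a minimal sanity check; works on both Python 2.7 and Python 3).
import numpy, scipy, matplotlib, pandas, seaborn, sklearn

for name, module in [("NumPy", numpy), ("SciPy", scipy),
                     ("Matplotlib", matplotlib), ("Pandas", pandas),
                     ("Seaborn", seaborn), ("scikit-learn", sklearn)]:
    print("{}: {}".format(name, module.__version__))
```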

Syllabus

The four days covered the following content.

Day 0: Fundamentals of Python

This day was intended for people with very limited programming experience and/or no Python experience. Day 0 was optional.

At the end of this day, the students were able to:

  • Start and run Python programs interactively with the Python CLI
  • Use an IDE to write and execute programs, including ones that take command-line arguments
  • Create notebooks locally and on a server
  • Import libraries
  • Store data in variables and understand their scope
  • Know the standard operators
  • Control the flow of a program
  • Perform common string operations such as concatenation, substring extraction, and replacement
  • Choose the appropriate data structure for a task
  • Use functions to structure a program
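
For illustration, here is a short snippet in the spirit of Day 0, combining a function, a dictionary, a loop, and string operations (a hypothetical example, not taken from the course notebooks):

```python
# A Day 0-style example: a function, a dictionary, a loop, and string operations.
def count_words(sentence):
    """Return a dictionary mapping each word to how often it occurs."""
    counts = {}
    for word in sentence.lower().replace(",", "").split():
        counts[word] = counts.get(word, 0) + 1
    return counts

text = "the quick brown fox jumps over the lazy dog, the end"
for word, count in count_words(text).items():
    if count > 1:
        print("{} appears {} times".format(word, count))
```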

Day 1: Statistical and Machine Learning Packages

On Day 1, we discussed several of the powerful statistical and machine learning libraries in Python. It was purposely a very hands-on introduction; we did not dive into the mathematics behind any of the algorithms.

At the end of this day, the students were able to:

  • Import and export data in CSV format
  • Use NumPy/SciPy to perform mathematical computations
  • Slice and dice data
  • Use pandas to wrangle data
  • Plot data and perform exploratory analysis
  • Use scikit-learn
  • Perform regression analysis in Python
  • Perform classification analysis in Python
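
As a flavour of the Day 1 material, a minimal pandas/scikit-learn workflow might look like the sketch below; the file name and column names are hypothetical:

```python
# Day 1-style workflow: load a CSV with pandas, explore it, and fit a
# regression with scikit-learn. File and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("measurements.csv")   # import data from CSV
print(df.describe())                   # quick exploratory summary

X = df[["temperature", "pressure"]]    # feature columns
y = df["yield"]                        # target column

model = LinearRegression().fit(X, y)   # ordinary least-squares fit
print(model.coef_, model.intercept_)
```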

Day 2: Apache Spark and Python

On the second day, we dove into Spark. We focused on the essential parts. After a brief introduction into Spark Core, we explored Spark SQL and Spark MLlib.

At the end of this day, the students were able to:

  • Understand the role of Spark and PySpark in the ecosystem
  • Run Spark locally from a shell
  • Run Spark locally in IPython notebooks
  • Do a word count on an input file
  • Load data in SparkSQL
  • Query data in SparkSQL
  • Use Spark MLlib to perform regression and classification analyses at scale
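
For example, the classic word count from Day 2 looks roughly like this with the RDD API of that era. In the pyspark shell the SparkContext already exists as `sc`; the input path below is hypothetical:

```python
# Word count with the PySpark RDD API (Spark 1.x style).
# Inside the pyspark shell, `sc` already exists; when run as a script,
# create it explicitly as below.
from pyspark import SparkContext

sc = SparkContext("local", "wordcount")
counts = (sc.textFile("input.txt")                  # hypothetical input file
            .flatMap(lambda line: line.split())     # split lines into words
            .map(lambda word: (word, 1))            # pair each word with 1
            .reduceByKey(lambda a, b: a + b))       # sum the counts per word

for word, count in counts.takeOrdered(10, key=lambda wc: -wc[1]):
    print("{}: {}".format(word, count))
```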

Day 3: Python and Apache Spark on a Cluster

On this last day, we set up a small Cloudera Hadoop cluster on AWS and explored how everything we had learned could be run in a cluster environment. The second half of the day was set aside for an open-ended project. Possible projects included:

  1. setting up a machine learning pipeline on data from the UCI Machine Learning Repository;
  2. implementing a machine learning algorithm using Spark Core;
  3. testing to what extent Spark running times scale linearly with data size.

At the end of this day, the students were able to:

  • Run Python scripts on the cluster from a shell and from IPython notebooks
  • Use Spark to read from and write to HDFS
  • Use SparkSQL to read data from and write data to Hive
  • Understand how YARN works
  • Submit Spark jobs to the cluster
  • Use Spark, SparkSQL and Spark MLlib to run algorithms on large-scale data
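
A Day 3-style script might read from and write to HDFS and be submitted to the cluster with spark-submit. The sketch below is illustrative only; the paths, application name, and submit options are hypothetical and depend on the cluster setup:

```python
# wordcount_hdfs.py -- a sketch of a job reading from and writing to HDFS.
# It could be submitted to a YARN cluster with something like:
#   spark-submit --master yarn-client wordcount_hdfs.py
# (option names varied slightly between Spark versions; paths are hypothetical)
from pyspark import SparkContext

sc = SparkContext(appName="wordcount-hdfs")
lines = sc.textFile("hdfs:///data/input/")               # read from HDFS
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b))
counts.saveAsTextFile("hdfs:///data/output/wordcount")   # write results to HDFS
sc.stop()
```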