cuttlefishh / Python For Data Analysis

License: MIT
An introduction to data science using Python and Pandas with Jupyter notebooks

Python for Data Analysis

Course in data science. Learn to analyze data of all types using the Python programming language. No programming experience is necessary.

Software covered:

  • Python 3
  • IPython environment and Jupyter notebooks
  • Conda for package management and virtual environments

Course topics include:

  • The UNIX command line
  • Fundamentals of Python and its data types
  • Data analysis packages Numpy and Pandas
  • Plotting packages Matplotlib and Seaborn
  • Statistics
  • Regular expressions
  • Interactive visualization
  • Modules and classes
  • Git and GitHub

Instructor

Online Content

Textbooks

  • Learn Python 3 the Hard Way by Zed Shaw (Addison-Wesley) -- Step-by-step introduction to Python with no prior knowledge assumed; includes appendix Command Line Crash Course.
  • Learning Python 3rd Edition by Mark Lutz (O'Reilly) -- Optional; more traditional introduction to Python as a computer language.
  • Python for Data Analysis 2nd Edition by Wes McKinney (O'Reilly) -- Manual focused on Pandas, the popular Python package for data analysis, by its creator. GitHub page: https://github.com/wesm/pydata-book.

O'Reilly Media titles are free to UCSD affiliates with Safari Books Online.

Additional Materials

Command Line Resources

Python Resources

IPython Resources from Cyrille Rossant

Data Analysis Resources

Course Philosophy

  1. You learn Python by doing, just like anything else. With a few exceptions, you're not going to break your computer by trying new commands. So just try it and see what happens. Print output of commands. Print values of variables. Kick the thing until it works.
  2. Resist the urge to get frustrated and blame the computer when your code doesn't run. Computers are deterministic machines; it's almost always your fault. But that's OK! Your computer will give you error messages that describe what went wrong. Read them and try to understand them.
  3. When you don't know how to do something, google it. You'll be amazed by the solutions you'll find to do thing x if you google "python thing x".
  4. Learn keyboard shortcuts, as many as you can. Tab-complete in the shell and IPython/Jupyter!
  5. Remember Zed's sage wisdom:
    • Practice every day.
    • Don't over-do it. Slow and steady wins the race.
    • It's alright to be totally lost at first.
    • When you get stuck, get more information.
    • Try to solve it yourself first.
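
Points 1 and 2 above can be tried directly. The following is a minimal sketch, not course material: the `readings` list and the `average` function are made-up names used only for illustration. It prints intermediate values while it runs and reports the text of the error message instead of guessing what went wrong.

```python
# Hypothetical data for illustration; not part of the course materials.
readings = ["12.5", "13.1", "n/a", "11.8"]


def average(values):
    """Return the mean of a list of numeric strings, skipping bad entries."""
    total = 0.0
    count = 0
    for value in values:
        try:
            number = float(value)
        except ValueError as error:
            # The error message names the value that could not be converted.
            print(f"Skipping {value!r}: {error}")
            continue
        # Printing intermediate values shows what the loop is doing.
        print(f"value={number}, running total={total + number}")
        total += number
        count += 1
    return total / count


result = average(readings)
print(f"average = {result:.2f}")
```

Running this in IPython or a Jupyter cell and then changing one of the strings is a quick way to see how the output and the error message change. Note that the sketch does not guard against a list with no valid numbers.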

Assignments and Grading

Weekly Assignments

Weekly take-home assignments will follow the course schedule, reinforcing skills with exercises to analyze and visualize scientific data. Assignments will be given out on Wednesdays and will be due the following Wednesday, using TritonEd. Assignments are worth 8 points each and will be graded on effort, completeness, and accuracy.

Final Project

You will choose a dataset, either your own or one provided in one of the texts, and write a Python program (or a set of Python programs, or a mixture of .ipynb and .py/.sh scripts) to carry out a revealing data analysis or create a software tool. Have a look at Shaw Ex43-52 and McKinney Ch10-12 for more ideas. The final project is worth 20 points and will be graded on effort, creativity, and fulfillment of the requirements below.

Requirements:

  • Submit your project as either: a Jupyter notebook (or collection of notebooks), a Python script (or collection of scripts), or a combination of the two.
  • Use pandas and one or more packages from at least three (≥3) of the categories below:
    • Plotting: matplotlib, seaborn
    • Interactive visualization: bokeh, pygal, plotly, mpld3, nvd3
    • Statistics and modeling: scipy, statsmodels, scikit-learn
    • Bioinformatics: scikit-bio, biopython
    • Climate science: cdms, iris
    • Any other domain-specific library/package
  • Use at least three (≥3) user-defined functions.
  • Optional: Create user-defined modules and classes for use in your code.
  • Optional: Share your code on GitHub.
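
As a rough illustration of the "pandas plus at least three user-defined functions" requirements, here is a minimal sketch. It assumes pandas is installed, and the function names (`load_data`, `add_fahrenheit`, `summarize_by_site`), the column names, and the values are all hypothetical. It does not cover the package-category requirement; a real project would add plotting, statistics, or other packages on top of a structure like this.

```python
import pandas as pd


def load_data():
    """Build a small DataFrame; a real project would likely use pd.read_csv."""
    return pd.DataFrame(
        {
            "site": ["A", "A", "B", "B"],
            "temperature_c": [14.2, 15.1, 9.8, 10.4],  # made-up values
        }
    )


def add_fahrenheit(df):
    """Return a copy of df with a derived temperature_f column."""
    out = df.copy()
    out["temperature_f"] = out["temperature_c"] * 9 / 5 + 32
    return out


def summarize_by_site(df):
    """Return the mean of both temperature columns for each site."""
    return df.groupby("site")[["temperature_c", "temperature_f"]].mean()


summary = summarize_by_site(add_fahrenheit(load_data()))
print(summary)
```

Keeping each step in its own function makes it easy to move the functions into a module later, which is one of the optional requirements.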

Grading

There are 100 points total possible for the course:

  • Assignments: 72 points (9 assignments x 8 points each)
  • Final project: 20 points
  • Participation: 8 points

Participation is based on completing the pre-course survey, showing up to class (when you are able), and completing the course evaluation (this is on the honor system as I won't know who completes it). There are no midterm or final exams.
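
The point breakdown can be checked with a few lines of Python. This is only an arithmetic sketch: the constants come from the list above, while the `course_score` helper and the example scores passed to it are hypothetical.

```python
# Point values as stated in the grading breakdown.
NUM_ASSIGNMENTS = 9
POINTS_PER_ASSIGNMENT = 8
FINAL_PROJECT_POINTS = 20
PARTICIPATION_POINTS = 8

assignment_points = NUM_ASSIGNMENTS * POINTS_PER_ASSIGNMENT
total_points = assignment_points + FINAL_PROJECT_POINTS + PARTICIPATION_POINTS


def course_score(assignment_scores, project_score, participation_score):
    """Sum the points earned; the inputs are hypothetical example scores."""
    return sum(assignment_scores) + project_score + participation_score


print(assignment_points)  # 72
print(total_points)       # 100

# Example: full marks on eight assignments, 6/8 on one, 18/20 on the project.
example = course_score([8] * 8 + [6], 18, 8)
print(example)  # 96
```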

Schedule Overview

The course consists of 20 lessons. As a class, it is taught as two lessons per week for 10 weeks, but the material can be covered at any pace.

Lessons 1-3 will be an introduction to the command line. By the end of this tutorial, everyone will be familiar with basic Unix commands.

Lessons 4-9 will be an introduction to programming using Python. The main text will be Shaw's Learn Python 3 the Hard Way. For those with experience in a programming language other than Python, Lutz's Learning Python will provide a more thorough introduction to programming in Python. We will learn to use IPython and IPython Notebooks (also called Jupyter Notebooks), which offer a much richer Python experience than the Unix command line or the plain Python interpreter.

Lessons 10-18 will focus on Python packages for data analysis. We will work through McKinney's Python for Data Analysis, which is all about analyzing data, doing statistics, and making pretty plots. You may find that Python can emulate or exceed much of the functionality of R and MATLAB.

Lessons 19-20 conclude the course with two skills useful in developing code: writing your own classes and modules, and sharing your code on GitHub.

Lesson Schedule

Lessons are available as .md or .ipynb files by clicking on the lesson numbers below. Readings should be completed while typing out the code (this is integral to the Shaw readings) and doing any Study Drills (Shaw) and Chapter Quizzes (Lutz).

| Lesson | Title | Readings | Topics | Assignment |
|---|---|---|---|---|
| 1 | Overview | -- | Introductions and overview of course | Pre-course survey; Acquire texts |
| 2 | Command Line Part I | Shaw: Introduction, Ex0, Appendix A | Command line crash course; Text editors | Assignment 1: Basic Shell Commands |
| 3 | Command Line Part II | Yale: The 10 Most Important Linux Commands | Advanced commands in the bash shell | -- |
| 4 | Conda, IPython, and Jupyter Notebooks | Geohackweek: Introduction to Conda | Conda tutorial including Conda environments, Python packages, and PIP; Python and IPython in the command line; Jupyter notebook tutorial; Python crash course | Assignment 2: Bash, Conda, IPython, and Jupyter |
| 5 | Python Basics, Strings, Printing | Shaw: Ex1-10; Lutz: Ch1-7 | Python scripts, error messages, printing strings and variables, strings and string operations, numbers and mathematical expressions, getting help with commands and Ipython | -- |
| 6 | Taking Input, Reading and Writing Files, Functions | Shaw: Ex11-26; Lutz: Ch9,14-17 | Taking input, reading files, writing files, functions | Assignment 3: Python Fundamentals I |
| 7 | Logic, Loops, Lists, Dictionaries, and Tuples | Shaw: Ex27-39; Lutz: Ch8-13 | Logic and loops, lists and list comprehension, tuples, dictionaries, other types | -- |
| 8 | Python and IPython Review | McKinney: Ch1, Ch2, Ch3 | Review of Python commands, IPython review | Assignment 4: Python Fundamentals II |
| 9 | Regular Expressions | Kuchling: Regular Expression HOWTO | Regular expression syntax, Command-line tools: grep, sed, awk, perl -e, Python examples: built-in and re module | -- |
| 10 | Numpy, Pandas and Matplotlib Crashcourse | Pratik: Introduction to Numpy and Pandas | Numpy, Pandas, and Matplotlib overview | Assignment 5: Regular Expressions |
| 11 | Pandas Part I | McKinney: Ch4, Ch5 | Introduction to NumPy and Pandas: ndarray, Series, DataFrame, index, columns, dtypes, info, describe, read_csv, head, tail, loc, iloc, ix, to_datetime | -- |
| 12 | Pandas Part II | McKinney: Ch6, Ch7, Ch8 | Data Analysis with Pandas: concat, append, merge, join, set_option, stack, unstack, transpose, dot-notation, values, apply, lambda, sort_index, sort_values, to_csv, read_csv, isnull | Assignment 6: Pandas Fundamentals |
| 13 | Plotting with Matplotlib | McKinney: Ch9; Johansson: Matplotlib 2D and 3D plotting in Python | Matplotlib tutorial from J.R. Johansson | -- |
| 14 | Plotting with Seaborn | Seaborn Tutorial | Seaborn tutorial from Michael Waskom | Assignment 7: Plotting |
| 15 | Pandas Time Series | McKinney: Ch11 | Time series data in Pandas | -- |
| 16 | Pandas Group Operations | McKinney: Ch10 | groupby, melt, pivot, inplace=True, reindex | Assignment 8: Time Series and Group Operations |
| 17 | Statistics Packages | Handbook of Biological Statistics | Statistics capabilities of Pandas, Numpy, Scipy, and Scikit-bio | -- |
| 18 | Interactive Visualization with Bokeh | Bokeh User Guide | Quickstart guide to making interactive HTML and notebook plots with Bokeh | Assignment 9: Statistics and Interactive Visualization |
| 19 | Modules and Classes | Shaw: Ex40-52 | Packaging your code so you and others can use it again | -- |
| 20 | Git and GitHub | GitHub Guides | Sharing your code in a public GitHub repository | Final Project |