Data Science Best Practices with pandas

This tutorial was presented by Kevin Markham at PyCon on May 2, 2019. Watch the complete tutorial video on YouTube.

Jupyter Notebook

The tutorial code is available as a Jupyter notebook. You can run this notebook in the cloud (no installation required) by clicking the "launch binder" button:

Binder

What is the tutorial about?

The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, the size and complexity of the pandas library make it challenging to discover the best way to accomplish any given task.

In this tutorial, you'll use pandas to answer questions about a real-world dataset. Through each exercise, you'll learn important data science skills as well as "best practices" for using pandas. By the end of the tutorial, you'll be more fluent in using pandas to correctly and efficiently answer your own data science questions.

How well do I need to know pandas to participate?

You will get the most out of this tutorial if you are an intermediate pandas user, since the tutorial does not cover pandas basics.

  • If you are new to pandas, I recommend watching some videos from my free pandas course before the tutorial.
  • If you just need a pandas refresher, I recommend reviewing this Jupyter notebook, which includes all of the code from my pandas course.

What dataset are we using?

ted.csv is the TED Talks dataset from Kaggle Datasets, made available under the CC BY-NC-SA 4.0 license.

How do I download the CSV file from GitHub?

Here are three options that will work equally well:

  • If you want to directly download only the CSV file, right click on the following link and select "Save As": ted.csv.
  • If you know how to use git, you can click the green button above and clone the entire repository.
  • If you know how to open a ZIP file, you can click the green button above and download the entire repository.
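
Separately from the options above, pandas can read a CSV directly from a URL, so the file can also be fetched from within Python. The sketch below is only an illustration: the raw-file URL (repository path and branch name) is an assumption that is not stated in this README, and `fetch_csv` is a hypothetical helper name.

```python
import pandas as pd

# ASSUMPTION: this raw-file URL (repository path and branch name) is a guess
# and is not stated in this README. Adjust it to wherever ted.csv actually lives.
TED_CSV_URL = "https://raw.githubusercontent.com/justmarkham/pycon-2019-tutorial/master/ted.csv"


def fetch_csv(source, save_as=None):
    """Read a CSV from a URL or local path; optionally save a local copy."""
    df = pd.read_csv(source)
    if save_as is not None:
        # Re-serializes the data, so the copy may not be byte-identical to the source
        df.to_csv(save_as, index=False)
    return df


# Example (requires internet access):
# ted = fetch_csv(TED_CSV_URL, save_as="ted.csv")
```

Because `save_as` writes the DataFrame back out with `to_csv()`, the local copy holds the same data but is not guaranteed to match the original file byte for byte. If you need the exact file, use one of the download options above.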

What do I need to do before the tutorial?

  1. Make sure that pandas and matplotlib are installed on your computer. (The easiest way to install pandas and matplotlib is by downloading the Anaconda distribution.)
  2. Download the CSV file from this repository.
  3. Read the file into pandas using the read_csv() function to make sure everything is working.

How can I check that pandas and matplotlib are properly installed?

  1. Move the CSV file into your working directory. (This is usually the directory where you create Python scripts or notebooks.)

  2. Open the Python environment of your choice.

  3. If you're using the Jupyter notebook, run the following code:

    import pandas as pd
    import matplotlib.pyplot as plt
    # Display plots inside the notebook
    %matplotlib inline
    # Read the dataset and plot the 'comments' column
    ted = pd.read_csv('ted.csv')
    ted.comments.plot()
    
  4. If you're using any other Python environment, run the following code:

    import pandas as pd
    import matplotlib.pyplot as plt
    # Read the dataset and plot the 'comments' column
    ted = pd.read_csv('ted.csv')
    ted.comments.plot()
    # Display the plot
    plt.show()
    

If you don't get any error messages, and a plot appears on your screen, then it's very likely that pandas and matplotlib are installed correctly.
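
If you do get an error, one way to narrow it down is to confirm that each library can be imported at all, before involving the CSV file. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of pandas or matplotlib.

```python
import importlib


def installed_version(package_name):
    """Return the package's version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        return None
    # Some modules do not expose __version__; report that instead of failing
    return getattr(module, "__version__", "unknown")


for name in ("pandas", "matplotlib"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

If either line reports `NOT INSTALLED`, the problem is with the installation rather than with the CSV file or your working directory.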

Who is the instructor?

Kevin Markham is the founder of Data School, an online school for learning data science with Python. He is passionate about teaching data science to people who are new to the field, regardless of their educational and professional backgrounds. Previously, Kevin was the lead data science instructor for General Assembly in Washington, DC. Currently, he teaches machine learning and data analysis to over 10,000 students each month through the Data School YouTube channel. He has a degree in Computer Engineering from Vanderbilt University and lives in Asheville, North Carolina with his wife and son.
