Using pandas for Better (and Worse) Data Science

This tutorial was presented by Kevin Markham at PyCon on May 10, 2018.

Jupyter notebook

The tutorial code is available as a Jupyter notebook. The notebook includes 4 additional exercises that were not covered during the tutorial.

Videos (playlist)

  1. Introducing the dataset (19:40)
  2. Removing columns (6:27)
  3. Comparing groups (8:42)
  4. Examining relationships (8:44)
  5. Handling missing values (5:02)
  6. Using string methods (5:55)
  7. Combining dates and times (9:11)
  8. Plotting a time series (8:48)
  9. Creating useful plots (8:47)
  10. Fixing bad data (16:31)

What is the tutorial about?

The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, proper data science requires careful coding, and pandas will not stop you from creating misleading plots, drawing incorrect conclusions, ignoring relevant data, including misleading data, or executing incorrect calculations.

In this tutorial, you'll perform a variety of data science tasks on a handful of real-world datasets using pandas. With each task, you'll learn how to avoid either a pandas pitfall or a data science pitfall. By the end of the tutorial, you'll be more confident that you're using pandas for good rather than evil!
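As a small illustration of the kind of silent pitfall described above, here is a minimal sketch using made-up values (the series below is hypothetical and is not taken from the tutorial datasets). By default, pandas skips missing values when computing a mean, so a result can look complete even when part of the data is absent.

```python
import pandas as pd

# Hypothetical values for illustration only; not from police.csv or ted.csv
ages = pd.Series([20, None, 40], name="driver_age")

# Default behavior: the missing value is skipped without any warning
mean_default = ages.mean()

# Opting out of skipping: the missing value propagates to the result
mean_strict = ages.mean(skipna=False)

# Counting missing values first makes the gap visible
n_missing = ages.isna().sum()

print("mean (NaN skipped):", mean_default)
print("mean (NaN kept):   ", mean_strict)
print("missing values:    ", n_missing)
```

Neither result is wrong on its own; the point is that pandas will not tell you which one your analysis needs, so it helps to check for missing values before summarizing a column.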

How well do I need to know pandas to participate?

You will get the most out of this tutorial if you are an intermediate pandas user, since the tutorial will not cover pandas basics. If you are new to pandas or just need a refresher, I recommend watching some videos from my free pandas course. Alternatively, you can review all of the code from my pandas course in this Jupyter notebook.

What datasets are we using?

How do I download the files from GitHub?

Here are three options that will work equally well:

  • If you know how to use git, you can click the green button above and clone the repository.
  • If you know how to open a ZIP file, you can click the green button above and download the repository.
  • If you want to download the files individually, right-click on these links and select "Save As": police.csv, ted.csv, tutorial.ipynb.

How can I check that pandas and matplotlib are properly installed?

  1. Move the CSV files into your working directory. (This is usually the directory where you create Python scripts or notebooks.)

  2. Open the Python environment of your choice.

  3. If you're using Jupyter notebook, run the following code:

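    import pandas as pd
    import matplotlib.pyplot as plt
    # Jupyter magic: render plots inside the notebook
    %matplotlib inline
    # Load both tutorial datasets from the working directory
    ri = pd.read_csv('police.csv')
    ted = pd.read_csv('ted.csv')
    # Quick check: a line plot of driver ages should appear
    ri.driver_age.plot()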
    
  4. If you're using any other Python environment, run the following code:

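    import pandas as pd
    import matplotlib.pyplot as plt
    # Load both tutorial datasets from the working directory
    ri = pd.read_csv('police.csv')
    ted = pd.read_csv('ted.csv')
    # Quick check: a line plot of driver ages should appear
    ri.driver_age.plot()
    # Outside Jupyter, the plot window must be opened explicitly
    plt.show()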
    

If you don't get any error messages, and a plot appears on your screen, then it's very likely that pandas and matplotlib are installed correctly.
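If the check above fails, it can help to find out whether the problem is a missing package or a missing file. The following is a minimal sketch that uses only the standard library; the helper names `installed_version` and `missing_files` are hypothetical and are not part of the tutorial materials, and the sketch assumes the CSV files are expected in the current working directory.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def missing_files(names, directory="."):
    """Return the file names from `names` that are not present in `directory`."""
    return [name for name in names if not (Path(directory) / name).is_file()]


# Report whether each required package is installed, and which version
for package in ("pandas", "matplotlib"):
    print(package, installed_version(package) or "NOT INSTALLED")

# Report any tutorial CSV files that are not in the working directory
missing = missing_files(["police.csv", "ted.csv"])
print("Missing CSV files:", missing or "none")
```

This only confirms that the packages and files are present. It does not replace the plotting check above, which also exercises reading the data and drawing a figure.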

Who is the instructor?

Kevin Markham is the founder of Data School, an online school for learning data science with Python. He is passionate about teaching data science to people who are new to the field, regardless of their educational and professional backgrounds. Previously, Kevin was the lead data science instructor for General Assembly in Washington, DC. Currently, he teaches machine learning and data analysis to over 10,000 students each month through the Data School YouTube channel. He has a degree in Computer Engineering from Vanderbilt University and lives in Asheville, North Carolina with his wife and son.

Can I contact the instructor with questions?

Sure! You can email [email protected].
