
stefmolin / pandas-workshop

Licence: MIT license
An introductory workshop on pandas with notebooks and exercises for following along.


Projects that are alternatives of or similar to pandas-workshop

datatile
A library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (+160.25%)
Mutual labels:  pandas, data-analysis, dataframes
Udacity-Data-Analyst-Nanodegree
Repository for the projects needed to complete the Data Analyst Nanodegree.
Stars: ✭ 31 (-80.75%)
Mutual labels:  pandas, data-analysis, data-wrangling
Data-Science-101
Notes and tutorials on how to use python, pandas, seaborn, numpy, matplotlib, scipy for data science.
Stars: ✭ 19 (-88.2%)
Mutual labels:  pandas, data-analysis, data-wrangling
Data-Analyst-Nanodegree
Kai Sheng Teh - Udacity Data Analyst Nanodegree
Stars: ✭ 42 (-73.91%)
Mutual labels:  pandas, data-analysis, data-wrangling
whyqd
data wrangling simplicity, complete audit transparency, and at speed
Stars: ✭ 16 (-90.06%)
Mutual labels:  pandas, data-analysis, data-wrangling
Data Forge Ts
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Stars: ✭ 967 (+500.62%)
Mutual labels:  pandas, data-analysis, data-wrangling
data-analysis-using-python
Data Analysis Using Python: A Beginner’s Guide Featuring NYC Open Data
Stars: ✭ 81 (-49.69%)
Mutual labels:  pandas, data-analysis, pandas-tutorial
Data Forge Js
JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Stars: ✭ 139 (-13.66%)
Mutual labels:  pandas, data-analysis, data-wrangling
Data Science Notebook
📖 Every great idea and action has a humble beginning
Stars: ✭ 196 (+21.74%)
Mutual labels:  pandas, data-analysis
Awkward 1.0
Manipulate JSON-like data with NumPy-like idioms.
Stars: ✭ 203 (+26.09%)
Mutual labels:  pandas, data-analysis
Deepgraph
Analyze Data with Pandas-based Networks. Documentation:
Stars: ✭ 232 (+44.1%)
Mutual labels:  pandas, data-analysis
Zebras
Data analysis library for JavaScript built with Ramda
Stars: ✭ 192 (+19.25%)
Mutual labels:  pandas, data-analysis
Eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (+45.96%)
Mutual labels:  pandas, data-analysis
Missingno
Missing data visualization module for Python.
Stars: ✭ 3,019 (+1775.16%)
Mutual labels:  pandas, data-analysis
kobe-every-shot-ever
A Los Angeles Times analysis of Every shot in Kobe Bryant's NBA career
Stars: ✭ 66 (-59.01%)
Mutual labels:  pandas, data-analysis
Edaviz
edaviz - Python library for Exploratory Data Analysis and Visualization in Jupyter Notebook or Jupyter Lab
Stars: ✭ 220 (+36.65%)
Mutual labels:  pandas, data-analysis
Dtale
Visualizer for pandas data structures
Stars: ✭ 2,864 (+1678.88%)
Mutual labels:  pandas, data-analysis
Pandas Datareader
Extract data from a wide range of Internet sources into a pandas DataFrame.
Stars: ✭ 2,183 (+1255.9%)
Mutual labels:  pandas, data-analysis
Data-Science-Resources
A guide to getting started with Data Science and ML.
Stars: ✭ 17 (-89.44%)
Mutual labels:  pandas, data-analysis
Datscan
DatScan is an initiative to build an open-source CMS that will have the capability to solve any problem using data Analysis just with the help of various modules and a vast standardized module library
Stars: ✭ 13 (-91.93%)
Mutual labels:  pandas, data-analysis

Pandas Workshop


Working with data can be challenging: it often doesn’t come in the best format for analysis, and understanding it well enough to extract insights requires both time and the skills to filter, aggregate, reshape, and visualize it. This session will equip you with the knowledge you need to effectively use pandas – a powerful library for data analysis in Python – to make this process easier.

Pandas makes it possible to work with tabular data and perform all parts of the analysis from collection and manipulation through aggregation and visualization. While most of this session focuses on pandas, during our discussion of visualization, we will also introduce at a high level Matplotlib (the library that pandas uses for its visualization features, which when used directly makes it possible to create custom layouts, add annotations, etc.) and Seaborn (another plotting library, which features additional plot types and the ability to visualize long-format data).

Session Outline

This is an introductory workshop on pandas first delivered at ODSC Europe 2021 and subsequently at the 5th Annual Toronto Machine Learning Summit in 2021 and PyCon US 2022. It's divided into the following sections:

Section 1: Getting Started With Pandas

We will begin by introducing the Series, DataFrame, and Index classes, which are the basic building blocks of the pandas library, and showing how to work with them. By the end of this section, you will be able to create DataFrames and perform operations on them to inspect and filter the data.
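As a rough illustration of what this section covers, the sketch below builds a small DataFrame, looks at its Series and Index pieces, and filters it. The data and column names (`city`, `temp_c`, `rainy`) are made up for this example and are not taken from the workshop notebooks.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame(
    {
        "city": ["Toronto", "New York", "London", "Paris"],
        "temp_c": [21.5, 25.0, 18.2, 23.1],
        "rainy": [False, True, True, False],
    }
)

# A single column is a Series; row and column labels are Index objects.
temps = df["temp_c"]
print(type(temps).__name__)
print(type(df.index).__name__)
print(list(df.columns))

# Inspect the data: first rows, column data types, summary statistics.
print(df.head(2))
print(df.dtypes)
print(df.describe())

# Filter rows with a Boolean mask (warm and not rainy).
warm_and_dry = df[(df["temp_c"] > 20) & (~df["rainy"])]
print(warm_and_dry)
```

Combining conditions with `&` and `~` requires the parentheses shown here, because those operators bind more tightly than comparisons in Python.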

Section 2: Data Wrangling

To prepare our data for analysis, we need to perform data wrangling. In this section, we will learn how to clean and reformat data (e.g., renaming columns and fixing data type mismatches), restructure/reshape it, and enrich it (e.g., discretizing columns, calculating aggregations, and combining data sources).
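The wrangling steps named above can be sketched on a tiny example. Everything here (the `raw` and `stations` tables, the column names, and the bin edges) is a made-up assumption for illustration, not the workshop's own data or code.

```python
import pandas as pd

# Hypothetical raw data, invented for illustration only.
raw = pd.DataFrame(
    {
        "Station ID": ["A", "A", "B", "B"],
        "Date": ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"],
        "Temp (C)": ["3.5", "-1.0", "10.2", "12.8"],  # numbers stored as strings
    }
)

# Clean: rename columns and fix data type mismatches.
clean = raw.rename(
    columns={"Station ID": "station", "Date": "date", "Temp (C)": "temp_c"}
).assign(
    date=lambda x: pd.to_datetime(x["date"]),
    temp_c=lambda x: x["temp_c"].astype(float),
)

# Reshape: pivot from long format to wide format (one column per station).
wide = clean.pivot(index="date", columns="station", values="temp_c")

# Enrich: discretize a column into labeled bins (bin edges are arbitrary).
clean["temp_bin"] = pd.cut(
    clean["temp_c"], bins=[-50, 0, 10, 50], labels=["freezing", "cold", "mild"]
)

# Enrich: calculate an aggregation per group.
mean_temp = clean.groupby("station")["temp_c"].mean()

# Enrich: combine with another (hypothetical) data source.
stations = pd.DataFrame({"station": ["A", "B"], "city": ["Oslo", "Lisbon"]})
combined = clean.merge(stations, on="station", how="left")

print(wide)
print(mean_temp)
print(combined)
```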

Section 3: Data Visualization

The human brain excels at finding patterns in visual representations of data, so in this section, we will learn how to visualize data using pandas, along with the Matplotlib and Seaborn libraries for additional features. We will create a variety of visualizations that will help us better understand our data.
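A minimal sketch of the pandas-plus-Matplotlib workflow follows, assuming Matplotlib is installed alongside pandas. The sales figures are invented, and Seaborn is left out to keep the example small. The `Agg` backend is selected only so the code also runs outside a notebook; inside JupyterLab that line is typically unnecessary.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; runs without a display

import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame(
    {"month": ["Jan", "Feb", "Mar", "Apr"], "sales": [120, 135, 160, 150]}
)

# pandas delegates plotting to Matplotlib and returns a Matplotlib Axes.
ax = df.plot(x="month", y="sales", kind="line", marker="o", legend=False)

# Using Matplotlib directly on that Axes to customize the plot.
ax.set_title("Monthly sales (made-up data)")
ax.set_ylabel("sales")
ax.annotate(
    "peak",
    xy=(2, 160),          # third point: x position 2, y value 160
    xytext=(2.3, 162),
    arrowprops={"arrowstyle": "->"},
)

fig = ax.get_figure()
print(type(ax).__module__)
```

Because `df.plot()` hands back the Axes, any further customization is ordinary Matplotlib code rather than a separate pandas API.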

Section 4: Hands-On Data Analysis Lab

We will practice everything you’ve learned in a hands-on lab. This section features a set of analysis tasks that provide opportunities to apply the material from the previous sections.


Prerequisites

You should have basic knowledge of Python and be comfortable working in Jupyter Notebooks. Check out this notebook for a crash course in Python or work through the official Python tutorial for a more formal introduction. The environment we will use for this workshop comes with JupyterLab, which is fairly intuitive, but be sure to familiarize yourself with using notebooks in JupyterLab and with its additional functionality.


Setup Instructions

  1. Install Python (version 3.8.0 or later, up to and including 3.10.2) OR install Anaconda/Miniconda. Anaconda/Miniconda is recommended if you are working on a Windows machine and are not very comfortable with the command line. Alternatively, you can use this Binder environment if you don't want to install anything on your machine.

  2. Fork this repository:

    [Image: location of the fork button in GitHub]

  3. Clone your forked repository:

    [Image: location of the clone button in GitHub]

  4. Create and activate a Python virtual environment:

    • If you installed Anaconda/Miniconda, use conda (on Windows, these commands should be run in Anaconda Prompt):

      $ cd pandas-workshop
      ~/pandas-workshop$ conda env create --file environment.yml
      ~/pandas-workshop$ conda activate pandas_workshop
      (pandas_workshop) ~/pandas-workshop$
    • Otherwise, use venv:

      $ cd pandas-workshop
      ~/pandas-workshop$ python3 -m venv pandas_workshop
      ~/pandas-workshop$ source pandas_workshop/bin/activate
      (pandas_workshop) ~/pandas-workshop$ pip3 install -r requirements.txt
  5. Launch JupyterLab:

    (pandas_workshop) ~/pandas-workshop$ jupyter lab
  6. Navigate to the 0-check_your_env.ipynb notebook in the notebooks/ folder:

    [Image: opening 0-check_your_env.ipynb]

  7. Run the notebook to confirm everything is set up properly:

    [Image: checking the environment]
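For a quick sanity check before opening the notebook, the Python version requirement from step 1 can be tested with a few lines like the following. This is a hypothetical helper written for this README, not the contents of 0-check_your_env.ipynb, and it checks only the interpreter version, not the installed packages.

```python
import sys

# Supported range taken from step 1 of the setup instructions.
MIN_VERSION = (3, 8, 0)
MAX_VERSION = (3, 10, 2)


def python_version_supported(version=None):
    """Return True if `version` (default: the running interpreter) is in range.

    Hypothetical helper for illustration; tuples compare element by element.
    """
    if version is None:
        version = sys.version_info[:3]
    return MIN_VERSION <= tuple(version) <= MAX_VERSION


print("Running Python", ".".join(map(str, sys.version_info[:3])))
print("Supported for this workshop:", python_version_supported())
```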


About the Author

Stefanie Molin (@stefmolin) is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of Hands-On Data Analysis with Pandas, which is currently in its second edition. She holds a Bachelor of Science degree in operations research from Columbia University's Fu Foundation School of Engineering and Applied Science. She is currently pursuing a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages, both human and computer.

Related Content

All examples herein were developed exclusively for this workshop. Hands-On Data Analysis with Pandas contains additional examples and exercises, as does this blog post. For a deeper dive into data visualization in Python, check out my Beyond the Basics: Data Visualization in Python workshop.
