All Projects → ryansmccoy → spreadsheets-to-dataframes

ryansmccoy / spreadsheets-to-dataframes

Licence: MIT License
Pycon 2021 Tutorial to help Spreadsheet (Excel) Users learn Python

Programming Languages

Jupyter Notebook
11667 projects
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to spreadsheets-to-dataframes

Trump Lies
Tutorial: Web scraping in Python with Beautiful Soup
Stars: ✭ 201 (+570%)
Mutual labels:  pandas, requests
Pytablewriter
pytablewriter is a Python library to write a table in various formats: CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.
Stars: ✭ 422 (+1306.67%)
Mutual labels:  excel, pandas
Ni Pyt
Materiály k předmětu NI-PYT na FIT ČVUT
Stars: ✭ 112 (+273.33%)
Mutual labels:  pandas, requests
excel-to-python-course
Student materials and handouts for Excel to Python course
Stars: ✭ 73 (+143.33%)
Mutual labels:  excel, pandas
covid-19
Data ETL & Analysis on the global and Mexican datasets of the COVID-19 pandemic.
Stars: ✭ 14 (-53.33%)
Mutual labels:  pandas, requests
Place2live
Analysis of the characteristics of different countries
Stars: ✭ 30 (+0%)
Mutual labels:  pandas, requests
Meza
A Python toolkit for processing tabular data
Stars: ✭ 374 (+1146.67%)
Mutual labels:  excel, pandas
Django Rest Pandas
📊📈 Serves up Pandas dataframes via the Django REST Framework for use in client-side (i.e. d3.js) visualizations and offline analysis (e.g. Excel)
Stars: ✭ 1,030 (+3333.33%)
Mutual labels:  excel, pandas
Styleframe
A library that wraps pandas and openpyxl and allows easy styling of dataframes in excel
Stars: ✭ 252 (+740%)
Mutual labels:  excel, pandas
Ditching Excel For Python
Functionalities in Excel translated to Python
Stars: ✭ 172 (+473.33%)
Mutual labels:  excel, pandas
ark invest
daily report of @ARKInvest ETF activity + data collection
Stars: ✭ 25 (-16.67%)
Mutual labels:  pandas, requests
PandasVersusExcel
Python数据分析入门,数据分析师入门
Stars: ✭ 120 (+300%)
Mutual labels:  excel, pandas
DadosAbertosBrasil
Pacote Python para acesso a dados abertos e APIs do governo brasileiro.
Stars: ✭ 28 (-6.67%)
Mutual labels:  pandas, requests
Web-crawler-engineer-for-Python
Web-crawler-engineer-for-Python
Stars: ✭ 42 (+40%)
Mutual labels:  requests
winmerge2011
Fork of WinMerge which has a different set of features
Stars: ✭ 36 (+20%)
Mutual labels:  excel
lisp-xl
Common Lisp Microsoft XLSX (Microsoft Excel) loader for arbitrarily-sized / big-size files
Stars: ✭ 27 (-10%)
Mutual labels:  excel
Interactive-Data-Visualization-with-Python
Present your data as an effective and compelling story
Stars: ✭ 71 (+136.67%)
Mutual labels:  pandas
streamlit-pandas-profiling
Pandas profiling component for Streamlit.
Stars: ✭ 135 (+350%)
Mutual labels:  pandas
Exploratory Data Analysis Visualization Python
Data analysis and visualization with PyData ecosystem: Pandas, Matplotlib Numpy, and Seaborn
Stars: ✭ 78 (+160%)
Mutual labels:  pandas
Data-Scientist-In-Python
This repository contains notes and projects of Data scientist track from dataquest course work.
Stars: ✭ 23 (-23.33%)
Mutual labels:  pandas

From Spreadsheets to DataFrames: Escaping Excel Hell with Python


Pycon 2021 Tutorial Video [YouTube] - May 12, 2021

Other Presentations:

STL Python Presentation [YouTube]

Chicago Python Users Group [YouTube]

Details

A spreadsheet is a wonderful invention and an excellent tool for certain jobs. All too often, however, spreadsheets are called upon to perform tasks that are beyond their capabilities. It’s like the old saying, 'If the only tool you have is a hammer, every problem looks like a nail.' However, some problems are better addressed with a screwdriver, with glue, or with a Swiss Army Knife.

Python is described by some in the programming world as the Swiss Army Knife of programming languages because of its unrivaled versatility and flexibility in use. This allows its users to solve complex problems relatively easily compared with other programming languages and is one of the reasons why Python has become increasingly popular over time.

In this tutorial, we’ll briefly discuss spreadsheets, signs that you might be living in “Excel Hell”, and then we’ll spend the rest of the time learning how to escape it using Python.

In the first section, we’ll extend on what spreadsheet users already know about cells, rows, columns, and formulas, and map them to their Python equivalent, such as variables, lists, dictionaries, and functions. At the end of this section, we’ll do an interactive exercise and learn how we can perform a simple calculation, similar to one you might do in Excel, but instead using Python.

In the second section, we’ll discuss (and attempt) how we can perform more complex tasks including web scraping, data processing, analysis, and visualization, by utilizing a few popular 3rd party libraries used including Requests, Pandas, Flask, Matplotlib, and others.

In the last section, we’ll round out our discussion with a few important concepts in data management, including concept of tidy data, building a data pipeline, and a few strategies (and packages) to use when approaching various data problems, including demo using Apache Airflow.

Slides

Intro [Slides]

Excel to Python [Slides]

Python Libraries & Resources [Slides]

Data Management [Slides]

Tutorial Code

Section 1 - Python Fundamentals for an Excel User

01 basics_but_important_stuff.ipynb

02 files_lists_dictionaries.ipynb

Section 1 - Challenges

challenge_1.py

challenge_1_answer.py

challenge_2.py

challenge_2_answer.py

challenge_3.py

challenge_3_answer.py

Section 2 - Real-World Python Example for an Excel User

01-real-world-example.py

02-real-world-example-refactored.py

Section 2 - Challenge

section2_challenge.rst

Section 3 - Best Practices in Python & Data for an Excel User

Data Management [Slides]

07-airflow

STL Python - Talk Code

01-basics.ipynb

02-webscraping.ipynb

03-tidy-data.ipynb

04-pandas.ipynb

05-data-analysis.ipynb

06-data-visualizations.ipynb

STL Python - Folders

  • 01-basics - examples used in presentation
  • 02-webscraping - program that clicks through a calendar (written in javascript) and exports csv files
  • 02-selenium-safari - program that logins to website, scrapes html from javascript generated page, cleans html, and exports to pdf files
  • 02-webscrape-celery - example of that uses a message queue, and celery to download list of urls
  • 04-other-analysis - examples of different quantitative notebooks
  • 05-other-visualizations - examples of different data visualization tools
  • 06-flask - different flask examples
  • 07-airflow - example that uses airflow to download and store stock prices

Quick Start Guides

Install Anaconda & Pycharm

  • Anaconda = manages your Python environments
  • Pycharm = code editor

Install Git - Allows you to git clone/download Github Projects'

Setup Environment & Run Example (Windows):

$ git clone https://github.com/ryansmccoy/spreadsheets-to-dataframes.git
$ cd spreadsheets-to-dataframes
$ conda create -n spreadsheets-to-dataframes python=3.8 pandas scipy numpy lxml jupyter matplotlib -y
$ activate spreadsheets-to-dataframes
$ pip install -r requirements_dev.txt

Setup Environment & Run Example (Linux):

$ git clone https://github.com/ryansmccoy/spreadsheets-to-dataframes.git
$ cd spreadsheets-to-dataframes
$ conda create -n spreadsheets-to-dataframes python=3.8 pandas scipy numpy lxml jupyter matplotlib -y
$ source activate spreadsheets-to-dataframes
$ pip install -r requirements_dev.txt

Running Jupyter Notebooks:

Navigate to spreadsheet-to-dataframe directory/folder:

$ activate spreadsheets-to-dataframes
$ jupyter notebook

(Optional) Install Docker to Run Airflow Example

https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html

Python Books & Videos:

(Book) Python Crash Course, 2nd Edition

(Book) Introducing Python: Modern Computing in Simple Packages

(Book) Learning Python, 5th Edition

(Book) Automate the Boring Stuff with Python, 2nd Edition: Practical Programming for Total Beginners

(Book) Think Python: How to Think Like a Computer Scientist

(Book) The Quick Python Book (Book)

(Book) Serious Python: Black-Belt Advice on Deployment, Scalability, Testing, and More

(Github) A Whirlwind Tour of Python

(Github) Python Data Science Handbook

(Github) Introduction to Python

Cookiecutter:

$ pip install cookiecutter

Resources:

https://github.com/cookiecutter/cookiecutter

https://github.com/audreyfeldroy/cookiecutter-pypackage

https://towardsdatascience.com/cookiecutter-creating-custom-reusable-project-templates-fc85c8627b07

Requests

$ pip install requests

Resources:

https://python.readthedocs.io/en/stable/library/stdtypes.html

https://realpython.com/python-requests/

Have you mastered Requests? Then you should check out multithreading, concurrency, asyncio, message queues, parallelism.

https://yasoob.me/2019/05/29/speedingw-up-python-code-using-multithreading/

https://www.toptal.com/python/beginners-guide-to-concurrency-and-parallelism-in-python

https://creativedata.stream/multi-threading-api-requests-in-python/

https://levelup.gitconnected.com/asynchronous-tasks-in-python-with-celery-rabbitmq-redis-480f6e506d76

https://tests4geeks.com/blog/python-celery-rabbitmq-tutorial/

https://codeburst.io/automated-web-scraping-with-python-and-celery-ac02a4a9ce51

https://github.com/ryansmccoy/zmq-high-speed-subs

Pandas

$ pip install pandas

Resources:

Dealing With Data

Pandas Cookbook

brandon-rhodespycon-pandas-tutorial

Python pandas Q&A video series

Master Data Analysis with Python

Have you mastered Pandas? Then you check out Dask and Spark.

https://dask.org/

https://spark.apache.org/docs/latest/api/python/

Visualization:

$ pip install matplotlib

Resources:

https://github.com/fasouto/awesome-dataviz

https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html

https://www.toptal.com/designers/data-visualization/data-visualization-tools

https://realpython.com/pandas-plot-python/

Have you mastered Matplotlilb? Then you should checkout Javascript, D3, React, Tableau

Flask:

$ pip install flask

Resources:

https://www.fullstackpython.com/flask.html

https://blog.miguelgrinberg.com/

Have you mastered Flask? Then you should checkout FastAPI, Javascript, Node, React

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].