
KeithGalli / Pandas Data Science Tasks

Set of real world data science tasks completed using the Python Pandas library

Setup

To access all of the files, I recommend that you fork this repo and then clone it locally. Instructions on how to do this can be found here: https://help.github.com/en/github/getting-started-with-github/fork-a-repo

The other option is to click the green "clone or download" button and then click "Download ZIP". Then extract all of the files to the location where you want to edit your code.

Installing Jupyter Notebook: https://jupyter.readthedocs.io/en/latest/install.html
Installing Pandas library: https://pandas.pydata.org/pandas-docs/stable/install.html

Background Information:

This repo goes with my video "Solving real world data science tasks with Python Pandas!". Here is some information on that video.

In this video we use Python Pandas & Python Matplotlib to analyze and answer business questions about 12 months' worth of sales data. The data contains hundreds of thousands of electronics store purchases broken down by month, product type, cost, purchase address, etc.

We start by cleaning our data. Tasks during this section include:

  • Dropping NaN values from the DataFrame
  • Removing rows based on a condition
  • Changing the type of columns (to_numeric, to_datetime, astype)
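
As a rough illustration of these three cleaning steps, here is a minimal sketch on a tiny made-up DataFrame. The column names ("Order ID", "Quantity Ordered", "Price Each", "Order Date"), the sample values and the date format are assumptions chosen for the example; they are not taken from the repo's files.

```python
import pandas as pd

# Hypothetical sample data: column names and values are assumptions for illustration.
raw = pd.DataFrame({
    "Order ID": ["1001", "1002", None, "Order ID", "1003"],
    "Quantity Ordered": ["2", "1", None, "Quantity Ordered", "3"],
    "Price Each": ["11.95", "99.99", None, "Price Each", "2.99"],
    "Order Date": ["04/19/19 08:46", "04/07/19 22:30", None,
                   "Order Date", "04/12/19 14:38"],
})

# 1. Drop NaN values: remove rows in which every column is missing.
clean = raw.dropna(how="all")

# 2. Remove rows based on a condition: here, a stray header row repeated in the data.
clean = clean[clean["Order ID"] != "Order ID"].copy()

# 3. Change column types from strings to numbers and datetimes.
clean["Quantity Ordered"] = pd.to_numeric(clean["Quantity Ordered"])
clean["Price Each"] = pd.to_numeric(clean["Price Each"])
clean["Order Date"] = pd.to_datetime(clean["Order Date"], format="%m/%d/%y %H:%M")
clean["Order ID"] = clean["Order ID"].astype(int)

print(clean.dtypes)
```

The `.copy()` after filtering avoids assigning into a view of the original frame when the new typed columns are written.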

Once we have cleaned up our data a bit, we move to the data exploration section. In this section we explore five high-level business questions related to our data:

  • What was the best month for sales? How much was earned that month?
  • What city sold the most product?
  • What time should we display advertisements to maximize the likelihood of customers buying product?
  • What products are most often sold together?
  • What product sold the most? Why do you think it sold the most?
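
To give a feel for how two of these questions (best month for sales, and products most often sold together) might be approached, here is a small sketch. It is one possible approach under stated assumptions, not necessarily the method used in the video: the DataFrame, its column names and all values are invented for the example.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical, already-cleaned sample; column names and values are assumptions.
orders = pd.DataFrame({
    "Order ID": [1, 1, 2, 3, 3, 4],
    "Product": ["Phone", "Charging Cable", "Headphones",
                "Phone", "Charging Cable", "Monitor"],
    "Quantity Ordered": [1, 2, 1, 1, 1, 1],
    "Price Each": [600.0, 15.0, 100.0, 600.0, 15.0, 150.0],
    "Order Date": pd.to_datetime([
        "2019-01-05", "2019-01-05", "2019-01-20",
        "2019-02-03", "2019-02-03", "2019-02-10",
    ]),
})

# Best month for sales: add Month and Sales columns, then aggregate with groupby.
orders["Month"] = orders["Order Date"].dt.month
orders["Sales"] = orders["Quantity Ordered"] * orders["Price Each"]
monthly_sales = orders.groupby("Month")["Sales"].sum()
best_month = monthly_sales.idxmax()
print(f"Best month: {best_month}, sales: {monthly_sales.max():.2f}")

# Products sold together: count every pair of products sharing an Order ID.
pair_counts = Counter()
for _, products in orders.groupby("Order ID")["Product"]:
    # Sorting makes (A, B) and (B, A) count as the same pair.
    pair_counts.update(combinations(sorted(products), 2))
print(pair_counts.most_common(1))
```

Orders containing a single product contribute no pairs, so only multi-item orders affect the pair counts.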

To answer these questions we walk through many different pandas & matplotlib methods. They include:

  • Concatenating multiple CSV files together to create a new DataFrame (pd.concat)
  • Adding columns
  • Parsing cells as strings to make new columns (.str)
  • Using the .apply() method
  • Using groupby to perform aggregate analysis
  • Plotting bar charts and line graphs to visualize our results
  • Labeling our graphs
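
The methods listed above can be sketched together in a few lines. This is a minimal example under stated assumptions: the two in-memory "files" stand in for monthly CSVs, the column names and address layout ("street, city, state zip") are invented for illustration, and `get_state` and `plot_units_by_city` are hypothetical helper names.

```python
import io

import pandas as pd

# In-memory stand-ins for monthly CSV files (assumed layout, made-up rows).
# With real files you would pass file paths to pd.read_csv instead.
january_csv = io.StringIO(
    "Product,Quantity Ordered,Purchase Address\n"
    'Phone,1,"917 1st St, Dallas, TX 75001"\n'
    'Monitor,2,"682 Chestnut St, Boston, MA 02215"\n'
)
february_csv = io.StringIO(
    "Product,Quantity Ordered,Purchase Address\n"
    'Headphones,3,"669 Spruce St, Dallas, TX 75001"\n'
)

# Concatenate multiple CSVs into one DataFrame.
all_data = pd.concat(
    [pd.read_csv(f) for f in (january_csv, february_csv)],
    ignore_index=True,
)

# Parse cells as strings to make a new column (.str).
all_data["City"] = all_data["Purchase Address"].str.split(", ").str[1]


def get_state(address):
    """Return the state abbreviation from 'street, city, ST zip' (hypothetical helper)."""
    return address.split(", ")[2].split(" ")[0]


# The same kind of parsing with .apply() and a plain Python function.
all_data["State"] = all_data["Purchase Address"].apply(get_state)

# Aggregate with groupby.
units_by_city = all_data.groupby("City")["Quantity Ordered"].sum()
print(units_by_city)


def plot_units_by_city(units):
    """Draw a labeled bar chart of units per city (hypothetical helper)."""
    # Imported here so the rest of the sketch runs without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(units.index, units.values)
    ax.set_xlabel("City")
    ax.set_ylabel("Units ordered")
    ax.set_title("Units ordered by city")
    return fig
```

Calling `plot_units_by_city(units_by_city)` returns a Matplotlib figure that can be shown with `plt.show()` or saved with `fig.savefig(...)`.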

Check out the first video I did on Pandas:
https://youtu.be/vmEHCJofslg

Check out the videos I did on Matplotlib:
https://youtu.be/DAQNHzOcO5A
https://youtu.be/0P7QnIQDBJY
