mebauer / data-analysis-using-python

Licence: MIT License
Data Analysis Using Python: A Beginner’s Guide Featuring NYC Open Data

Programming Language: Jupyter Notebook

Projects that are alternatives or similar to data-analysis-using-python

Udacity-Data-Analyst-Nanodegree
Repository for the projects needed to complete the Data Analyst Nanodegree.
Stars: ✭ 31 (-61.73%)
Mutual labels:  numpy, pandas, seaborn, data-analytics, data-analysis, matplotlib
Ai Learn
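AI learning roadmap: a collection of nearly 200 hands-on case studies and projects, with free accompanying course materials, suitable for complete beginners and aimed at job-ready practice. Covers popular areas including Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, PyTorch, tensorflow, machine-learning, deep-learning, data-analysis, data-mining, mathematics, data-science, artificial-intelligence, python, tensorflow, tensorflow2, caffe, keras, pytorch, algorithm, numpy, pandas, matplotlib, seaborn, nlp, and cv.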
Stars: ✭ 4,387 (+5316.05%)
Mutual labels:  numpy, pandas, seaborn, data-analysis, matplotlib
Py
Repository to store sample python programs for python learning
Stars: ✭ 4,154 (+5028.4%)
Mutual labels:  numpy, pandas-dataframe, pandas, python-tutorial, pandas-tutorial
Mlcourse.ai
Open Machine Learning Course
Stars: ✭ 7,963 (+9730.86%)
Mutual labels:  numpy, pandas, seaborn, data-analysis, matplotlib
Exploratory Data Analysis Visualization Python
Data analysis and visualization with the PyData ecosystem: Pandas, Matplotlib, Numpy, and Seaborn
Stars: ✭ 78 (-3.7%)
Mutual labels:  numpy, exploratory-data-analysis, pandas, seaborn, matplotlib
datascienv
datascienv is a package that helps you set up your environment in a single line of code with all dependencies. It also includes pyforest, which provides a single-line import of all required ML libraries.
Stars: ✭ 53 (-34.57%)
Mutual labels:  numpy, pandas, seaborn, matplotlib
Sweetviz
Visualize and compare datasets, target values and associations, with one line of code.
Stars: ✭ 1,851 (+2185.19%)
Mutual labels:  pandas-dataframe, exploratory-data-analysis, pandas, data-analysis
The-Data-Visualization-Workshop
A New, Interactive Approach to Learning Data Visualization
Stars: ✭ 59 (-27.16%)
Mutual labels:  numpy, pandas, seaborn, matplotlib
Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+10182.72%)
Mutual labels:  pandas-dataframe, exploratory-data-analysis, pandas, data-analysis
Data Science Hacks
Data Science Hacks consists of tips, tricks to help you become a better data scientist. Data science hacks are for all - beginner to advanced. Data science hacks consist of python, jupyter notebook, pandas hacks and so on.
Stars: ✭ 273 (+237.04%)
Mutual labels:  numpy, pandas-dataframe, pandas, data-analysis
Data Analysis
Mainly a summary of web-scraping and data-analysis projects, plus modeling, machine learning, and model evaluation.
Stars: ✭ 142 (+75.31%)
Mutual labels:  numpy, pandas, data-analysis, matplotlib
covid-19
Data ETL & Analysis on the global and Mexican datasets of the COVID-19 pandemic.
Stars: ✭ 14 (-82.72%)
Mutual labels:  numpy, pandas, seaborn, matplotlib
Data-Analyst-Nanodegree
Kai Sheng Teh - Udacity Data Analyst Nanodegree
Stars: ✭ 42 (-48.15%)
Mutual labels:  numpy, pandas, data-analysis
Data-Science-101
Notes and tutorials on how to use python, pandas, seaborn, numpy, matplotlib, scipy for data science.
Stars: ✭ 19 (-76.54%)
Mutual labels:  exploratory-data-analysis, pandas, data-analysis
pandas-workshop
An introductory workshop on pandas with notebooks and exercises for following along.
Stars: ✭ 161 (+98.77%)
Mutual labels:  pandas, data-analysis, pandas-tutorial
cracking-the-pandas-cheat-sheet
Inflearn: Cracking data analysis and visualization with just a two-page document
Stars: ✭ 62 (-23.46%)
Mutual labels:  pandas-dataframe, pandas, pandas-tutorial
introduction to ml with python
Jupyter notebooks and code for the book "[Revised Edition] Machine Learning with Python Libraries".
Stars: ✭ 211 (+160.49%)
Mutual labels:  numpy, pandas, matplotlib
Python-Data-Visualization
D-Lab's 3 hour introduction to data visualization with Python. Learn how to create histograms, bar plots, box plots, scatter plots, compound figures, and more, using matplotlib and seaborn.
Stars: ✭ 42 (-48.15%)
Mutual labels:  pandas, seaborn, matplotlib
dataquest-guided-projects-solutions
My dataquest project solutions
Stars: ✭ 35 (-56.79%)
Mutual labels:  pandas, data-analysis, matplotlib
Python-for-data-analysis
No description or website provided.
Stars: ✭ 18 (-77.78%)
Mutual labels:  numpy, pandas, matplotlib

Data Analysis Using Python: A Beginner’s Guide Featuring NYC Open Data

Mark Bauer

The recording for this presentation can be viewed on YouTube.

1. Introduction

NYC Open Data provides a treasure trove of information, all publicly available at the click of a button. While having access to data is great, analyzing it is often difficult for beginners, which can create barriers in one's open data journey. Additionally, reproducibility in data analysis is often limited or overlooked altogether.

Data Analysis Using Python: A Beginner’s Guide Featuring NYC Open Data is a four-part series, listed in the sections below. This collection of notebooks serves as a reference and user guide for applying Python to real-world data analysis projects. The notebooks use the Python programming language and datasets from NYC Open Data. The series shows how data analytics can be used to discover useful information and support decision-making.

Sections include:

Part 1: Reading and Writing Files in Python

Part 2: Data Inspection, Cleaning, and Wrangling in Python

Part 3: Plotting and Data Visualization in Python

Part 4: Geospatial Data and Mapping

You can run an interactive example on MyBinder through your browser, with no installation required. MyBinder is slow to load and takes about 5 minutes, but it will load eventually.

2. Notebooks

You can view these notebooks through your browser by clicking View under the Static Webpage column.

| File Name | Description | Static Webpage |
| --- | --- | --- |
| 1_reading_writing_files.ipynb | Reading and Writing Files. | View |
| 2_data_inspection_cleaning_wrangling.ipynb | Data Inspection, Cleaning, and Wrangling. | View |
| 3_plotting_visualizations.ipynb | Plotting and Data Visualization. | View |
| 4_geospatial_data_mapping.ipynb | Geospatial Data and Mapping. | View |
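
As a rough illustration of the kind of workflow Parts 1 and 2 cover (reading a file, cleaning it, and writing it back out), here is a minimal sketch using only Python's standard library. The sample data, column names, and helper functions (`read_rows`, `clean_rows`, `write_rows`) are hypothetical and are not taken from the notebooks, which may use other libraries.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a CSV export; the column names
# and values are made up for illustration only.
RAW_CSV = """borough,year_built,height_ft
Manhattan,1931,1454
Brooklyn,,85
 queens ,1964,120
Manhattan,1902,
"""


def read_rows(text):
    """Read CSV text into a list of dictionaries, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Drop rows with missing values, tidy text, and convert numbers."""
    cleaned = []
    for row in rows:
        if not row["year_built"] or not row["height_ft"]:
            continue  # skip incomplete rows
        cleaned.append({
            "borough": row["borough"].strip().title(),
            "year_built": int(row["year_built"]),
            "height_ft": float(row["height_ft"]),
        })
    return cleaned


def write_rows(rows):
    """Write cleaned rows back out as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["borough", "year_built", "height_ft"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = read_rows(RAW_CSV)
cleaned = clean_rows(rows)
counts = Counter(row["borough"] for row in cleaned)
output_csv = write_rows(cleaned)

print(f"{len(rows)} rows read, {len(cleaned)} kept")
print(output_csv)
```

The sketch keeps everything in memory with `io.StringIO` so it runs without touching the file system; swapping in `open(path, newline="")` would read from or write to a real file.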

3. Data

| Dataset | Description |
| --- | --- |
| Building Footprints | Shapefile of footprint outlines of buildings in New York City. |
| MapPLUTO | MapPLUTO merges PLUTO tax lot data with tax lot features from the Department of Finance’s Digital Tax Map (DTM) and is available as shoreline clipped and water included. It contains extensive land use and geographic data at the tax lot level in ESRI shapefile and File Geodatabase formats. |
| Schools | This is an ESRI shapefile of school point locations based on the official address. It includes some additional basic and pertinent information needed to link to other data sources. It also includes some basic school information such as Name, Address, Principal, and Principal’s contact information. |
| Streets | The NYC Street Centerline (CSCL) is a road-bed representation of New York City streets containing address ranges and other information such as traffic directions, road types, and segment types. |
| Neighborhood Tabulation Areas (NTA) | Boundaries of Neighborhood Tabulation Areas as created by the NYC Department of City Planning using whole census tracts from the 2010 Census as building blocks. These aggregations of census tracts are subsets of New York City's 55 Public Use Microdata Areas (PUMAs). |
| NYC Boroughs | GIS data: Boundaries of Boroughs (water areas excluded). |
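
Several of these datasets are boundaries (polygons) or point locations, and a common question with such data is whether a point falls inside a boundary. The following is a minimal sketch of that idea using a ray-casting test in plain Python. The boundary coordinates, point names, and the `point_in_polygon` and `bounding_box` helpers are hypothetical; in practice the real shapefiles would typically be read with a geospatial library rather than typed in by hand.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: return True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bounding_box(polygon):
    """Return (min_x, min_y, max_x, max_y) for a list of vertices."""
    xs = [vertex[0] for vertex in polygon]
    ys = [vertex[1] for vertex in polygon]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical rectangular boundary in (longitude, latitude) order.
TOY_BOUNDARY = [
    (-74.05, 40.55),
    (-73.85, 40.55),
    (-73.85, 40.75),
    (-74.05, 40.75),
]

# Hypothetical point locations, also (longitude, latitude).
points = {
    "school_a": (-73.95, 40.65),
    "school_b": (-73.70, 40.65),
}

inside = {
    name: point_in_polygon(lon, lat, TOY_BOUNDARY)
    for name, (lon, lat) in points.items()
}
print(inside)
print(bounding_box(TOY_BOUNDARY))
```

Checking the bounding box first is a cheap way to rule out points that are far from a boundary before running the full test.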

4. Open Source Applications Used in Project

  • Anaconda: A distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.), that aims to simplify package management and deployment.
  • Project Jupyter: Project Jupyter is a non-profit, open-source project, born out of the IPython Project in 2014 as it evolved to support interactive data science and scientific computing across all programming languages.
    • Jupyter Notebook: The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.
    • nbviewer: A web application that lets you enter the URL of a Jupyter Notebook file, renders that notebook as a static HTML web page, and gives you a stable link to that page which you can share with others.
    • Binder: The Binder Project is an open community that makes it possible to create sharable, interactive, reproducible environments.

5. Additional Resources

Say Hello!

I can be reached at:

Twitter: markbauerwater
LinkedIn: markebauer
GitHub: mebauer
