
chrislicodes / Udacity-Data-Analyst-Nanodegree

Licence: other
Repository for the projects needed to complete the Data Analyst Nanodegree.

Programming Languages

Jupyter Notebook
HTML

Projects that are alternatives of or similar to Udacity-Data-Analyst-Nanodegree

data-analysis-using-python
Data Analysis Using Python: A Beginner’s Guide Featuring NYC Open Data
Stars: ✭ 81 (+161.29%)
Mutual labels:  numpy, pandas, seaborn, data-analytics, data-analysis, matplotlib
Data-Analyst-Nanodegree
Kai Sheng Teh - Udacity Data Analyst Nanodegree
Stars: ✭ 42 (+35.48%)
Mutual labels:  udacity, numpy, pandas, data-analysis, data-wrangling, data-analyst-nanodegree
The-Data-Visualization-Workshop
A New, Interactive Approach to Learning Data Visualization
Stars: ✭ 59 (+90.32%)
Mutual labels:  numpy, pandas, seaborn, matplotlib, data-wrangling
Ai Learn
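AI learning roadmap: nearly 200 hands-on case studies and projects with free accompanying course materials, from zero-background introduction to job-ready practice. Covers popular areas including Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, PyTorch tensorflow machine-learning,deep-learning data-analysis data-mining mathematics data-science artificial-intelligence python tensorflow tensorflow2 caffe keras pytorch algorithm numpy pandas matplotlib seaborn nlp cv, and more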
Stars: ✭ 4,387 (+14051.61%)
Mutual labels:  numpy, pandas, seaborn, data-analysis, matplotlib
Mlcourse.ai
Open Machine Learning Course
Stars: ✭ 7,963 (+25587.1%)
Mutual labels:  numpy, pandas, seaborn, data-analysis, matplotlib
Data Analysis
Mainly a summary of web-scraping and data analysis projects, plus modeling, machine learning, and model evaluation.
Stars: ✭ 142 (+358.06%)
Mutual labels:  numpy, pandas, data-analysis, matplotlib
Data-Wrangling-with-Python
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Stars: ✭ 90 (+190.32%)
Mutual labels:  numpy, pandas, data-analytics, data-wrangling
Data Forge Ts
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Stars: ✭ 967 (+3019.35%)
Mutual labels:  pandas, data-analysis, data-wrangling, data-cleaning
covid-19
Data ETL & Analysis on the global and Mexican datasets of the COVID-19 pandemic.
Stars: ✭ 14 (-54.84%)
Mutual labels:  numpy, pandas, seaborn, matplotlib
datascienv
datascienv is a package that helps you set up your environment in a single line of code with all dependencies; it also includes pyforest, which imports all required ML libraries in a single line
Stars: ✭ 53 (+70.97%)
Mutual labels:  numpy, pandas, seaborn, matplotlib
Exploratory Data Analysis Visualization Python
Data analysis and visualization with the PyData ecosystem: Pandas, Matplotlib, Numpy, and Seaborn
Stars: ✭ 78 (+151.61%)
Mutual labels:  numpy, pandas, seaborn, matplotlib
Seaborn Tutorial
This repository is my attempt to help Data Science aspirants gain necessary Data Visualization skills required to progress in their career. It includes all the types of plot offered by Seaborn, applied on random datasets.
Stars: ✭ 114 (+267.74%)
Mutual labels:  numpy, pandas, data-analysis
Data Science For Marketing Analytics
Achieve your marketing goals with the data analytics power of Python
Stars: ✭ 127 (+309.68%)
Mutual labels:  numpy, pandas, matplotlib
Machine Learning Projects
This repository consists of all my Machine Learning Projects.
Stars: ✭ 135 (+335.48%)
Mutual labels:  numpy, pandas, matplotlib
Stock Market Analysis And Prediction
Stock Market Analysis and Prediction is the project on technical analysis, visualization and prediction using data provided by Google Finance.
Stars: ✭ 112 (+261.29%)
Mutual labels:  numpy, pandas, matplotlib
Ml Cheatsheet
A constantly updated python machine learning cheatsheet
Stars: ✭ 136 (+338.71%)
Mutual labels:  numpy, pandas, matplotlib
Opendatawrangling
Public data analysis
Stars: ✭ 148 (+377.42%)
Mutual labels:  numpy, pandas, matplotlib
Data Science Types
Mypy stubs, i.e., type information, for numpy, pandas and matplotlib
Stars: ✭ 180 (+480.65%)
Mutual labels:  numpy, pandas, matplotlib
Data Science Notebook
📖 Every great thought and action has an insignificant beginning
Stars: ✭ 196 (+532.26%)
Mutual labels:  numpy, pandas, data-analysis
100 Pandas Puzzles
100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)
Stars: ✭ 1,382 (+4358.06%)
Mutual labels:  numpy, pandas, data-analysis

Udacity Data Analyst Nanodegree

Discover insights from data via Python and SQL.

Skills Acquired (Summary)

Prerequisites

You'll need to install:

And additional libraries defined in each project.

Recommended:

Project Overview

P0: Explore Weather Trends

The first chapter was an introduction to the following projects of the Data Analyst Nanodegree.

The first chapter's project was about weather trends. It required applying (at least) the following steps:

  • Extract data from a database using a SQL query
  • Calculate a moving average
  • Create a line chart
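
The first two steps above can be sketched with the standard library alone. This is a minimal sketch, not the project's notebook code: the table name `city_data`, its columns, and the temperature values are made-up placeholders, and an in-memory SQLite database stands in for the course database.

```python
import sqlite3

# Hypothetical stand-in for the course database (made-up placeholder values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_data (year INTEGER, city TEXT, avg_temp REAL)")
conn.executemany(
    "INSERT INTO city_data VALUES (?, ?, ?)",
    [(2000 + i, "Berlin", t) for i, t in enumerate([9.0, 10.0, 11.0, 10.0, 12.0])],
)

# Step 1: extract the data with a SQL query.
rows = conn.execute(
    "SELECT year, avg_temp FROM city_data WHERE city = ? ORDER BY year",
    ("Berlin",),
).fetchall()
conn.close()


def moving_average(values, window):
    """Return the trailing moving average; the first window-1 entries are None."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


# Step 2: calculate a 3-year moving average.
years = [year for year, _ in rows]
temps = [temp for _, temp in rows]
smoothed = moving_average(temps, window=3)
```

Step 3 would then be a line chart of `years` against `smoothed`, for example with matplotlib. In pandas, `Series.rolling(window).mean()` computes the same trailing average.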

I analyzed local and global temperature data and compared the temperature trends in three German cities to the overall global temperature trend. After cleaning the data, I wrote a function that handles all the tasks needed to plot the data, for example calculating the linear trend and the rolling average. The function also takes several visualization options to produce different graphs.
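
As an illustration of the linear-trend part, here is a minimal sketch that fits a straight line by ordinary least squares. The function name `linear_trend` and the sample values are hypothetical and not taken from the project's notebook.

```python
def linear_trend(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up placeholder series: temperature rising by 0.5 degrees per year.
years = [2000, 2001, 2002, 2003]
temps = [8.0, 8.5, 9.0, 9.5]

slope, intercept = linear_trend(years, temps)

# Points of the fitted trend line, ready to be drawn over the raw data.
trend_line = [slope * year + intercept for year in years]
```

The slope is the estimated temperature change per year, which makes it a convenient single number for comparing a city's trend with the global one.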

Key findings:

  • the average global temperature is increasing, and at an accelerating rate
  • Berlin is the only city in Germany in this dataset which has a higher average temperature than the global average

Global Weather Trend

P1: Investigate a Dataset (Gapminder World Dataset)

This chapter was all about the data analysis process as a whole. It covered everything from gathering, assessing, cleaning and wrangling the data to exploring and visualizing it, along with the programming workflow and communication.

The project therefore included all steps of the typical data analysis process:

  • posing questions
  • gathering, wrangling and cleaning data
  • communicating answers to the questions, assisted by visualizations and statistics

From the project:

This project examines datasets available at Gapminder. More specifically, it takes a closer look at the life expectancy of the populations of different countries and at how other variables influence it. It also looks at how these variables develop over time.

What is Gapminder? "Gapminder is an independent Swedish foundation with no political, religious or economic affiliations. Gapminder is a fact tank, not a think tank. Gapminder fights devastating misconceptions about global development." (https://www.gapminder.org/about-gapminder/)

Here we were confronted with the full joy of a real-life dataset: a hard-to-analyze structure and missing, messy, dirty data, followed, once the data wrangling was finally done, by the reward of real and interesting insights.
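
One common source of a hard-to-analyze structure is a wide layout with one row per country and one column per year. The sketch below reshapes such a table into one record per country and year. It assumes that wide layout; the column names and values are made-up placeholders, not real Gapminder figures, and the helper `wide_to_long` is hypothetical.

```python
import csv
import io

# Hypothetical wide-format extract (placeholder values, not real Gapminder data).
wide_csv = """country,2017,2018
A,70.1,70.4
B,,81.2
"""


def wide_to_long(text, value_name):
    """Turn one-column-per-year rows into (country, year, value) records."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        country = row.pop("country")
        for year, raw in row.items():
            if raw == "":
                continue  # skip missing measurements instead of guessing them
            records.append(
                {"country": country, "year": int(year), value_name: float(raw)}
            )
    return records


life_expectancy = wide_to_long(wide_csv, "life_expectancy")
```

In the long form, tables for different indicators (for example life expectancy and income) can be joined on country and year before exploring how they relate.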

Life Expectancy To Income 2018

P2: Analyze A/B Test Results

The following chapter was filled with a lot of information. We talked about: Data Types, Notation, Mean, Standard Deviation, Correlation, Data Shapes, Outliers, Bias, Dangers, Probability and Bayes, Distributions, Central Limit Theorem, Bootstrapping, Confidence Intervals, Hypothesis Testing, A/B Tests, Linear Regression, Logistic Regression and more... *heavy breathing*

The goal of the project in this chapter was to gain experience with A/B testing, its difficulties and its drawbacks. First, we learned what A/B testing is all about, including metrics such as the click-through rate (CTR) and how to analyze them properly. Second, we learned about drawbacks such as the novelty effect and change aversion.

In the end we brought together everything we had learned to analyze this A/B test properly.
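
To make the idea concrete, here is a minimal sketch of one way to analyze such a test: compute the click-through rate per group, then simulate the sampling distribution of the difference under the null hypothesis that both groups share one rate. The counts are made-up placeholders and the function names are hypothetical; this is not the project's actual analysis.

```python
import random


def click_through_rate(clicks, views):
    """CTR = clicks divided by views."""
    return clicks / views


def simulate_null_differences(n_control, n_treatment, pooled_rate, n_sims, seed=42):
    """Simulate differences in CTR when both groups share the same true rate."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_sims):
        control = sum(rng.random() < pooled_rate for _ in range(n_control)) / n_control
        treatment = sum(rng.random() < pooled_rate for _ in range(n_treatment)) / n_treatment
        diffs.append(treatment - control)
    return diffs


# Made-up placeholder counts, not the project's data.
control_clicks, control_views = 120, 1000
treatment_clicks, treatment_views = 150, 1000

observed_diff = click_through_rate(treatment_clicks, treatment_views) - click_through_rate(
    control_clicks, control_views
)

# Under the null hypothesis, both groups share the pooled rate.
pooled = (control_clicks + treatment_clicks) / (control_views + treatment_views)
null_diffs = simulate_null_differences(control_views, treatment_views, pooled, n_sims=500)

# One-sided p-value: share of simulated differences at least as large as observed.
p_value = sum(d >= observed_diff for d in null_diffs) / len(null_diffs)
```

A histogram of `null_diffs` with a vertical line at `observed_diff` gives the kind of sampling-distribution plot used to judge whether the observed difference is plausible under the null hypothesis.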

Sampling distribution

P3: Gather, Clean and Analyze Twitter Data (WeRateDogs™ (@dog_rates))

This chapter was a deep dive into the data wrangling part of the data analysis process. We learned about the difference between messy and dirty data, what tidy data should look like, and the process of assessing, defining, cleaning and testing. Moreover, we talked about many different file types and different methods of gathering data.

In this project we had to deal with the reality of dirty and messy data (again). We gathered data from different sources (for example the Twitter API) and identified issues with the dataset in terms of tidiness and quality. Afterwards we had to solve these problems while documenting each step. The end of the project was then focused on the exploration of the data.
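
The define, clean and test cycle can be sketched on a toy example. The records, field names and the two issues handled here (duplicate rows and a rating buried in free text) are hypothetical illustrations, not the project's actual data or issue list.

```python
import re

# Hypothetical raw records; field names and texts are made up for illustration.
raw_tweets = [
    {"tweet_id": "1", "text": "This is Bella. 13/10 would pet"},
    {"tweet_id": "2", "text": "No rating in this one"},
    {"tweet_id": "1", "text": "This is Bella. 13/10 would pet"},  # duplicate row
]

# Matches ratings written as "numerator/denominator", e.g. "13/10".
RATING_PATTERN = re.compile(r"(\d+)/(\d+)")


def clean(tweets):
    """Define: duplicate rows, ratings hidden in text. Clean: dedupe and extract."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        if tweet["tweet_id"] in seen:
            continue  # quality issue: drop duplicate tweets
        seen.add(tweet["tweet_id"])
        match = RATING_PATTERN.search(tweet["text"])
        cleaned.append(
            {
                "tweet_id": tweet["tweet_id"],
                "text": tweet["text"],
                # tidiness issue: give each rating part its own column
                "rating_numerator": int(match.group(1)) if match else None,
                "rating_denominator": int(match.group(2)) if match else None,
            }
        )
    return cleaned


tidy = clean(raw_tweets)

# Test: confirm programmatically that each issue is actually resolved.
assert len({t["tweet_id"] for t in tidy}) == len(tidy)
assert tidy[0]["rating_numerator"] == 13
```

Keeping the test next to each cleaning step documents what was fixed and catches regressions when the gathering step is rerun.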

Mean of retweets

P4: Communicate Data Findings

The final chapter focused on the proper visualization of data. We learned about chart junk, uni-, bi- and multivariate visualization, use of color, the data-ink ratio, the lie factor, other encodings, [...].

The task of the final project was to analyze and visualize real-world data. I chose the Ford GoBike dataset.

Relative user frequency by gender and area

License

Creative Commons License