Dataquest Guided Projects
This repository is a collection of my projects from Dataquest.io.
The projects below serve as reference notes for me and for anyone else who is interested. Each one is heavily commented to show my thought process and what I learned along the way.
Exploring US births
Project #1: Concepts explored: lists, dictionaries, functions, for loops
Functions, methods, and properties used: .read(), open(), .split(), .append(), int()
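As a rough illustration of how these pieces fit together, here is a minimal sketch that splits CSV text into rows and totals a value per key with a dictionary. The column layout, the sample rows, and the function names are assumptions made for the example, not taken from the project; in the notebook the text would come from open(...).read() rather than an inline string.

```python
# Hypothetical sample data; the real project reads its text from a file.
SAMPLE = """year,month,date_of_month,day_of_week,births
1994,1,1,6,8096
1994,1,2,7,7772
1994,2,1,2,10000
"""

def parse_rows(text):
    """Split raw CSV text into a list of integer lists, skipping the header."""
    lines = text.strip().split("\n")
    rows = []
    for line in lines[1:]:
        values = []
        for field in line.split(","):
            values.append(int(field))
        rows.append(values)
    return rows

def total_by_column(rows, key_column, value_column=4):
    """Sum the value column for each distinct entry in the key column."""
    totals = {}
    for row in rows:
        key = row[key_column]
        totals[key] = totals.get(key, 0) + row[value_column]
    return totals

rows = parse_rows(SAMPLE)
births_per_month = total_by_column(rows, 1)
print(births_per_month)
```

Passing a different key_column (for example 3 for the day of the week in this assumed layout) reuses the same function for another breakdown.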
Exploring Gun Deaths in the US
Project #2: Concepts explored: list comprehension, datetime module, csv module
Functions, methods, and properties used: csv.reader(), .items(), list(), datetime.datetime()
Analyzing Thanksgiving Dinner
Project #3: Concepts explored: pandas, functions, boolean filtering
Functions, methods, and properties used: .read_csv(), .pivot_table(), .replace(), .describe(), .apply(), .isnull(), .columns, .shape, .head()
Visualizing Earnings Based On College Majors
Project #4: Concepts explored: pandas, matplotlib, histograms, bar charts, scatterplots, scatter matrices
Functions, methods, and properties used: .plot(), scatter_matrix(), .hist(), .iloc[], .head(), .tail(), .describe()
Visualizing The Gender Gap In College Degrees
Project #5: Concepts explored: pandas, matplotlib, histograms, line plots, chart graphics
Functions, methods, and properties used: .savefig(), .text(), .axhline(), .set_yticks(), .tick_params(), .set_title(), .set_ylim(), .set_xlim(), .spines
Analyzing NYC High School Data
Project #6: Concepts explored: pandas, matplotlib.pyplot, correlations, regex, basemap, data analysis, string manipulation
Functions, methods, and properties used: .scatter(), .info(), .tolist(), .groupby(), .agg(), .concat(), .apply(), .strip(), .merge(), .fillna(), .corr()
Star Wars Survey
Project #7: Concepts explored: pandas, matplotlib.pyplot, data cleaning, string manipulation, bar plots
Functions, methods, and properties used: .read_csv(), .columns, .notnull(), .map(), .dtypes, .rename(), .astype(), .mean(), .sum(), .xlabel(), .ylabel()
Working with Data Downloads
Project #8: Concepts explored: pandas, manipulating files with the command line
Transforming Data with Python
Project #9: Concepts explored: pandas, manipulating files with the command line, working with multiple Python scripts, dateutil.parser
Analyzing CIA Factbook
Project #10: Python/SQL concepts explored: python+sqlite3, pandas, SQL queries, SQL subqueries, matplotlib.pyplot, seaborn, histograms
Functions, methods, and properties used: .cursor(), .read_sql_query(), .set_xlabel(), .set_xlim(), .add_subplot(), .figure()
SQL statements used: SELECT, WHERE, FROM, MIN(), MAX(), ORDER BY, AND
Preparing data for SQLite
Project #11: Python/SQL concepts explored: python+sqlite3, pandas, data cleaning, column manipulation
Functions, methods, and properties used: .str.rstrip(), .str.split(), .connect(), .cursor(), .drop(), .str[], .map(), .value_counts()
SQL statements used: SELECT, FROM, PRAGMA
Creating Relations in SQLite
Project #12: Python/SQL concepts explored: python+sqlite3, pandas, multiple tables, foreign keys, subqueries, populating new tables
Functions, methods, and properties used: .cursor(), .connect(), .execute(), .fetchall(), .executemany()
SQL statements used: PRAGMA, LIMIT, FROM, SELECT, INNER JOIN, DROP, ALTER, VALUES
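For readers who have not linked tables from Python before, the following is a minimal sketch of the general pattern: create two related tables in an in-memory SQLite database, populate them with .executemany(), and read them back with an INNER JOIN. The table names, column names, and rows are hypothetical and chosen only for illustration; they are not the project's schema.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# SQLite only enforces foreign keys when this pragma is switched on.
cur.execute("PRAGMA foreign_keys = ON;")

# Parent table (hypothetical schema).
cur.execute("""
    CREATE TABLE authors (
        id INTEGER PRIMARY KEY,
        name TEXT
    );
""")

# Child table that references the parent through a foreign key.
cur.execute("""
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER,
        FOREIGN KEY (author_id) REFERENCES authors(id)
    );
""")

# Populate both tables from lists of tuples.
cur.executemany("INSERT INTO authors (name) VALUES (?);",
                [("Author A",), ("Author B",)])
cur.executemany("INSERT INTO books (title, author_id) VALUES (?, ?);",
                [("Book X", 1), ("Book Y", 2), ("Book Z", 1)])
conn.commit()

# Join the child rows back to their parent rows.
joined = cur.execute("""
    SELECT books.title, authors.name
    FROM books
    INNER JOIN authors ON books.author_id = authors.id
    ORDER BY books.title
    LIMIT 5;
""").fetchall()

conn.close()
print(joined)
```

With the pragma enabled, inserting a book whose author_id has no matching row in authors raises sqlite3.IntegrityError instead of silently storing an orphaned record.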
Analyzing Movie Reviews
Project #13: Concepts explored: pandas, descriptive statistics, numpy, matplotlib, scipy, correlations
Functions and methods used: .sort_values(), sci.linregress(), .hist(), .absolute(), .mean(), .median()
Winning Jeopardy
Project #14: Concepts explored: pandas, matplotlib, data cleaning, string manipulation, chi-squared test, regex, try/except
Functions, methods, and properties used: .columns, .lower(), .sub(), .apply(), sum(), .array(), .split(), .shape, .mean(), .iterrows(), .remove(), .add(), .append()
Predicting Car Prices
Project #15: Concepts explored: pandas, matplotlib, data cleaning, feature engineering, k-nearest neighbors, hyperparameter tuning, RMSE
Functions and methods used: .read_csv(), .replace(), .drop(), .astype(), .isnull().sum(), .min(), .max(), .mean(), .permutation(), .reindex(), .iloc[], .fit(), .predict(), mean_squared_error(), .Series(), .sort_values(), .plot(), .legend()
Predicting House Sale Prices
Project #16: Concepts explored: pandas, data cleaning, feature engineering, linear regression, hyperparameter tuning, RMSE, KFold validation
Functions, methods, and properties used: .dtypes, .value_counts(), .drop(), .isnull(), .sum(), .fillna(), .sort_values(), .corr(), .index, .append(), .get_dummies(), .astype(), .predict(), .fit(), KFold(), mean_squared_error()
Predicting the Stock Market
Project #17: Concepts explored: linear regression, mean squared error, categorical features, datetime
Functions, methods, and properties used: .read_csv(), .to_datetime(), .sort_values(), .rolling(), .apply(), .concat(), .get_dummies(), .shift(), datetime(), .fit(), .predict(), mean_squared_error()
Predicting Bike Rentals
Project #18: Concepts explored: pandas, matplotlib, feature engineering, linear regression, decision trees, random forests, MSE
Functions, methods, and properties used: .hist(), .apply(), .corr(), .columns, .drop(), .sample(), .index, .floor(), .fit(), .predict(), mean_squared_error(), .append()
Investigating Airplane Accidents
Project #19: Concepts explored: Big O notation, strings, dictionaries, data parsing, try/except
Functions, methods, and properties used: range(), .append(), .split(), .values(), Counter()
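To show how these tools combine, here is a minimal sketch of a single-pass (O(n)) tally over delimited text records, using .split(), Counter(), and try/except to skip malformed lines. The record layout, the sample lines, and the function name are assumptions for illustration only, not the project's actual data or code.

```python
from collections import Counter

# Hypothetical pipe-delimited records; the last one is deliberately malformed.
RECORDS = [
    "1 | 2015-09-24 | Accident | AK",
    "2 | 2015-09-18 | Accident | CA",
    "3 | 2015-09-10 | Incident | CA",
    "malformed line",
]

def count_by_field(lines, index):
    """Tally the values found at one field position, visiting each line once."""
    counts = Counter()
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        try:
            counts[fields[index]] += 1
        except IndexError:
            # The line has too few fields, so it is skipped rather than counted.
            continue
    return counts

state_counts = count_by_field(RECORDS, 3)
print(state_counts.most_common(1))
```

Because each line is visited once and each Counter update takes constant time on average, the running time grows linearly with the number of records.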
Working with Spark in Jupyter Notebook
Project #20: Concepts explored: Spark
PySpark methods used: .map(), .flatMap(), .filter(), .count(), .collect(), .take()
Working with Spark Dataframes and Spark SQL in Jupyter
Project #21: Concepts explored: Spark SQL, Spark DataFrames, combining data from multiple files
Methods and functions used: SQLContext(), .head(), .toPandas(), .show(), .select(), .hist(), .registerTempTable()