airalcorn2 / Michael's Guide To Becoming A Data Scientist
I was once asked about transitioning to a career in data science by three different UChicago grad students over a short period of time, so I decided to put together this outline in case anyone else was curious.
Stars: ✭ 34
Projects that are alternatives to or similar to Michael's Guide To Becoming A Data Scientist
Wolfram Coronavirus
Wolfram Language code and notebooks related to the coronavirus outbreak
Stars: ✭ 30 (-11.76%)
Mutual labels: data-science
Mljar Supervised
Automated Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning 🚀
Stars: ✭ 961 (+2726.47%)
Mutual labels: data-science
Intro Python
Python for Statistics and Data Science: Syntax, Data Wrangling, Graphs, Programming, Learning
Stars: ✭ 21 (-38.24%)
Mutual labels: data-science
Rebate
Relief Based Algorithms of ReBATE implemented in Python with Cython optimization. This repository is no longer being updated. Please see scikit-rebate.
Stars: ✭ 29 (-14.71%)
Mutual labels: data-science
Docker Iocaml Datascience
Dockerfile of Jupyter (IPython notebook) and IOCaml (OCaml kernel) with libraries for data science and machine learning
Stars: ✭ 30 (-11.76%)
Mutual labels: data-science
Clevercsv
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
Stars: ✭ 887 (+2508.82%)
Mutual labels: data-science
Python Training
Python training for business analysts and traders
Stars: ✭ 972 (+2758.82%)
Mutual labels: data-science
Python for ml
Brief introduction to Python for machine learning
Stars: ✭ 29 (-14.71%)
Mutual labels: data-science
Simple Sh Datascience
A collection of Bash scripts and Dockerfiles to install data science tools, libraries, and applications
Stars: ✭ 32 (-5.88%)
Mutual labels: data-science
Machine Learning Open Source
Monthly Series - Machine Learning Top 10 Open Source Projects
Stars: ✭ 943 (+2673.53%)
Mutual labels: data-science
Mlnet Workshop
ML.NET Workshop to predict car sales prices
Stars: ✭ 29 (-14.71%)
Mutual labels: data-science
Page clustering
A simple algorithm for clustering web pages, suitable for crawlers
Stars: ✭ 30 (-11.76%)
Mutual labels: data-science
Steppy Toolkit
Curated set of transformers that make your work with steppy faster and more effective 🔭
Stars: ✭ 21 (-38.24%)
Mutual labels: data-science
Crime Analysis
Association Rule Mining from Spatial Data for Crime Analysis
Stars: ✭ 20 (-41.18%)
Mutual labels: data-science
Arcgis Python Api
Documentation and samples for ArcGIS API for Python
Stars: ✭ 954 (+2705.88%)
Mutual labels: data-science
Open Solution Value Prediction
Open solution to the Santander Value Prediction Challenge 🐠
Stars: ✭ 34 (+0%)
Mutual labels: data-science
Feagen
(deprecated) A fast and memory-efficient Python data engineering framework for machine learning.
Stars: ✭ 33 (-2.94%)
Mutual labels: data-science
Tensorflow object counting api
🚀 The TensorFlow Object Counting API is an open source framework built on top of TensorFlow and Keras that makes it easy to develop object counting systems!
Stars: ✭ 956 (+2711.76%)
Mutual labels: data-science
Michael's Guide to Becoming a Data Scientist
Michael's Guide to Becoming a Data Scientist by Michael A. Alcorn is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Table of Contents
Guide
- My CV
General Information
- 8 Skills You Need to be a Data Scientist
- What's the difference between a data architect, data analyst, data engineer, and data scientist?
  - "Data analyst" will probably be less exciting than "data scientist" for those with a scientific background.
- Advice from a Data Scientist at Quora
- /r/MachineLearning
Get Experience!
- Intern - this is the best possible thing you can do.
- Try out Kaggle competitions.
- Create a LinkedIn account and keep it updated.
Curriculum
- Free Courses - use them
  - Coursera, edX, Udacity, Saylor, Khan Academy
  - You can use my course history as a guide.
- Math
  - Calculus (at least up to partial derivatives, which is typically Calculus III)
  - Linear Algebra
  - Analysis (advanced)
- Statistics - know Bayesian and frequentist theory
- Algorithms
- Machine Learning - know the big algorithms; natural language processing is probably the most useful subfield to learn
- Other Topics - graphs, game theory, information theory, etc.
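To make the Statistics item concrete, here is a minimal sketch contrasting the two schools on one toy problem: estimating a coin's probability of heads. The coin-flip setup, the function names, and the default uniform Beta(1, 1) prior are illustrative assumptions, not something this guide prescribes. The frequentist maximum-likelihood estimate is simply the observed frequency; the Bayesian estimate here is the posterior mean under a conjugate Beta prior.

```python
from fractions import Fraction


def frequentist_estimate(heads: int, flips: int) -> Fraction:
    """Maximum-likelihood estimate of the coin's bias: the observed frequency."""
    return Fraction(heads, flips)


def bayesian_estimate(heads: int, flips: int, alpha: int = 1, beta: int = 1) -> Fraction:
    """Posterior mean of the bias under an (assumed) Beta(alpha, beta) prior.

    The Beta prior is conjugate to the binomial likelihood, so the posterior
    is Beta(alpha + heads, beta + flips - heads); its mean is returned here.
    """
    return Fraction(alpha + heads, alpha + beta + flips)


if __name__ == "__main__":
    # Three flips, all heads: the MLE claims the coin always lands heads,
    # while the uniform Beta(1, 1) prior pulls the estimate toward 1/2.
    print(frequentist_estimate(3, 3))  # 1
    print(bayesian_estimate(3, 3))     # 4/5
```

With few observations the prior matters a lot; as the number of flips grows, the two estimates converge.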
Programming
- Must know Python. Almost all data scientist positions require cleansing and transforming data on a large scale, and Python is typically the language of choice for this task.
  - Important Python packages/libraries → scikit-learn, NumPy, Keras, TensorFlow, Theano, SciPy, Pandas, Statsmodels
- Must know R.
- Should know your way around a *nix terminal.
- Version control - should know the basics of Git.
  - Put personal projects on GitHub.
  - Contribute to open source projects.
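A minimal sketch of the kind of cleansing and transforming work described in the Python item, written with only the standard library so it runs anywhere. The sample data and function names are hypothetical. In day-to-day work you would more likely use Pandas, where these steps roughly correspond to `Series.str.strip()`, `dropna()`, and `groupby(...).sum()`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical messy input: inconsistent casing, stray whitespace,
# a missing value, and numbers stored as text.
RAW_CSV = """city,sales
 Chicago ,100
chicago,250
Boston,
BOSTON , 75
"""


def clean_rows(text):
    """Yield (city, sales) pairs with normalized names and integer sales.

    Rows with an empty city or sales field are dropped rather than guessed.
    """
    for row in csv.DictReader(io.StringIO(text)):
        city = row["city"].strip().title()
        sales = row["sales"].strip()
        if not city or not sales:
            continue
        yield city, int(sales)


def total_by_city(rows):
    """Transform the cleaned rows into per-city sales totals."""
    totals = defaultdict(int)
    for city, sales in rows:
        totals[city] += sales
    return dict(totals)


if __name__ == "__main__":
    print(total_by_city(clean_rows(RAW_CSV)))  # {'Chicago': 350, 'Boston': 75}
```

Keeping cleaning and aggregation in separate functions makes each step easy to test on its own, which matters once the real data gets large and messy.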
Databases - definitely know SQL; you should probably also look into NoSQL databases (e.g., MongoDB).
- The best way to learn databases is by working with them. Find a database and practice writing queries for it.
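One low-friction way to follow this advice is Python's built-in `sqlite3` module, which can create a throwaway database in memory with no server to install. The sketch below assumes a made-up `orders` table; the schema, rows, and helper names are hypothetical and exist only to give you something to query.

```python
import sqlite3


def build_practice_db():
    """Create an in-memory SQLite database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 7.5)],
    )
    conn.commit()
    return conn


def spend_per_customer(conn):
    """Practice query: total spend per customer, largest total first."""
    return conn.execute(
        """
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        ORDER BY total DESC
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_practice_db()
    print(spend_per_customer(conn))  # [('alice', 50.0), ('bob', 12.5), ('carol', 7.5)]
    conn.close()
```

From here, try rewriting the query with `WHERE` or `HAVING`, or create a second table and practice a `JOIN`.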
Big Data Tools
- Be familiar with the following: Apache Hadoop, MapReduce, Apache Spark, Apache Pig, Apache Hive, Apache Mahout, Apache Solr, Apache Lucene
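Hadoop MapReduce, the first tools on this list, are built around the MapReduce programming model. The sketch below runs the classic word-count example on a single machine to show its three phases: map, shuffle, and reduce. It is a toy illustration with hypothetical function names, not Hadoop's API; a real framework distributes each phase across a cluster and performs the shuffle for you.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map over every document, then shuffle and reduce the results."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["the quick brown fox", "The lazy dog", "the end"]
    print(word_count(docs))
```

Because each map call and each reduce call only sees its own slice of the data, the phases can run in parallel on different machines; that independence is what lets the model scale.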