Introduction to Machine Learning with Scikit Learn and Python
This repository generates the corresponding lesson website from The Carpentries repertoire of lessons.
Contributing
We welcome all contributions to improve the lesson! Maintainers will do their best to help you if you have any questions, concerns, or experience any difficulties along the way.
We'd like to ask you to familiarize yourself with our Contribution Guide and have a look at the more detailed guidelines on proper formatting, ways to render the lesson locally, and even how to write new episodes.
Please see the current list of issues for ideas for contributing to this repository. For making your contribution, we use the GitHub flow, which is nicely explained in the chapter Contributing to a Project in Pro Git by Scott Chacon. Look for the tag . This indicates that the maintainers will welcome a pull request fixing this issue.
Maintainer(s)
Current maintainers of this lesson are:
Outline
As determined by the attendees of CarpentryConnect Manchester 2019, the proposed outline of this lesson is as follows:
Unsupervised Learning
I. Clustering
1. k-means
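As a taste of what the clustering episode covers, here is a minimal k-means sketch with scikit-learn's `KMeans`; the two-blob toy data is made up for illustration.

```python
# Minimal k-means example on toy data (two well-separated blobs).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Fit k-means with k=2; n_init restarts guard against bad initial centroids.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # centroids, near (1, 1) and (8, 8)
```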
II. Dimensionality Reduction
1. PCA
2. t-SNE
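Both techniques reduce high-dimensional data to a few components; a minimal sketch (using the iris dataset as a stand-in example) could look like:

```python
# Reduce the 4-dimensional iris data to 2 dimensions with PCA and t-SNE.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = load_iris().data  # 150 samples, 4 features

X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

print(X_pca.shape, X_tsne.shape)  # both are (150, 2)
```

PCA is a linear projection and is cheap to compute; t-SNE is non-linear and considerably more expensive, which ties into the outline's point about computational complexity.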
Supervised Learning
For all models, the objectives are:
- what it is;
- when to use it and on what type of data;
- how to evaluate the fit, including over/underfitting;
- computational complexity.
I. Regression
1. Linear
2. Polynomial
- Overfitting/underfitting
- Test sets (how and why)
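The regression topics above could be demonstrated along these lines: fit a polynomial regression on a held-out split so the fit is evaluated on unseen data. The noisy quadratic toy data here is an assumption for illustration.

```python
# Polynomial regression with a train/test split to check generalization.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 100).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 + rng.normal(0, 0.2, 100)  # noisy quadratic

# Hold out a test set: good training scores alone cannot reveal overfitting.
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x_train, y_train)
print(model.score(x_test, y_test))  # R^2 on unseen data
```

Raising the polynomial degree far above 2 would typically improve the training score while the test score degrades, which is the overfitting signature the outline refers to.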
II. Classification
1. Logistic regression
- Over/underfitting can happen in regression too
- Accuracy
- Confusion Matrix
- Precision
- Recall
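A sketch of logistic regression with the evaluation metrics listed above, using the breast-cancer dataset purely as an example classification task:

```python
# Logistic regression evaluated with accuracy, confusion matrix,
# precision, and recall.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_iter raised so the solver converges on this unscaled data
clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

acc = accuracy_score(y_test, y_pred)
print(acc)
print(confusion_matrix(y_test, y_pred))  # rows: true class, cols: predicted
print(precision_score(y_test, y_pred))
print(recall_score(y_test, y_pred))
```

Precision and recall matter when the classes are imbalanced or errors have asymmetric costs, which accuracy alone hides.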
2. Random Forest
3. Neural Networks
- Evaluation
- Cross Validation
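Cross-validation, the final evaluation topic, can be sketched with a random forest (any of the classifiers above would work the same way):

```python
# 5-fold cross-validation: each fold takes a turn as the held-out test set.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
print(scores)         # one accuracy score per fold
print(scores.mean())  # averaged estimate of generalization accuracy
```

Averaging over folds gives a more stable performance estimate than a single train/test split, at the cost of training the model once per fold.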
Ethics
Authors
A list of contributors to the lesson can be found in AUTHORS.
Citation
To cite this lesson, please consult CITATION.