Python Machine Learning Snippets
Python Machine Learning Snippets is my ongoing pet project where I try out different machine learning models. This project contains various machine learning examples as Jupyter notebooks with scikit-learn, statsmodels, NumPy and other libraries.
Note: This is an ongoing project and far from complete.
Getting Started
Project Setup
All the required Python packages can be installed with pipenv.
pip install --user pipenv
Install all the required packages
pipenv install --dev
Note: To run the tests or export the notebooks, or for more details, see BUILD.md.
Run the Notebook
You can start jupyter-lab to play around with the Jupyter notebooks.
pipenv run jupyter-lab
Upgrade Python Packages
Check which packages are outdated.
pipenv update --outdated
This upgrades all packages.
pipenv update
The Snippets...
The following machine learning snippets are available as Jupyter notebooks.
Basics
Classification
Text
- Text classification with naive Bayes (scikit-learn)
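To give a flavour of what a text classification snippet looks like, here is a minimal sketch with scikit-learn. It is not taken from the notebook itself: the tiny corpus, the labels and the choice of `CountVectorizer` with `MultinomialNB` are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus (hypothetical data, for illustration only).
texts = [
    "cheap pills buy now",
    "win money now",
    "meeting agenda for monday",
    "project status report",
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words counts feed a multinomial naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

# Classify a new, unseen sentence.
prediction = model.predict(["buy cheap money"])[0]
print(prediction)
```

A real notebook would typically use a larger dataset and a train/test split to evaluate the model.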
Linear
- Classification with logistic regression (scikit-learn)
- Classification with ridge regression (scikit-learn)
- Classification with stochastic gradient descent (SGD) (scikit-learn)
SVM
- Classification with SVM (scikit-learn)
Non-parametric (nonlinear)
- Classification with k-NN (scikit-learn)
- Classification with decision trees (scikit-learn)
Ensemble learning
- Classification with random forest (scikit-learn)
- Classification with extra-trees (scikit-learn)
- Classification with bagging (scikit-learn)
- Classification with AdaBoost (boosting) (scikit-learn)
- Classification with gradient boosting (xgboost)
Neural network
- Classification with a neural network (tensorflow / keras)
Regression
Linear
- Linear regression with sklearn (OLS) (scikit-learn)
- Linear regression with statsmodels (OLS) (statsmodels)
- Lasso Regression (scikit-learn)
- Ridge Regression (scikit-learn)
- Regression with stochastic gradient descent (scikit-learn)
SVM
- Regression with SVM (scikit-learn)
Non-parametric (nonlinear)
- Regression with k-NN (scikit-learn)
- Regression with decision tree (scikit-learn)
Ensemble learning
- Regression with random forest (scikit-learn)
- Regression with extra-trees (scikit-learn)
- Regression with bagging (scikit-learn)
- Regression with AdaBoost (boosting) (scikit-learn)
- Regression with gradient boosting (xgboost)
Neural network
- Regression with a neural network (tensorflow / keras)
Clustering
Text & model evaluation
- Text clustering basics (scikit-learn)
- Clustering basics and model evaluation (scikit-learn)
Centroid-based clustering
- K-means (scikit-learn)
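As a rough idea of what centroid-based clustering looks like in code, the following sketch runs K-means on two synthetic blobs. The data, the number of clusters and the random seeds are assumptions for illustration and are not taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic blobs (hypothetical data).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# Fit K-means with two centroids and assign each point to a cluster.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)
```

The cluster ids themselves are arbitrary; only the grouping of points matters.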
Density-based clustering
Connectivity-based clustering
- Agglomerative Clustering (Hierarchical Clustering) (scikit-learn)
- Hierarchical Clustering (SciPy)
Distribution-based clustering
- Gaussian Mixture Model (scikit-learn)
Dimension reduction
Linear
- PCA with SVD (scikit-learn)
- PCA with Eigenvector and Correlation Matrix (numpy)
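The eigenvector approach can be sketched in a few lines of NumPy. This is a minimal example under assumptions: the synthetic data and variable names are made up for illustration, and the notebook may structure the computation differently.

```python
import numpy as np

# Synthetic data (hypothetical): two strongly correlated features plus noise.
rng = np.random.default_rng(42)
x = rng.normal(size=200)
data = np.column_stack([
    x,
    2.0 * x + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

# Standardize each feature to zero mean and unit variance.
standardized = (data - data.mean(axis=0)) / data.std(axis=0)

# Correlation matrix of the features (columns are variables).
corr = np.corrcoef(standardized, rowvar=False)

# eigh is suited to symmetric matrices; it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(corr)

# Sort the components by descending eigenvalue.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of the total variance explained by each component.
explained_ratio = eigenvalues / eigenvalues.sum()

# Project the data onto the first two principal components.
projected = standardized @ eigenvectors[:, :2]
print(explained_ratio)
```

Because the first two features are almost perfectly correlated, the first component should capture roughly two thirds of the total variance here.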
Nonlinear (Manifold learning)
Hyperparameter optimization
- Hyperparameter optimization with GridSearch (scikit-learn)
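A grid search with scikit-learn could look like the following sketch. The iris dataset, the k-NN estimator and the parameter grid are assumptions chosen for illustration; the notebook may use a different model and search space.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate hyperparameter values (hypothetical grid).
param_grid = {
    "n_neighbors": [1, 3, 5, 7],
    "weights": ["uniform", "distance"],
}

# Evaluate every combination with 5-fold cross-validation.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, best_score)
```

After fitting, `search.best_estimator_` holds the model refitted on the full data with the best parameter combination.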
AutoML
Classification
- Classification with AutoML (auto-sklearn)
Regression
- Regression with AutoML (auto-sklearn)
Autoencoder
- Anomaly detection with an Autoencoder (tensorflow / keras)
Transfer learning & pre-trained models
- Pre-trained model ResNet (tensorflow / keras)
- Example style transfer with a neural net (tensorflow / keras)