
SVM MNIST digit classification in Python using scikit-learn

The project presents the well-known problem of MNIST handwritten digit classification. For the purpose of this tutorial, I will use the Support Vector Machine (SVM) algorithm with raw pixel features. The solution is written in Python using scikit-learn, an easy-to-use machine learning library.

(Figure: sample MNIST digits visualization)

The goal of this project is not to achieve state-of-the-art performance, but rather to teach you how to train an SVM classifier on image data using sklearn. Although the solution isn't optimized for high accuracy, the results are quite good (see the table below).

If you want to hit top performance, these two resources will show you the current state-of-the-art solutions:

The table below shows some results in comparison with other models:

Method                                       Accuracy   Comments
Random forest                                0.937
Simple one-layer neural network              0.926
Simple 2-layer convolutional network         0.981
SVM RBF                                      0.9852     C=5, gamma=0.05
Linear SVM + Nystroem kernel approximation
Linear SVM + Fourier kernel approximation

Project Setup

This tutorial was written and tested on Ubuntu 18.10. The project contains a Pipfile with all the necessary libraries:

  • Python - version >= 3.6
  • pipenv - package and virtual environment management
  • numpy
  • matplotlib
  • scikit-learn
  1. Install Python.
  2. Install pipenv.
  3. Clone the repository.
  4. Install all the necessary Python packages by executing these commands in a terminal:
git clone https://github.com/ksopyla/svm_mnist_digit_classification.git
cd svm_mnist_digit_classification
pipenv install
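
Once the environment is ready, you can run either script through pipenv, for example:

pipenv run python svm_mnist_classification.py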

Solution

In this tutorial, I use two approaches to SVM learning. The first uses a classical SVM with an RBF kernel. The drawback of this solution is the rather long training time on big datasets, although the accuracy with good parameters is high. The second uses a linear SVM, which allows training in O(n) time. To achieve high accuracy we use a trick: we approximate the RBF kernel by explicitly embedding the data in a higher-dimensional space. The theory behind it is quite involved, but sklearn has ready-to-use classes for kernel approximation. We will use the following (a minimal sketch follows the list):

  • Nystroem kernel approximation
  • Fourier kernel approximation
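
As a taste of the second approach, here is a minimal sketch (not the project's exact script) that chains a kernel approximation with a fast linear SVM. It uses sklearn's small built-in digits set as a quick stand-in for MNIST, and gamma=0.05 / n_components=300 are illustrative values:

from sklearn.datasets import load_digits
from sklearn.kernel_approximation import Nystroem, RBFSampler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# small built-in digits set as a quick stand-in for MNIST
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Nystroem: data-dependent approximation built from a subsample of points
nystroem_svm = make_pipeline(
    StandardScaler(),
    Nystroem(gamma=0.05, n_components=300, random_state=42),
    LinearSVC(),
)

# RBFSampler: random Fourier features (data-independent)
fourier_svm = make_pipeline(
    StandardScaler(),
    RBFSampler(gamma=0.05, n_components=300, random_state=42),
    LinearSVC(),
)

for name, model in [("Nystroem", nystroem_svm), ("Fourier", fourier_svm)]:
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))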

The code was tested with Python 3.6.

How the project is organized

The project consists of three files:

  • mnist_helpers.py - contains some visualization functions: MNIST digit visualization and the confusion matrix plot
  • svm_mnist_classification.py - script for SVM classification with an RBF kernel
  • svm_mnist_embedings.py - script for linear SVM with embeddings

SVM with RBF kernel

The svm_mnist_classification.py script downloads the MNIST database and visualizes some random digits. Next, it standardizes the data (mean=0, std=1) and launches a grid search with cross-validation to find the best parameters.

  1. MNIST SVM RBF kernel parameter search: C=[0.1, 0.5, 1, 5], gamma=[0.01, 0.05, 0.1, 0.5].

The grid search was done for the parameters C and gamma, where C=[0.1, 0.5, 1, 5] and gamma=[0.01, 0.05, 0.1, 0.5]. So far I have examined only 4x4=16 parameter pairs with 3-fold cross-validation (4x4x3=48 models); this procedure took 3687.2 min :) (2 days, 13:56:42.531223 exactly) on a single CPU core.

The parameter space was generated with numpy's logspace and an outer product.

import numpy as np

C_range = np.outer(np.logspace(-1, 0, 2), np.array([1, 5]))
# flatten matrix, change to 1D numpy array
C_range = C_range.flatten()

gamma_range = np.outer(np.logspace(-2, -1, 2), np.array([1, 5]))
gamma_range = gamma_range.flatten()
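
Continuing from the ranges above, a hedged sketch of how they could feed sklearn's grid search (the project's actual code lives in svm_mnist_classification.py; X_train here stands for the standardized MNIST features):

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"C": C_range, "gamma": gamma_range}  # ranges defined above
grid = GridSearchCV(SVC(kernel="rbf"), param_grid=param_grid, cv=3, n_jobs=-1)
# grid.fit(X_train, y_train)
# print(grid.best_params_, grid.best_score_)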

Of course, you can broaden the range of parameters, but this will increase the computation time.

(Figure: SVM RBF parameter space)

Grid search is a very time-consuming process, so you can just use my best parameters (found in the range C=[0.1, 5], gamma=[0.01, 0.5]):

  • C = 5
  • gamma = 0.05
  • accuracy = 0.9852
Confusion matrix:
[[1014    0    2    0    0    2    2    0    1    3]
 [   0 1177    2    1    1    0    1    0    2    1]
 [   2    2 1037    2    0    0    0    2    5    1]
 [   0    0    3 1035    0    5    0    6    6    2]
 [   0    0    1    0  957    0    1    2    0    3]
 [   1    1    0    4    1  947    4    0    5    1]
 [   2    0    1    0    2    0 1076    0    4    0]
 [   1    1    8    1    1    0    0 1110    2    4]
 [   0    4    2    4    1    6    0    1 1018    1]
 [   3    1    0    7    5    2    0    4    9  974]]
Accuracy=0.985238095238
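If you just want to reproduce this result, here is a minimal self-contained sketch (assuming the standard mnist_784 dataset on OpenML; both the download and the fit take a while):

from sklearn.datasets import fetch_openml
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = StandardScaler().fit_transform(X)  # standardize: mean=0, std=1
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = SVC(kernel="rbf", C=5, gamma=0.05)  # best parameters found above
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print("Accuracy =", accuracy_score(y_test, y_pred))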
  2. MNIST SVM RBF kernel parameter search: C=[0.1, 0.5, 1, 5, 10, 50], gamma=[0.001, 0.005, 0.01, 0.05, 0.1, 0.5].

This much broader search (6x8 parameter pairs with 3-fold cross-validation, giving 6x8x3=144 models) took 13024.3 min (9 days, 1:33:58.999782 exactly) on a single CPU core.

(Figure: SVM RBF parameter space)

Best parameters:

  • C = 5
  • gamma = 0.05
  • accuracy = 0.9852

Linear SVM with different embeddings

Linear SVMs (SVMs with a linear kernel) have the advantage that many O(n) training algorithms exist for them. They are really fast in comparison with nonlinear SVMs (most of which train in O(n^2) time). This technique is really useful if you want to train on big data.

Examples of linear SVM algorithms (papers and software):

Unfortunately, a linear SVM isn't powerful enough to classify the data with accuracy comparable to an RBF SVM.

Learning an SVM with an RBF kernel can be time-consuming. To make the linear model more expressive, we approximate the nonlinear kernel: we explicitly map the vectors into a higher-dimensional space and use a fast linear SVM in this new space. This works extremely well!

The script svm_mnist_embedings.py presents an accuracy summary and training times for the full RBF kernel, a linear SVC, and a linear SVC with the two kernel approximations, Nystroem and Fourier.

Further improvements

  • Augmenting the training set with artificial samples
  • Using randomized parameter search (see the sketch below)
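
For the second point, a possible sketch with sklearn's RandomizedSearchCV; the distributions below are illustrative, not the project's:

from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

param_distributions = {
    "C": loguniform(1e-1, 1e2),
    "gamma": loguniform(1e-3, 1e0),
}
search = RandomizedSearchCV(
    SVC(kernel="rbf"),
    param_distributions=param_distributions,
    n_iter=20,  # samples 20 random (C, gamma) pairs instead of the full grid
    cv=3,
    n_jobs=-1,
    random_state=42,
)
# search.fit(X_train, y_train)  # X_train: standardized MNIST features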

Useful SVM MNIST learning materials
