tiskw / random-fourier-features

Licence: MIT license
Implementation of random Fourier features for kernel methods, such as support vector machines and Gaussian process models

Random Fourier Features

This repository provides the Python module rfflearn, a library of random Fourier features (RFF) [1, 2] for kernel methods such as support vector machines and Gaussian process models. Its main features are:

  • an interface close to that of scikit-learn,
  • CPU and GPU training and inference for the support vector classifier and the Gaussian process regressor/classifier,
  • an interface to optuna for easier hyperparameter tuning,
  • example code showing that RFF is useful for practical machine learning tasks.

The module currently supports the following methods:

Method                             CPU support                GPU support
canonical correlation analysis     rfflearn.cpu.RFFCCA        -
Gaussian process regression        rfflearn.cpu.RFFGPR        rfflearn.gpu.RFFGPR
Gaussian process classification    rfflearn.cpu.RFFGPC        rfflearn.gpu.RFFGPC
principal component analysis       rfflearn.cpu.RFFPCA        rfflearn.gpu.RFFPCA
regression                         rfflearn.cpu.RFFRegression -
support vector classification      rfflearn.cpu.RFFSVC        rfflearn.gpu.RFFSVC
support vector regression          rfflearn.cpu.RFFSVR        -

RFF is applicable to many other machine learning algorithms; I will add more of them soon.
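
For readers new to the technique, the idea behind random Fourier features [1] can be sketched in a few lines of NumPy. This is an illustrative sketch and not the code used inside rfflearn; the function names rff_transform and rbf_kernel and all parameter values are assumptions made for this example. It draws random frequencies from the spectral density of the RBF kernel exp(-gamma * ||x - y||^2) and checks that inner products of the resulting features approximate the exact kernel matrix.

```python
import numpy as np

def rff_transform(X, W, b):
    """Map inputs X of shape (n, d) to random Fourier features of shape (n, D)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def rbf_kernel(X, Y, gamma):
    """Exact RBF kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = (np.sum(X**2, axis=1)[:, None]
               + np.sum(Y**2, axis=1)[None, :]
               - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq_dist)

rng = np.random.default_rng(0)
n, d, D, gamma = 50, 5, 5000, 0.5       # illustrative sizes, not rfflearn defaults

X = rng.normal(size=(n, d))

# For this kernel the frequencies follow N(0, 2 * gamma * I);
# the phases are uniform on [0, 2 * pi).
W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

Z = rff_transform(X, W, b)
K_exact = rbf_kernel(X, X, gamma)
K_approx = Z @ Z.T                       # linear kernel on the random features

max_abs_error = np.max(np.abs(K_exact - K_approx))
print(f"max |K_exact - K_approx| = {max_abs_error:.4f}")
```

Because Z @ Z.T approximates the kernel matrix, any linear model trained on Z behaves approximately like the corresponding kernel method, at a cost that grows with the feature dimension D rather than with the number of training samples.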

Minimal example

The interfaces provided by this module are quite close to those of scikit-learn. For example, the following Python code shows how to use the RFFSVC class (support vector machine with random Fourier features).

>>> import numpy as np
>>> import rfflearn.cpu as rfflearn                     # Import module
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])  # Define input data
>>> y = np.array([1, 1, 2, 2])                          # Define label data
>>> svc = rfflearn.RFFSVC().fit(X, y)                   # Training (on CPU)
>>> svc.score(X, y)                                     # Inference (on CPU)
1.0
>>> svc.predict(np.array([[-0.8, -1]]))
array([1])
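
Conceptually, a support vector classifier with random Fourier features is a random feature map followed by a linear SVM. As a point of comparison, the same idea can be sketched with scikit-learn's own RBFSampler and LinearSVC on the toy data above. This is an illustrative equivalent under assumed settings (gamma, n_components and random_state are arbitrary choices for this example), not a description of how RFFSVC is implemented.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Same toy data as in the rfflearn example.
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])

# Random Fourier feature map for the RBF kernel, followed by a linear SVM.
model = make_pipeline(
    RBFSampler(gamma=1.0, n_components=128, random_state=0),
    LinearSVC(),
)
model.fit(X, y)

train_accuracy = model.score(X, y)
prediction = model.predict(np.array([[-0.8, -1]]))
print(train_accuracy, prediction)
```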

The module also supports training and inference on GPU. For example, the following Python code shows how to use RFFGPC (Gaussian process classifier with random Fourier features) on GPU. It requires PyTorch (>= 1.7.0).

>>> import numpy as np
>>> import rfflearn.gpu as rfflearn                     # Import module
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])  # Define input data
>>> y = np.array([1, 1, 2, 2])                          # Define label data
>>> gpc = rfflearn.RFFGPC().fit(X, y)                   # Training on GPU
>>> gpc.score(X, y)                                     # Inference on GPU
1.0
>>> gpc.predict(np.array([[-0.8, -1]]))
array([1])

See the examples directory for more detailed examples.

Example 1: MNIST using random Fourier features

I applied SVC (support vector classifier) and GPC (Gaussian process classifier) with RFF to the MNIST dataset, a well-known benchmark for image classification, and obtained better accuracy and much faster inference than kernel SVM. The following table gives a brief comparison of kernel SVM, SVC with RFF, and GPC with RFF. See the examples for the RFF SVC module and the RFF GP module for more details.

Method           RFF dimension    Inference time (us)    Score (%)
Kernel SVM       -                4644.9                 96.3
RFF SVC          512              39.0                   96.5
RFF SVC          1024             96.1                   97.5
RFF SVC (GPU)    1024             2.38                   97.5
RFF GPC          5120             342.1                  98.2
RFF GPC (GPU)    5120             115.0                  98.2

Figure: Accuracy at each epoch for RFF SVC/GPC

Example 2: Visualization of feature importance

This module also provides interfaces to feature importance methods such as SHAP [3] and permutation importance [4]. I applied both to an RFFGPR model trained on the Boston house-price dataset; the following are the visualization results obtained with rfflearn.shap_feature_importance and rfflearn.permutation_feature_importance.

Figure: Permutation importances of Boston housing dataset
Figure: SHAP importances of Boston housing dataset
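
To make the idea of permutation importance [4] concrete, here is a minimal NumPy sketch. It is not rfflearn.permutation_feature_importance, whose signature is not shown in this README; the helper permutation_importance_sketch, the synthetic data and the least-squares model are all assumptions made for illustration. A feature's importance is taken as the average increase in mean squared error when that feature's column is shuffled.

```python
import numpy as np

def permutation_importance_sketch(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link between feature j and y
            increases.append(np.mean((predict(X_perm) - y) ** 2) - base_mse)
        importances[j] = np.mean(increases)
    return importances

# Synthetic data: feature 0 matters most, feature 1 a little, feature 2 not at all.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

# Stand-in model: ordinary least squares with an intercept term.
coef, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)

def predict(X_new):
    return np.c_[X_new, np.ones(len(X_new))] @ coef

importances = permutation_importance_sketch(predict, X, y)
print(importances)
```

Any fitted model with a predict function can be plugged in the same way, which is what makes the method model-agnostic.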

Requirements and installation

The author recommends using the Docker image to build the environment, but you can of course install the necessary packages in your own environment. See SETUP.md for more details.

Notes

  • The module was renamed from pyrff to rfflearn in October 2020 because a package named pyrff already exists on PyPI.
  • If the training data is very large, the joblib library may raise an error such as RuntimeError: The task could not be sent to the workers as it is too large for 'send_bytes'. This happens because sklearn.svm.LinearSVC uses joblib as its multiprocessing backend, and joblib cannot handle arrays too large to be managed within a 32-bit address space. In this case, try the option n_jobs = 1 for RFFSVC or ORFSVC. The default is n_jobs = -1, which automatically detects and uses all available CPUs. (This bug was reported by Mr. Katsuya Terahata @ Toyota Research Institute Advanced Development. Thank you very much for the report!)
  • Applying RFF to Gaussian processes is not straightforward. See this document for the mathematical details.
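
As a rough illustration of the last note, one common way to combine RFF with Gaussian process regression is the weight-space view: with a standard normal prior on the weights of the random features, the posterior follows from Bayesian linear regression. The sketch below shows that formulation only; it may differ from the derivation in the referenced document and from the rfflearn implementation, and the function names and parameter values are assumptions made for this example.

```python
import numpy as np

def rff_features(X, W, b):
    """Random Fourier features for inputs X of shape (n, d)."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def rff_gpr_fit(Z, y, noise_std):
    """Posterior over feature weights, assuming the prior w ~ N(0, I)."""
    A = Z.T @ Z + noise_std**2 * np.eye(Z.shape[1])
    mean_w = np.linalg.solve(A, Z.T @ y)
    cov_w = noise_std**2 * np.linalg.inv(A)
    return mean_w, cov_w

def rff_gpr_predict(Z_new, mean_w, cov_w):
    """Predictive mean and variance of the latent function."""
    mean = Z_new @ mean_w
    var = np.sum((Z_new @ cov_w) * Z_new, axis=1)
    return mean, var

rng = np.random.default_rng(0)
gamma, D, noise_std = 0.5, 200, 0.1      # illustrative values

# Noisy observations of sin(x) on [-3, 3].
X_train = rng.uniform(-3.0, 3.0, size=(100, 1))
y_train = np.sin(X_train[:, 0]) + noise_std * rng.normal(size=100)

# Frequencies and phases for the RBF kernel exp(-gamma * ||x - y||^2).
W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(1, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

mean_w, cov_w = rff_gpr_fit(rff_features(X_train, W, b), y_train, noise_std)

X_test = np.linspace(-2.5, 2.5, 11)[:, None]
pred_mean, pred_var = rff_gpr_predict(rff_features(X_test, W, b), mean_w, cov_w)
max_error = np.max(np.abs(pred_mean - np.sin(X_test[:, 0])))
print(f"max prediction error = {max_error:.3f}")
```

In this formulation the matrix to be solved is D x D rather than n x n, so training cost depends on the feature dimension instead of growing cubically with the number of samples.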

Licence

MIT Licence

References

[1] A. Rahimi and B. Recht, "Random Features for Large-Scale Kernel Machines", NIPS, 2007.

[2] F. X. Yu, A. T. Suresh, K. Choromanski, D. Holtmann-Rice and S. Kumar, "Orthogonal Random Features", NIPS, 2016.

[3] S. M. Lundberg and S. Lee, "A Unified Approach to Interpreting Model Predictions", NIPS, 2017.

[4] L. Breiman, "Random Forests", Machine Learning, vol. 45, pp. 5-32, Springer, 2001.

Author

Tetsuya Ishikawa
