gbuesing / pca

License: MIT
Principal component analysis (PCA) in Ruby


Projects that are alternatives to, or similar to, pca

NMFADMM
A sparsity aware implementation of "Alternating Direction Method of Multipliers for Non-Negative Matrix Factorization with the Beta-Divergence" (ICASSP 2014).
Stars: ✭ 39 (+56%)
Mutual labels:  pca, principal-component-analysis
xmca
Maximum Covariance Analysis in Python
Stars: ✭ 41 (+64%)
Mutual labels:  pca, principal-component-analysis
ClassifierToolbox
A MATLAB toolbox for classifier: Version 1.0.7
Stars: ✭ 72 (+188%)
Mutual labels:  pca, principal-component-analysis
playing with vae
Comparing FC VAE / FCN VAE / PCA / UMAP on MNIST / FMNIST
Stars: ✭ 53 (+112%)
Mutual labels:  pca
AnnA Anki neuronal Appendix
Using machine learning on your anki collection to enhance the scheduling via semantic clustering and semantic similarity
Stars: ✭ 39 (+56%)
Mutual labels:  pca
zAnalysis
zAnalysis is a large open-source statistics library written in Pascal
Stars: ✭ 52 (+108%)
Mutual labels:  pca
abess
Fast Best-Subset Selection Library
Stars: ✭ 266 (+964%)
Mutual labels:  principal-component-analysis
moses
Streaming, Memory-Limited, r-truncated SVD Revisited!
Stars: ✭ 19 (-24%)
Mutual labels:  pca
lin-im2im
Linear image-to-image translation
Stars: ✭ 39 (+56%)
Mutual labels:  pca
SpatPCA
R Package: Regularized Principal Component Analysis for Spatial Data
Stars: ✭ 16 (-36%)
Mutual labels:  pca
PerfSpect
system performance characterization tool based on linux perf
Stars: ✭ 45 (+80%)
Mutual labels:  pca
data-science-learning
📊 Courses, assignments, exercises, mini-projects and books I've worked through while teaching myself Machine Learning and Data Science.
Stars: ✭ 32 (+28%)
Mutual labels:  pca
supervised-random-projections
Python implementation of supervised PCA, supervised random projections, and their kernel counterparts.
Stars: ✭ 19 (-24%)
Mutual labels:  pca
SNPRelate
R package: parallel computing toolset for relatedness and principal component analysis of SNP data (Development Version)
Stars: ✭ 74 (+196%)
Mutual labels:  pca
osm-data-classification
Migrated to: https://gitlab.com/Oslandia/osm-data-classification
Stars: ✭ 23 (-8%)
Mutual labels:  pca
geeSharp.js
Pan-sharpening in the Earth Engine code editor
Stars: ✭ 25 (+0%)
Mutual labels:  pca
Reconstruction-and-Compression-of-Color-Images
Reconstruction and Compression of Color Images Using Principal Component Analysis (PCA) Algorithm
Stars: ✭ 30 (+20%)
Mutual labels:  principal-component-analysis
info-retrieval
Information Retrieval in High Dimensional Data (class deliverables)
Stars: ✭ 33 (+32%)
Mutual labels:  principal-component-analysis
MachineLearning
Implementations of machine learning algorithm by Python 3
Stars: ✭ 16 (-36%)
Mutual labels:  pca
random-fourier-features
Implementation of random Fourier features for kernel method, like support vector machine and Gaussian process model
Stars: ✭ 50 (+100%)
Mutual labels:  principal-component-analysis

Principal Component Analysis (PCA)

Principal component analysis in Ruby. Uses GSL for calculations.

PCA can be used to map data to a lower dimensional space while minimizing information loss. It's useful for data visualization, where you're limited to 2-D and 3-D plots.

For example, here's a plot of the 4-D iris flower dataset mapped to 2-D via PCA:

[plot: iris dataset projected from 4-D to 2-D via PCA]

PCA is also used to compress the features of a dataset before feeding it into a machine learning algorithm, potentially speeding up training time with a minimal loss of data detail.

Install

GSL must be installed first. On OS X, it can be installed via Homebrew:

  brew install gsl

Then install the gem:

  gem install pca

Example Usage

require 'pca'

pca = PCA.new components: 1

data_2d = [ 
  [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
  [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]
]

data_1d = pca.fit_transform data_2d

# Transforms 2d data into 1d:
# data_1d ~= [
#   [-0.8], [1.8], [-1.0], [-0.3], [-1.7],
#   [-0.9], [0.1], [1.1], [0.4], [1.2]
# ]

more_data_1d = pca.transform [ [3.1, 2.9] ]

# Transforms new data into previously fitted 1d space:
# more_data_1d ~= [ [-1.6] ]

reconstructed_2d = pca.inverse_transform data_1d

# Reconstructs the original data (approximately, due to compression):
# reconstructed_2d ~= [
#   [2.4, 2.5], [0.6, 0.6], [2.5, 2.6], [2.0, 2.1], [2.9, 3.1],
#   [2.4, 2.6], [1.7, 1.8], [1.0, 1.1], [1.5, 1.6], [1.0, 1.0]
# ]

evr = pca.explained_variance_ratio

# Proportion of data variance explained by each component
# Here, the first component explains 99.85% of the data variance:
# evr ~= [0.99854]

See examples for more. Also, peruse the source code — it's only ~100 lines.
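Conceptually, fit_transform centers the data and projects it onto the leading eigenvectors of the covariance matrix. Here's a rough pure-Ruby sketch of that idea using the stdlib Matrix class — an illustration of the math, not the gem's actual implementation (which uses GSL):

```ruby
require 'matrix'

data_2d = [
  [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
  [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]
]
n = data_2d.length

# Center each column on its mean
means = data_2d.transpose.map { |col| col.sum / n }
centered = data_2d.map { |row| row.zip(means).map { |v, m| v - m } }

# Sample covariance matrix of the centered data
c = Matrix[*centered]
cov = (c.transpose * c) / (n - 1).to_f

# The eigenvector with the largest eigenvalue is the first principal component
eig = cov.eigen
top_eigenvector = eig.eigenvalues.zip(eig.eigenvectors).max_by { |val, _| val }.last

# Project each centered row onto the first component
data_1d = centered.map { |row| Vector[*row].inner_product(top_eigenvector) }

# Magnitudes match the gem's 1-D output above; the sign of a principal
# component is arbitrary, so the values may come out flipped
puts data_1d.map { |v| v.abs.round(2) }.inspect
```

The projected magnitudes line up with the data_1d values shown above (0.83, 1.78, 0.99, ...), modulo sign.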

Options

The following options can be passed to PCA.new:

option       default  description
:components  nil      Number of components to extract. If nil, the data is
                      just rotated onto its first principal component.
:scale_data  false    Scales features before running PCA by dividing each
                      feature by its standard deviation.
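To illustrate what the :scale_data option describes — dividing each feature by its standard deviation so all features contribute on a comparable scale — here's a small pure-Ruby sketch (the gem's own scaling code may differ in detail):

```ruby
# Scale each feature (column) by its sample standard deviation
rows = [
  [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]
]
n = rows.length

stddevs = rows.transpose.map do |col|
  mean = col.sum / n
  Math.sqrt(col.sum { |v| (v - mean)**2 } / (n - 1))
end

scaled = rows.map { |row| row.zip(stddevs).map { |v, sd| v / sd } }

# After scaling, every column has a standard deviation of 1,
# so no single feature dominates the covariance matrix
```

This matters when features have very different units or ranges; without scaling, the feature with the largest variance dominates the principal components.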

Working with Returned GSL::Matrix

PCA#transform, #fit_transform, #inverse_transform and #components return instances of GSL::Matrix.

Some useful methods to work with these are the #each_row and #each_col iterators, and the #row(i) and #col(i) accessors.

Or if you'd prefer to work with a standard Ruby Array, you can just call #to_a and get an array of row arrays.

See GSL::Matrix RDoc for more.

Plotting Results With GNUPlot

Requires GNUPlot and gnuplot gem.

require 'pca'
require 'gnuplot'

pca = PCA.new components: 2

# `data` is your input: an array of row arrays, as in the example above
data_2d = pca.fit_transform data

Gnuplot.open do |gp|
  Gnuplot::Plot.new(gp) do |plot|
    plot.title "Transformed Data"
    plot.terminal "png"
    plot.output "out.png"

    # Use #col accessor to get separate x and y arrays
    # #col returns a GSL::Vector, so be sure to call #to_a before passing to DataSet
    xy = [data_2d.col(0).to_a, data_2d.col(1).to_a]

    plot.data << Gnuplot::DataSet.new(xy) do |ds|
      ds.title = "Points"
    end
  end
end
