baggepinnen / TotalLeastSquares.jl

License: MIT license
Solve many kinds of least-squares and matrix-recovery problems

Programming Languages

julia

Projects that are alternatives to or similar to TotalLeastSquares.jl

missCompare
missCompare R package - intuitive missing data imputation framework
Stars: ✭ 31 (+34.78%)
Mutual labels:  imputation, missing-data, missing-data-imputation
FSDA
Flexible Statistics and Data Analysis (FSDA) extends MATLAB for a robust analysis of data sets affected by different sources of heterogeneity. It is open source software licensed under the European Union Public Licence (EUPL). FSDA is a joint project by the University of Parma and the Joint Research Centre of the European Commission.
Stars: ✭ 53 (+130.43%)
Mutual labels:  robust-estimation, robust-statistics
ALRA
Imputation method for scRNA-seq based on low-rank approximation
Stars: ✭ 48 (+108.7%)
Mutual labels:  imputation, matrix-completion
GDLibrary
Matlab library for gradient descent algorithms: Version 1.0.1
Stars: ✭ 50 (+117.39%)
Mutual labels:  linear-regression, matrix-completion
py ml utils
Python utilities for Machine Learning competitions
Stars: ✭ 29 (+26.09%)
Mutual labels:  missing-data
outlier-detection-grubbs-test-and-generalized-esd-test-python
Repository for detecting outliers using Grubbs' test & the Generalized Extreme Studentized Deviate (ESD) test
Stars: ✭ 20 (-13.04%)
Mutual labels:  outlier-detection
prodest
Stata and R functions for production function estimation
Stars: ✭ 24 (+4.35%)
Mutual labels:  estimation
RobustModels.jl
A Julia package for robust regressions using M-estimators and quantile regressions
Stars: ✭ 18 (-21.74%)
Mutual labels:  outlier-detection
deviation-network
Source code of the KDD19 paper "Deep anomaly detection with deviation networks", weakly/partially supervised anomaly detection, few-shot anomaly detection
Stars: ✭ 94 (+308.7%)
Mutual labels:  outlier-detection
BasisFunctionExpansions.jl
Basis Function Expansions for Julia
Stars: ✭ 19 (-17.39%)
Mutual labels:  linear-regression
brglm2
Estimation and inference from generalized linear models using explicit and implicit methods for bias reduction
Stars: ✭ 18 (-21.74%)
Mutual labels:  estimation
Undergraduate-in-Statistics
Using Computer with your Statistics Major Course
Stars: ✭ 57 (+147.83%)
Mutual labels:  linear-regression
prescription-outliers
DDC-Outlier: Preventing medication errors using unsupervised learning
Stars: ✭ 18 (-21.74%)
Mutual labels:  outlier-detection
code-review-estimator
Estimate cost of code review using Machine Learning
Stars: ✭ 49 (+113.04%)
Mutual labels:  estimation
machine learning in python
Demo of basic machine learning models in python with Jupyter Notebook
Stars: ✭ 16 (-30.43%)
Mutual labels:  linear-regression
machine learning
A gentle introduction to machine learning: data handling, linear regression, naive bayes, clustering
Stars: ✭ 22 (-4.35%)
Mutual labels:  linear-regression
spark-lof
A parallel implementation of local outlier factor based on Spark
Stars: ✭ 16 (-30.43%)
Mutual labels:  outlier-detection
SGDLibrary
MATLAB/Octave library for stochastic optimization algorithms: Version 1.0.20
Stars: ✭ 165 (+617.39%)
Mutual labels:  linear-regression
GEstimator
GEstimator is a simple civil estimation software written in Python and GTK+. GEstimator can prepare estimates along with rate analysis and supports multiple databases.
Stars: ✭ 17 (-26.09%)
Mutual labels:  estimation
Learned-Turbo-type-Affine-Rank-Minimization
Code for Learned Turbo-type Affine Rank Minimization
Stars: ✭ 4 (-82.61%)
Mutual labels:  matrix-completion


TotalLeastSquares.jl

Solve (weighted or robust) total least-squares problems, impute missing data and robustly filter time series.

These functions are exported:

Estimation

  • x = tls(A,y) Solves the standard TLS problem using the SVD method. An in-place version tls!(Ay, n) also exists; for this you need to supply Ay = [A y] and the width of A, n = size(A,2).
  • x = wtls(A,y,Qaa,Qay,Qyy,iters=10) Solves the weighted TLS problem using algorithm 1 from (Fang, 2013). The Q-matrices are the covariance matrices of the noise terms in vec(A) and y, respectively.
  • Qaa,Qay,Qyy = rowcovariance(rowQ::AbstractVector{<:AbstractMatrix}) Takes row-wise covariance matrices QAy[i] and returns the full (sparse) covariance matrices. rowQ = [cov([A[i,:] y[i]]) for i = 1:length(y)]
  • x = wls(A,y,Qyy) Solves the weighted standard LS problem. Qyy is the covariance matrix of the residuals with side length equal to the length of y.
  • x = rtls(A,y) Solves a robust TLS problem. Both A and y are assumed to be corrupted by high-magnitude but sparse noise. See the analysis below.
  • x = irls(A,y; tolx=0.001, tol=1.0e-6, verbose=false, iters=100) minimizeₓ ||Ax-y||₁ using iteratively reweighted least squares.
  • x = sls(A,y; r = 1, iters = 100, verbose = false, tol = 1.0e-8) Simplex least-squares: minimizeₓ ||Ax-y||₂ s.t. sum(x) = r
  • x = flts(A,y; outlier = 0.5, N = 500, maxiter = 100, return_set = false, verbose = true) Fast least trimmed squares: minimizes the sum of squared residuals by finding an outlier-free subset among N initial subsets. Robust to up to 50 % outliers. A short usage sketch of irls, sls and flts follows this list.
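
The following is a minimal usage sketch of irls, flts and sls, not taken from the original README; the data, the outlier fraction and the keyword choices are arbitrary and only for illustration.

using TotalLeastSquares, Random, LinearAlgebra
Random.seed!(0)
A  = randn(100, 4)
x  = [0.4, 0.3, 0.2, 0.1]               # true coefficients, note that sum(x) == 1
y  = A*x .+ 0.01 .* randn(100)
yo = copy(y)
yo[1:5] .+= 10                          # five rows with large outliers
x̂_irls = irls(A, yo)                    # ℓ₁ regression, resistant to the outliers
x̂_flts = flts(A, yo, verbose = false)   # trimmed squares, discards the outlier rows
x̂_sls  = sls(A, y, r = 1)               # constrained LS on the clean data, sum(x̂) ≈ 1
norm(x - x̂_irls), norm(x - x̂_flts), norm(x - x̂_sls)

With only a handful of large outliers, the two robust estimates should remain close to x, whereas an ordinary least-squares fit of yo on A would not.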

Matrix recovery

  • Â, Ê, s, sv = rpca(D; kwargs...) robust matrix recovery using robust PCA. Solves minimize_{A,E} ||A||_* + λ||E||₁ s.t. D = A+E. Optionally force A or E to be non-negative. A small usage sketch follows this list.
  • Q = rpca_ga(X, r; kwargs...) robust PCA using Grassmann averages. Returns the principal components up to rank r.
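
As a quick illustration of rpca (a sketch not taken from the original README, with matrix size and noise level chosen arbitrarily), a low-rank matrix with sparse corruptions can be split into its two parts:

using TotalLeastSquares, Random, LinearAlgebra
Random.seed!(0)
A = randn(100, 3) * randn(3, 100)                       # a rank-3 matrix
S = 10 .* (rand(100, 100) .< 0.05) .* randn(100, 100)   # sparse, large-magnitude corruption
Â, Ŝ = rpca(A + S)                                      # trailing return values are ignored
opnorm(A - Â) / opnorm(A)                               # relative recovery error, should be small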

Time-series filtering

  • yf = lowrankfilter(y, n; kwargs...) Filter time series y by forming a lag-embedding H of length n (a Hankel matrix) and using rpca to recover a low-rank matrix from which a filtered signal yf can be extracted. Robustly filters large sparse outliers. See the example notebook for more info, and the short sketch following this list.
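
Below is a minimal sketch of lowrankfilter, not from the original README; the signal, the spike rate and the embedding length are arbitrary choices made for illustration.

using TotalLeastSquares, Statistics
t  = 0.1 .* (1:1000)
y  = sin.(t) .+ 0.5 .* sin.(3 .* t)       # a clean two-tone signal
yn = y .+ 0.05 .* randn(1000)             # dense measurement noise
yn[rand(1000) .< 0.02] .+= 10             # a few large, sparse spikes
yf = lowrankfilter(yn, 40)                # lag-embedding of length 40
mean(abs2, y - yf) / mean(abs2, y)        # normalized error, should be small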

Example

using TotalLeastSquares, FillArrays, Random, Printf, LinearAlgebra # LinearAlgebra provides norm
Random.seed!(0)
x   = randn(3)
A   = randn(50,3)
σa  = 1
σy  = 0.01
An  = A + σa*randn(size(A))
y   = A*x
yn  = y + σy*randn(size(y))
Qaa = σa^2*Eye(prod(size(A)))
Qay = 0*Eye(prod(size(A)),length(y))
Qyy = σy^2*Eye(prod(size(y)))


x̂ = An\yn
@printf "Least squares error: %25.3e %10.3e %10.3e, Norm: %10.3e\n" (x-x̂)... norm(x-x̂)

x̂ = wls(An,yn,Qyy)
@printf "Weighted Least squares error: %16.3e %10.3e %10.3e, Norm: %10.3e\n" (x-x̂)... norm(x-x̂)

x̂ = tls(An,yn)
@printf "Total Least squares error: %19.3e %10.3e %10.3e, Norm: %10.3e\n" (x-x̂)... norm(x-x̂)

x̂ = wtls(An,yn,Qaa,Qay,Qyy,iters=10)
@printf "Weighted Total Least squares error: %10.3e %10.3e %10.3e, Norm: %10.3e\n" (x-x̂)... norm(x-x̂)
println("----------------------------")
Least squares error:                 3.753e-01  2.530e-01 -3.637e-01, Norm:  5.806e-01
Weighted Least squares error:        3.753e-01  2.530e-01 -3.637e-01, Norm:  5.806e-01
Total Least squares error:           2.897e-01  1.062e-01 -2.976e-01, Norm:  4.287e-01
Weighted Total Least squares error:  1.213e-01 -1.933e-01 -9.527e-02, Norm:  2.473e-01
----------------------------

Robust TLS analysis

The code for this analysis is here.

We generate random data of the form Ax = y, where both A and y are corrupted by sparse noise; the entries of A are Gaussian random variables with unit variance and size(A) = (500,5). The plots below show the norm of the error in the estimated x as functions of the noise variance and the noise sparsity.

[Three plots: norm of the estimation error as a function of noise variance and noise sparsity]

The results indicate that the robust method is to be preferred when the noise is large but sparse.
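
The benchmark script itself is linked above; the following is only a rough sketch of one point of that experiment (with the noise level and sparsity picked arbitrarily, and a hypothetical helper sparse_noise defined for brevity), comparing tls and rtls when both A and y carry sparse, large-magnitude noise.

using TotalLeastSquares, Random, LinearAlgebra
Random.seed!(0)
sparse_noise(dims...) = 5 .* (rand(dims...) .< 0.05) .* randn(dims...)  # hypothetical helper: sparse, large noise
x  = randn(5)
A  = randn(500, 5)                               # unit-variance Gaussian entries, size (500, 5)
y  = A*x
An = A .+ sparse_noise(500, 5)
yn = y .+ sparse_noise(500)
norm(x - tls(An, yn)), norm(x - rtls(An, yn))    # rtls should give the smaller error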

Missing data imputation

The robust methods handle missing data the same way as they handle outliers. You may indicate that an entry is missing simply by setting it to a very large value, e.g.,

using Statistics            # provides mean, used below
N    = 500
y    = sin.(0.1 .* (1:N))   # a sine wave
miss = rand(N) .< 0.1       # 10% missing values
yn   = y .+ miss .* 1e2 .+ 0.1*randn(N) # Set missing values to very large number and add noise
yf   = lowrankfilter(yn,40) # Filter
mean(abs2,y-yf)/mean(abs2,y) # Normalized error
# 0.001500 # Less than 1 percent error in the recovery of y

To impute missing data in a matrix, we make use of rpca:

H     = hankel(sin.(0.1 .* (1:N)), 5)         # A low-rank matrix
miss  = rand(size(H)...) .< 0.1               # 10% missing values
Hn    = H .+ 0.1randn(size(H)) .+ miss .* 1e2 # Set missing values to very large number
Ĥ, E  = rpca(Hn)
mean(abs2,H-Ĥ)/mean(abs2,H) # Normalized error
# 0.06 # Six percent error in the recovery of H

The matrix E contains the estimated outliers:

vec(E)'vec(miss)/(norm(E)*norm(miss)) # These should correlate if all missing values were identified
# 1.00

Speeding up robust factorization

The function rpca internally performs several SVDs, which make up the bulk of the computation time. In order to speed this up, you may provide a custom svd function. An example using a randomized method from RandomizedLinAlg.jl:

using TotalLeastSquares, RandomizedLinAlg
lowrankfilter(x, L, svd = rsvd_fnkz, opnorm=x->rnorm(x,10)) # The same keywords are accepted by rpca

Here we provide both a randomized svd function and a function for calculating the operator norm, which also takes a long time. Other alternative svd functions to consider are

  • RandomizedLinAlg.rsvd
  • TSVD.tsvd

Notes

This package was developed for the thesis
Bagge Carlson, F., "Machine Learning and System Identification for Estimation in Physical Systems" (PhD Thesis 2018).

@thesis{bagge2018,
  title        = {Machine Learning and System Identification for Estimation in Physical Systems},
  author       = {Bagge Carlson, Fredrik},
  keyword      = {Machine Learning,System Identification,Robotics,Spectral estimation,Calibration,State estimation},
  month        = {12},
  type         = {PhD Thesis},
  number       = {TFRT-1122},
  institution  = {Dept. Automatic Control, Lund University, Sweden},
  year         = {2018},
  url          = {https://lup.lub.lu.se/search/publication/ffb8dc85-ce12-4f75-8f2b-0881e492f6c0},
}