Robust linear models in Julia


This package defines robust linear models using the interfaces from StatsBase.jl and StatsModels.jl. It defines an AbstractRobustModel type as a subtype of RegressionModel and implements the methods of the statistical model API, such as fit/fit!.

A robust model is a regression model, meaning it finds a relationship between one or several covariates/independent variables X and a response/dependent variable y. Unlike the ordinary least squares estimate, robust regression mitigates the influence of outliers in the data. A standard view for RobustLinearModel is to consider that the residuals follow a contaminated distribution, that is, a mixture of the distribution of interest F and a contaminating distribution Δ with higher variance. The residuals r then follow the distribution r ~ (1-ε) F + ε Δ, with 0 ≤ ε < 1.
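
As an illustration of this contamination model, here is a minimal sketch (with made-up variable names, using Distributions.jl, which is not a dependency of this package) that simulates residuals from such a mixture:

using Random, Distributions   # Distributions.jl assumed available

Random.seed!(42)
n, ε = 200, 0.1                    # 10% contamination
F = Normal(0, 1)                   # distribution of interest
Δ = Normal(0, 10)                  # contaminating distribution, higher variance
r = [rand() < ε ? rand(Δ) : rand(F) for _ in 1:n]   # r ~ (1-ε) F + ε Δ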

This package implements:

  • M-estimators
  • S-estimators
  • MM-estimators
  • τ-estimators
  • MQuantile regression (e.g. Expectile regression)
  • Robust Ridge regression (using any of the previous estimators)
  • Quantile regression using an interior-point method

Installation

julia>] add RobustModels

To install the latest development version:

julia>] add RobustModels#main

Usage

The preferred way of performing robust regression is by calling the rlm function:

m = rlm(X, y, MEstimator{TukeyLoss}(); initial_scale=:mad)

For quantile regression, use quantreg:

m = quantreg(X, y; quantile=0.5)

For robust versions of the mean, std, var and sem statistics, specify the estimator as the first argument. Use the dims keyword to compute the statistics along specific dimensions. The following functions are also implemented: mean_and_std, mean_and_var and mean_and_sem.

xm = mean(MEstimator{HuberLoss}(), x; dims=2)

(xm, sm) = mean_and_std(TauEstimator{YohaiZamarLoss}(), x)

Examples

using RDatasets: dataset
using StatsModels
using RobustModels

data = dataset("robustbase", "Animals2")
data.logBrain = log.(data.Brain)
data.logBody = log.(data.Body)
form = @formula(logBrain ~ 1 + logBody)

## M-estimator using Tukey estimator
m1 = rlm(form, data, MEstimator{TukeyLoss}(); method=:cg, initial_scale=:mad)

## MM-estimator using Tukey estimator
m2 = rlm(form, data, MMEstimator{TukeyLoss}(); method=:cg, initial_scale=:L1)

## M-estimator using Huber estimator and correcting for covariate outliers using leverage
m3 = rlm(form, data, MEstimator{HuberLoss}(); method=:cg, initial_scale=:L1, correct_leverage=true)

## M-estimator using Huber estimator, providing an initial scale estimate and solving with the Cholesky method
m4 = rlm(form, data, MEstimator{HuberLoss}(); method=:chol, initial_scale=15.0)

## S-estimator using Tukey estimator
m5 = rlm(form, data, SEstimator{TukeyLoss}(); σ0=:mad)

## τ-estimator using Tukey estimator
m6 = rlm(form, data, TauEstimator{TukeyLoss}(); initial_scale=:L1)

## τ-estimator using YohaiZamar estimator and resampling to find the global minimum
opts = Dict(:Npoints=>10, :Nsteps_β=>3, :Nsteps_σ=>3)
m7 = rlm(form, data, TauEstimator{YohaiZamarLoss}(); initial_scale=:L1, resample=true, resampling_options=opts)

## Expectile regression for τ=0.8
m8 = rlm(form, data, GeneralizedQuantileEstimator{L2Loss}(0.8))
#m8 = rlm(form, data, ExpectileEstimator(0.8))

## Refit with different parameters
refit!(m8; quantile=0.2)

## Robust ridge regression
m9 = rlm(form, data, MEstimator{TukeyLoss}(); initial_scale=:L1, ridgeλ=1.0)

## Quantile regression solved by linear programming interior point method
m10 = quantreg(form, data; quantile=0.2)
refit!(m10; quantile=0.8)
;


Theory

M-estimators

With ordinary least squares (OLS), the objective function derived from maximum likelihood estimation is:

L = ½ Σᵢ (yᵢ - 𝒙ᵢ 𝜷)² = ½ Σᵢ rᵢ²

where yᵢ are the values of the response variable, 𝒙ᵢ are the covectors of individual covariates (rows of the model matrix X), 𝜷 is the vector of fitted coefficients and rᵢ are the individual residuals.

A RobustLinearModel minimizes instead the following loss function: L' = Σᵢ ρ(rᵢ) (more precisely L' = Σᵢ ρ(rᵢ/σ), where σ is a (robust) estimate of the standard deviation of the residuals). Several loss functions are implemented:

  • L2Loss: ρ(r) = ½ r², equivalent to OLS.
  • L1Loss: ρ(r) = |r|, a non-differentiable estimator also known as least absolute deviations. Prefer the QuantileRegression solver.
  • HuberLoss: ρ(r) = if |r|<c; ½(r/c)² else |r|/c - ½ end, a convex estimator that behaves like the L2 cost for small residuals and like L1 for large residuals and outliers.
  • L1L2Loss: ρ(r) = √(1 + (r/c)²) - 1, smooth version of HuberLoss.
  • FairLoss: ρ(r) = |r|/c - log(1 + |r|/c), smooth version of HuberLoss.
  • LogcoshLoss: ρ(r) = log(cosh(r/c)), smooth version of HuberLoss.
  • ArctanLoss: ρ(r) = r/c * atan(r/c) - ½ log(1+(r/c)²), smooth version of HuberLoss.
  • CauchyLoss: ρ(r) = log(1+(r/c)²), a non-convex estimator that also corresponds to a Student's t-distribution (with fixed degrees of freedom). It suppresses outliers more strongly, but convergence is not guaranteed.
  • GemanLoss: ρ(r) = ½ (r/c)²/(1 + (r/c)²), non-convex and bounded estimator, it suppresses outliers more strongly.
  • WelschLoss: ρ(r) = ½ (1 - exp(-(r/c)²)), non-convex and bounded estimator, it suppresses outliers more strongly.
  • TukeyLoss: ρ(r) = if |r|<c; ⅙(1 - (1-(r/c)²)³) else ⅙ end, a non-convex and bounded estimator; it suppresses outliers more strongly and is the preferred estimator for most cases.
  • YohaiZamarLoss: ρ(r) is quadratic for |r|/c ≤ 2/3 and bounded at 1; a non-convex estimator optimized to have the lowest bias for a given efficiency.

The value of the tuning constant c is optimized for each estimator so that the M-estimator reaches a high efficiency of 0.95. However, these estimators have a low breakdown point.
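
As noted in the Credits below, M-estimators like these are typically fit with iteratively reweighted least squares (IRLS). The following is a minimal sketch of that idea for the Huber loss, written from the formulas above rather than taken from the package internals; the weight function w(r) = ψ(r)/r and the constant c ≈ 1.345 are the standard textbook choices:

using LinearAlgebra, Statistics

# Huber weight w(r) = ψ(r)/r: 1 inside the corner, c/|r| outside.
huber_weight(r, c=1.345) = abs(r) <= c ? 1.0 : c / abs(r)

function irls_huber(X, y; c=1.345, maxiter=50, tol=1e-8)
    β = X \ y                                        # OLS starting point
    for _ in 1:maxiter
        r = y - X * β
        σ = median(abs.(r .- median(r))) / 0.6745    # MAD scale estimate
        w = huber_weight.(r ./ σ, c)
        βnew = (X' * (w .* X)) \ (X' * (w .* y))     # weighted least squares step
        norm(βnew - β) < tol * (1 + norm(β)) && return βnew
        β = βnew
    end
    return β
end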

S-estimators

Instead of minimizing Σᵢ ρ(rᵢ/σ), S-estimation minimizes the estimate of the squared scale σ² under the constraint that 1/n Σᵢ ρ(rᵢ/σ) = 1/2. S-estimators are only defined for bounded estimators, like TukeyLoss. They have low efficiency but a high breakdown point of 1/2, obtained by choosing the tuning constant c accordingly.
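
To make the constraint concrete, here is a rough sketch (not the package's solver) of computing this M-scale by bisection; the Tukey ρ is normalized here so that its maximum is 1, and c ≈ 1.547 is the standard constant for a 1/2 breakdown point:

# Normalized Tukey ρ, scaled so that ρ(∞) = 1, as needed for the
# constraint (1/n) Σᵢ ρ(rᵢ/σ) = 1/2 to be attainable.
tukey_rho(r, c) = abs(r) < c ? 1 - (1 - (r / c)^2)^3 : 1.0

# Solve (1/n) Σᵢ ρ(rᵢ/σ) = 1/2 for σ; the left-hand side decreases in σ.
function mscale(r; c=1.547, lo=1e-8, hi=1e8, iters=200)
    g(σ) = sum(tukey_rho(ri / σ, c) for ri in r) / length(r) - 0.5
    for _ in 1:iters
        mid = sqrt(lo * hi)                      # bisection in log scale
        g(mid) > 0 ? (lo = mid) : (hi = mid)
        hi / lo < 1 + 1e-12 && break
    end
    return sqrt(lo * hi)
end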

MM-estimators

MM-estimation is a two-step procedure: 1) estimate σ using an S-estimator with a high breakdown point, then 2) estimate 𝜷 using an M-estimator with high efficiency. The result is an estimator with both high efficiency and a high breakdown point.

τ-estimators

Like MM-estimators, τ-estimators combine a high efficiency with a high breakdown point. Similar to S-estimators, they minimize a scale estimate: τ² = σ² (2/n Σᵢ ρ₂(rᵢ/σ)), where σ is an M-scale estimate, the solution of 1/n Σᵢ ρ₁(rᵢ/σ) = 1/2. Finding the minimum of a τ-estimator is similar to the procedure for an S-estimator, with a special weight function that combines both ρ₁ and ρ₂. To ensure a high breakdown point and high efficiency, the two loss functions should be the same but with different tuning constants.
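
Continuing the sketch above, the τ-scale itself is a single formula on top of the M-scale; the constants here are illustrative (c₁ ≈ 1.547 for the breakdown point, a larger c₂ for efficiency), not necessarily the package's defaults:

# τ² = σ² · (2/n) Σᵢ ρ₂(rᵢ/σ), with σ the M-scale defined by ρ₁.
function tau_scale(r; c1=1.547, c2=6.08)         # c2 > c1 tunes efficiency
    σ = mscale(r; c=c1)                          # mscale from the sketch above
    return σ * sqrt(2 * sum(tukey_rho(ri / σ, c2) for ri in r) / length(r))
end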

MQuantile-estimators

Using an asymmetric variant of the L1Estimator, quantile regression is performed (although the QuantileRegression solver should be preferred because it gives an exact solution). Likewise, an M-estimator with an asymmetric version of its loss function yields a generalization of quantiles. For instance, using an asymmetric L2Loss results in expectile regression.
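
As a concrete sketch of the asymmetry (this is the textbook asymmetric least squares iteration, not the package's solver), expectile regression amounts to reweighted least squares with weights τ or 1-τ depending on the sign of the residual:

using LinearAlgebra

# Expectile regression: minimize Σᵢ |τ - (rᵢ<0)| rᵢ² by iterating
# weighted least squares with weights τ (positive residuals) or 1-τ.
function expectile_fit(X, y, τ; maxiter=100, tol=1e-10)
    β = X \ y
    for _ in 1:maxiter
        w = [r >= 0 ? τ : 1 - τ for r in y - X * β]
        βnew = (X' * (w .* X)) \ (X' * (w .* y))
        norm(βnew - β) < tol && return βnew
        β = βnew
    end
    return β
end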

Robust Ridge regression

This is the robust version of ridge regression, which adds a penalty on the coefficients. The objective function to minimize is L = Σᵢ ρ(rᵢ/σ) + λ Σⱼ|βⱼ|², where the sum of squared coefficients does not include the intercept β₀. Robust ridge regression is implemented for all the estimators (but not for quantreg). By default, all coefficients (except the intercept) receive the same penalty, which assumes that all feature variables have the same scale. If that is not the case, use a robust estimate of scale to normalize every column of the model matrix X before fitting the regression, as sketched below.
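
For that normalization, one simple possibility (our sketch; the package does not document such a helper here) is to divide each non-intercept column by a robust scale estimate such as the normalized MAD:

using Statistics

# Scale every non-intercept column of X by its MAD so a single ridge
# penalty λ is comparable across features.
mad_scale(x) = median(abs.(x .- median(x))) / 0.6745

function normalize_columns(X; intercept_col=1)
    Xs = copy(float.(X))
    for j in axes(Xs, 2)
        j == intercept_col && continue
        Xs[:, j] ./= mad_scale(Xs[:, j])
    end
    return Xs
end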

Quantile regression

Quantile regression results from minimizing the following objective function: L = Σᵢ wᵢ(rᵢ) |rᵢ|, where rᵢ = yᵢ - 𝒙ᵢ 𝜷, wᵢ(rᵢ) = ifelse(rᵢ > 0, τ, 1-τ) and τ is the quantile of interest. τ = 0.5 corresponds to least absolute deviations.
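
Written out, wᵢ|rᵢ| is the familiar check (pinball) loss; a one-line reference implementation:

# Check (pinball) loss for quantile τ: weight τ for positive residuals,
# 1-τ for negative ones; τ = 0.5 reduces to half the L1 loss.
check_loss(r, τ) = sum((ri > 0 ? τ : 1 - τ) * abs(ri) for ri in r)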

This problem is solved exactly using linear programming techniques, specifically interior-point methods, via the internal API of Tulip. The internal API is considered unstable, but it results in a much lighter dependency than including the JuMP package with the Tulip backend.

Credits

This package derives its initial implementation from the RobustLeastSquares package, especially the conjugate-gradient solver and the definitions of the M-estimator functions.

Credits to the developers of the GLM and MixedModels packages for implementing the Iteratively Reweighted Least Squares (IRLS) algorithm.

References

  • "Robust Statistics: Theory and Methods (with R)", 2nd Edition, 2019, R. Maronna, R. Martin, V. Yohai, M. Salibián-Barrera