Efficient fitting of linear and generalized linear models using only base R. The speed gains over lm and glm come from reducing the N × P model matrix to a P × P matrix, and the best computational performance is obtained when R is linked against OpenBLAS, Intel MKL or another optimized BLAS library.


Efficient Fitting of Linear Models

Project Status: Active. The project has reached a stable, usable state and is being actively developed. Lifecycle: stable.

Scope

The eflm package reduces the design matrix from N × P to P × P to cut fitting time, and provides functions that are drop-in replacements for glm and lm, such as:

# just prepend an 'e' to glm
eglm(mpg ~ wt, data = mtcars)
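
The N × P to P × P reduction can be illustrated with plain base R. The block below is only a sketch of the general idea (solving the normal equations from cross-products), not the package's actual fitting code; the object names (X, XtX, Xty, beta, fit_lm) are chosen here for illustration.

```r
# Design matrix (N x P) and response from a built-in dataset
X <- model.matrix(mpg ~ wt, data = mtcars)
y <- mtcars$mpg

# Reduce the problem to a P x P matrix and a P x 1 vector
XtX <- crossprod(X)      # t(X) %*% X
Xty <- crossprod(X, y)   # t(X) %*% y

# Solve the normal equations (X'X) beta = X'y
beta <- drop(solve(XtX, Xty))

# Compare with the QR-based fit from stats::lm
fit_lm <- lm(mpg ~ wt, data = mtcars)
all.equal(beta, coef(fit_lm))
```

Once XtX and Xty are formed, the cost of the solve depends on P only, which is where the gain comes from when N is much larger than P.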

The best computational performance is obtained when R is linked against OpenBLAS, Intel MKL or another optimized BLAS library. This implementation aims to be compatible with the ‘broom’ and ‘sandwich’ packages for summary statistics and clustering by providing S3 methods.

This package takes ideas from the glm2, speedglm, fastglm and fixest packages, but the implementations here keep the functions and outputs as close as possible to those of the stats package. This makes the functions provided here compatible with packages such as sandwich for robust estimation, even if that attenuates the speed gains.

The greatest strength of this package is testing. With more than 1600 (and counting) tests, we try to do exactly the same as lm/glm, even in edge cases, but faster.

The ultimate aim of the project is to produce a package that:

  • Does exactly the same as lm and glm in less time
  • Is as numerically stable as lm and glm
  • Depends only on base R, with no Rcpp or other calls
  • Uses R’s internal C code such as the Cdqrls function that the stats package uses for model fitting
  • Can be used in Shiny dashboards and other contexts where you need fast model fitting
  • Is useful for memory-intensive models
  • Allows model fitting in cases demanding more memory than free RAM (PENDING)
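
The first goal in the list can be checked informally by fitting the same model with glm and eglm and comparing coefficients. This is a small sketch, not part of the package's test suite; it assumes eflm is installed and is skipped otherwise, and the names fit_glm and fit_eglm are illustrative.

```r
# Runs only if the eflm package is available (assumption: it is installed)
if (requireNamespace("eflm", quietly = TRUE)) {
  fit_glm  <- glm(mpg ~ wt, family = gaussian, data = mtcars)
  fit_eglm <- eflm::eglm(mpg ~ wt, family = gaussian, data = mtcars)

  # Coefficients should agree up to numerical tolerance
  print(all.equal(coef(fit_glm), coef(fit_eglm)))
}
```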

Installation

You can install the released version of eflm from CRAN with:

install.packages("eflm")

And the development version with:

remotes::install_github("pachadotdev/eflm")

Progress list

Stats compatibility

  • cooks.distance

Sandwich compatibility

  • estfun
  • bread
  • vcovCL
  • meatCL
  • vcovBS
  • vcovHC
  • meatHC
  • vcovPC
  • meatPC
  • vcovPL
  • meatPL

Broom compatibility

  • augment
  • tidy
  • glance

Lmtest compatibility

  • resettest
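
As a usage sketch for the compatibility lists above, an eglm fit can be passed to sandwich and broom functions in the same way as a glm fit. The example assumes that eflm, sandwich and broom are all installed (it is skipped otherwise); the model and the object names are illustrative only.

```r
pkgs <- c("eflm", "sandwich", "broom")

# Runs only if all three packages are available
if (all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  fit <- eflm::eglm(mpg ~ wt, family = gaussian, data = mtcars)

  # Heteroskedasticity-consistent covariance matrix and robust standard errors
  vc <- sandwich::vcovHC(fit)
  print(sqrt(diag(vc)))

  # Tidy coefficient table
  print(broom::tidy(fit))
}
```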

Benchmarking

The dataset for this benchmark was taken from Yotov et al. (2016) and consists of a 28,152 × 8 data frame with 6 numeric and 2 categorical columns of the form:

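| Year (t) | Trade (X) | DIST | CNTG | LANG | CLNY | Exp Year (π) | Imp Year (χ) |
|---|---|---|---|---|---|---|---|
| 1986 | 27.8 | 12045 | 0 | 0 | 0 | ARG1986 | AUS1986 |
| 1986 | 3.56 | 11751 | 0 | 0 | 0 | ARG1986 | AUT1986 |
| 1986 | 96.1 | 11305 | 0 | 0 | 0 | ARG1986 | BEL1986 |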

This data can be found in the tradepolicy package.

The variables are:

  • year: time of export/import flow
  • trade: bilateral trade
  • log_dist: log of distance, computed as log(dist) in the model formula
  • cntg: contiguity (0/1)
  • lang: common language (0/1)
  • clny: colonial relation (0/1)
  • exp_year/imp_year: exporter/importer time fixed effects

For benchmarking I’ll fit a Poisson pseudo maximum likelihood (PPML) model, as it is computationally expensive.

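library(dplyr)
library(eflm)
# keep every fourth year; exp_year/imp_year are assumed to be country + year, as in the sample rows (e.g. ARG1986)
ch1_application1 <- tradepolicy::agtpa_applications %>%
  select(exporter, importer, pair_id, year, trade, dist, cntg, lang, clny) %>%
  filter(year %in% seq(1986, 2006, 4)) %>%
  mutate(exp_year = paste0(exporter, year), imp_year = paste0(importer, year))

# PPML: quasi-Poisson GLM with exporter-year and importer-year fixed effects
formula <- trade ~ log(dist) + cntg + lang + clny + exp_year + imp_year
eglm(formula, family = quasipoisson, data = ch1_application1)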

To compare glm, the proposed eglm and Stata’s ppml, I ran each fit 500 times locally and report the median as the fitting time. The accompanying plots show fitting time and memory use for regressions on cumulative subsets of the data for 1986, …, 2006 (i.e. 1986 alone, then 1986 and 1990, …, then 1986 to 2006), as a function of the design matrix dimensions.
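
The benchmarking procedure described above (repeat each fit, take the median, grow the data cumulatively) can be sketched in base R. This is not the script used for the reported numbers: it uses mtcars with cyl standing in for year, stats::glm in place of eglm, and only 5 repetitions so that it runs quickly. The helper median_fit_time is a name made up for this example.

```r
# Median elapsed time (seconds) of `fit_fun` over `reps` runs
median_fit_time <- function(fit_fun, reps = 5L) {
  elapsed <- vapply(
    seq_len(reps),
    function(i) system.time(fit_fun())[["elapsed"]],
    numeric(1)
  )
  median(elapsed)
}

# Cumulative subsets: first group, then first two groups, and so on
groups <- sort(unique(mtcars$cyl))
timings <- vapply(seq_along(groups), function(k) {
  d <- mtcars[mtcars$cyl %in% groups[seq_len(k)], ]
  median_fit_time(function() glm(mpg ~ wt + hp, family = quasipoisson, data = d))
}, numeric(1))
names(timings) <- paste0("cyl<=", groups)
timings
```

Taking the median rather than the mean keeps a single slow run (for example, one interrupted by garbage collection) from dominating the reported time.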

Yotov et al. (2016) features complex partial and general equilibrium models. Some partial equilibrium models, such as the Regional Trade Agreements (RTAs) model, are particularly slow to fit because of the memory they allocate and their number of fixed effects.

In the next table, TG means ‘Traditional Gravity’ (e.g. vanilla PPML), DP means ‘Distance Puzzle’ and GB stands for ‘Globalization’, which are refinements of the simple PPML model and include dummy variables such as specific country pair fixed effects and lagged RTAs.

| Model | Rows in design matrix | Cols in design matrix |
|---|---|---|
| TG, PPML | 28152 | 831 |
| DP, FE | 28566 | 905 |
| RTAs, GB | 28482 | 3175 |

The results for the RTAs model show that the speedups scale with model size: fitting time is reduced, at the cost of more memory required during fitting.

| Model | GLM Time (s) | EGLM Time (s) | Time Gain (%) |
|---|---|---|---|
| DP, FE | 111.0 | 9.08 | 91.82% |
| RTAs, GB | 1824.0 | 161.40 | 91.15% |
| TG, PPML | 108.6 | 9.06 | 91.66% |

It is important to mention that the increase in memory used during fitting comes with a reduced object size for the stored model.

| Model | GLM Size (MB) | EGLM Size (MB) | Memory Savings (%) |
|---|---|---|---|
| DP, FE | 231.04 | 37.26 | 83.87% |
| RTAs, GB | 824.89 | 263.36 | 68.07% |
| TG, PPML | 210.88 | 34.69 | 83.55% |
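
The percentage columns in the time and size tables are relative reductions, 100 × (GLM − EGLM) / GLM. The short check below recomputes them from the reported values; the data frame and its column names are made up for this example.

```r
# Reported values copied from the two tables
bench <- data.frame(
  model   = c("DP, FE", "RTAs, GB", "TG, PPML"),
  glm_s   = c(111.0, 1824.0, 108.6),
  eglm_s  = c(9.08, 161.40, 9.06),
  glm_mb  = c(231.04, 824.89, 210.88),
  eglm_mb = c(37.26, 263.36, 34.69)
)

# Relative reduction in percent, rounded to two decimals
pct_reduction <- function(base, new) round(100 * (base - new) / base, 2)

bench$time_gain_pct  <- pct_reduction(bench$glm_s, bench$eglm_s)
bench$mem_saving_pct <- pct_reduction(bench$glm_mb, bench$eglm_mb)
bench[, c("model", "time_gain_pct", "mem_saving_pct")]
# time_gain_pct:  91.82 91.15 91.66
# mem_saving_pct: 83.87 68.07 83.55
```

Read the other way round, a time gain of about 91% corresponds to eglm being roughly 11 to 12 times faster than glm on these models.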

To conclude my benchmarks, I fitted the PPML model again on DigitalOcean droplets, leading to consistent times across scaled hardware. The results can be seen in the next plot:

Edge cases

A simple example that breaks eflm even with QR decomposition can be found in Golub and Van Loan (2013), and consists of passing an ill-conditioned matrix:

| Model | (Intercept) | x1 | x2 |
|---|---|---|---|
| REG 1 | 1.98 | 2.98 | 1.02 |
| REG 2 | 1.98 | 4.00 | NA |
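
One reason ill-conditioned inputs are harder after an N × P to P × P reduction is that forming a cross-product squares the 2-norm condition number. The sketch below shows this on a small, made-up design matrix with two nearly collinear columns; it is not the example from Golub and Van Loan, and cond2 is a helper defined here for illustration.

```r
# 2-norm condition number: ratio of largest to smallest singular value
cond2 <- function(M) {
  s <- svd(M, nu = 0, nv = 0)$d
  max(s) / min(s)
}

# Hypothetical data: x2 is almost collinear with x1
x1 <- seq(0, 1, length.out = 50)
x2 <- x1 + 1e-4 * sin(seq_len(50))
X  <- cbind(1, x1, x2)

k_X   <- cond2(X)             # conditioning of the N x P matrix
k_XtX <- cond2(crossprod(X))  # conditioning of the P x P cross-product

# The ratio is close to 1, i.e. cond(X'X) is about cond(X)^2
c(k_X = k_X, k_XtX = k_XtX, ratio = k_XtX / k_X^2)
```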

References

Golub, Gene H, and Charles F Van Loan. 2013. Matrix Computations. Vol. 3. JHU press.

Yotov, Yoto V, Roberta Piermartini, José-Antonio Monteiro, and Mario Larch. 2016. An Advanced Guide to Trade Policy Analysis: The Structural Gravity Model. World Trade Organization Geneva.
