abess-team / abess

Licence: other
Fast Best-Subset Selection Library

Programming Languages

C++, R, Python, CUDA, C, CMake

Projects that are alternatives of or similar to abess

100 Days Of Ml Code
100 Days of ML Coding
Stars: ✭ 33,641 (+12546.99%)
Mutual labels:  linear-regression, scikit-learn
MachineLearningSeries
Videos and code from Universo Discreto teaching the fundamentals of Machine Learning in Python. For more details, follow the listed playlist.
Stars: ✭ 20 (-92.48%)
Mutual labels:  linear-regression, classification-algorithm
machine-learning-course
Machine Learning Course @ Santa Clara University
Stars: ✭ 17 (-93.61%)
Mutual labels:  linear-regression, scikit-learn
ClassifierToolbox
A MATLAB toolbox for classifier: Version 1.0.7
Stars: ✭ 72 (-72.93%)
Mutual labels:  linear-regression, principal-component-analysis
Dat8
General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+469.92%)
Mutual labels:  linear-regression, scikit-learn
Data Science Complete Tutorial
For extensive instructor led learning
Stars: ✭ 1,027 (+286.09%)
Mutual labels:  linear-regression, scikit-learn
Lecture-3-Linear-Models
ICDSS Machine Learning Workshop Series: Linear Models
Stars: ✭ 19 (-92.86%)
Mutual labels:  linear-regression, scikit-learn
Ds and ml projects
Data Science & Machine Learning projects and tutorials in python from beginner to advanced level.
Stars: ✭ 56 (-78.95%)
Mutual labels:  linear-regression, scikit-learn
hdnom
Benchmarking and Visualization Toolkit for Penalized Cox Models
Stars: ✭ 36 (-86.47%)
Mutual labels:  linear-regression, high-dimensional-data
machine learning
A gentle introduction to machine learning: data handling, linear regression, naive bayes, clustering
Stars: ✭ 22 (-91.73%)
Mutual labels:  linear-regression, scikit-learn
DS-Cookbook101
A Jupyter notebook containing the most frequently used code snippets for daily data science operations
Stars: ✭ 59 (-77.82%)
Mutual labels:  scikit-learn
NimbusML-Samples
Samples for NimbusML, a Python machine learning package providing simple interoperability between ML.NET and scikit-learn components.
Stars: ✭ 31 (-88.35%)
Mutual labels:  scikit-learn
Football Prediction Project
This project will pull past game data from api-football, and use these statistics to predict the outcome of future Premier League matches through machine learning.
Stars: ✭ 44 (-83.46%)
Mutual labels:  scikit-learn
TotalLeastSquares.jl
Solve many kinds of least-squares and matrix-recovery problems
Stars: ✭ 23 (-91.35%)
Mutual labels:  linear-regression
Reconstruction-and-Compression-of-Color-Images
Reconstruction and Compression of Color Images Using Principal Component Analysis (PCA) Algorithm
Stars: ✭ 30 (-88.72%)
Mutual labels:  principal-component-analysis
machine learning in python
Demo of basic machine learning models in Python with Jupyter Notebook
Stars: ✭ 16 (-93.98%)
Mutual labels:  linear-regression
dataquest-guided-projects-solutions
My dataquest project solutions
Stars: ✭ 35 (-86.84%)
Mutual labels:  scikit-learn
BasisFunctionExpansions.jl
Basis Function Expansions for Julia
Stars: ✭ 19 (-92.86%)
Mutual labels:  linear-regression
Machine-learning-toolkits-with-python
Machine learning toolkits with Python
Stars: ✭ 31 (-88.35%)
Mutual labels:  scikit-learn
osprey
🦅Hyperparameter optimization for machine learning pipelines 🦅
Stars: ✭ 71 (-73.31%)
Mutual labels:  scikit-learn

abess: Fast Best-Subset Selection in Python and R

Overview

The abess (Adaptive BEst Subset Selection) library aims to solve general best subset selection. Best subset selection means finding a small subset of predictors such that the resulting model is expected to have the highest accuracy. It is valuable in both scientific research and practical applications. For example, clinicians may want to know whether a patient is healthy based on the expression levels of a few important genes.
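The sketch below makes the problem concrete for linear regression. It fits least squares on every subset of exactly k predictors and keeps the subset with the smallest residual sum of squares.

This is a naive brute-force illustration, not how abess works. Exhaustive search needs C(p, k) fits, so it is only feasible for tiny p, and that cost is exactly what abess is designed to avoid. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def solve_least_squares(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def rss(X, y, subset):
    """Residual sum of squares of the least-squares fit on the chosen columns."""
    Xs = [[row[j] for j in subset] for row in X]
    beta = solve_least_squares(Xs, y)
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(Xs, y))


def best_subset(X, y, k):
    """Brute force: try every size-k column subset and keep the smallest RSS.

    Needs C(p, k) least-squares fits, so it only works for tiny p.
    This illustrates the problem abess solves, not abess's algorithm.
    """
    p = len(X[0])
    return min(combinations(range(p), k), key=lambda s: rss(X, y, s))


# Hypothetical toy data: y depends only on columns 0 and 2 (no noise).
X = [[1, 2, 0, 1], [0, 1, 1, 3], [2, 0, 1, 1],
     [1, 1, 2, 0], [3, 1, 0, 2], [0, 2, 3, 1]]
y = [2 * row[0] - 3 * row[2] for row in X]
print(best_subset(X, y, k=2))  # (0, 2)
```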

This library implements a generic algorithmic framework that finds the optimal solution extremely fast. The framework currently supports best subset selection under the following models:
linear regression; classification (binary or multi-class); count-response modeling; censored-response modeling; multi-response modeling (multi-task learning); and others.
It also supports variants of best subset selection such as group best subset selection and nuisance penalized regression. In particular, the time complexity of (group) best subset selection for linear regression is certifiably polynomial.
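The polynomial-time algorithm in Zhu et al. (2020), listed in the References, is built around a "splicing" step. It starts from an active set of size k. It then exchanges the c active variables with the smallest backward sacrifice (the loss increase from dropping them) for the c inactive variables with the largest forward sacrifice (the loss decrease from adding them). The exchange is kept only if the loss drops.

The pure-Python sketch below is a simplified illustration for linear regression under several assumptions:
- the support size k is known;
- the search starts from the first k columns;
- a fixed tolerance is used.

It is not abess's C++ implementation, which additionally tunes the support size and handles other models. All function names are hypothetical.

```python
def lstsq(X, y, cols):
    """Least-squares coefficients on columns `cols`, returned as {column: value}.

    Solves the normal equations by Gauss-Jordan elimination with partial pivoting.
    """
    m = len(cols)
    A = [[sum(r[a] * r[b] for r in X) for b in cols]
         + [sum(r[a] * yi for r, yi in zip(X, y))]
         for a in cols]
    for c in range(m):
        piv = max(range(c, m), key=lambda i: abs(A[i][c]))
        A[c], A[piv] = A[piv], A[c]
        for i in range(m):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [u - f * v for u, v in zip(A[i], A[c])]
    return {j: A[i][m] / A[i][i] for i, j in enumerate(cols)}


def residuals(X, y, beta):
    return [yi - sum(b * r[j] for j, b in beta.items()) for r, yi in zip(X, y)]


def loss(X, y, beta):
    """Half mean squared error, L(beta) = ||y - X beta||^2 / (2n)."""
    return sum(e * e for e in residuals(X, y, beta)) / (2 * len(X))


def splicing(X, y, k, max_iter=20, tol=1e-12):
    """Simplified splicing search for a size-k support (illustrative only)."""
    n, p = len(X), len(X[0])
    scale = [sum(r[j] ** 2 for r in X) / n for j in range(p)]  # X_j^T X_j / n
    active = list(range(k))  # assumption: start from the first k columns
    beta = lstsq(X, y, active)
    for _ in range(max_iter):
        current = loss(X, y, beta)
        resid = residuals(X, y, beta)
        inactive = [j for j in range(p) if j not in active]
        # Backward sacrifice: loss increase if active variable j is dropped.
        backward = {j: scale[j] * beta[j] ** 2 / 2 for j in active}
        # Forward sacrifice: loss decrease if inactive variable j is added.
        forward = {}
        for j in inactive:
            d = sum(r[j] * e for r, e in zip(X, resid)) / n
            forward[j] = d ** 2 / (2 * scale[j])
        # Try exchange sizes c = 1..k and remember the best candidate set.
        best_loss, best = current, None
        for c in range(1, min(k, len(inactive)) + 1):
            drop = sorted(active, key=backward.get)[:c]
            add = sorted(inactive, key=forward.get, reverse=True)[:c]
            cand = [j for j in active if j not in drop] + add
            cand_beta = lstsq(X, y, cand)
            cand_loss = loss(X, y, cand_beta)
            if cand_loss < best_loss:
                best_loss, best = cand_loss, (cand, cand_beta)
        if best is None or current - best_loss <= tol:
            break  # no exchange lowers the loss enough: stop
        active, beta = best
    return sorted(active), beta


# Hypothetical toy data: y depends only on columns 0 and 2 (no noise).
X = [[1, 2, 0, 1], [0, 1, 1, 3], [2, 0, 1, 1],
     [1, 1, 2, 0], [3, 1, 0, 2], [0, 2, 3, 1]]
y = [2 * row[0] - 3 * row[2] for row in X]
support, coef = splicing(X, y, k=2)
print(support)  # [0, 2]
```

Unlike the brute-force search, each splicing iteration refits only k candidate sets.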

Quick start

The abess software provides both Python and R interfaces. A quick start is given here; for more details, please see Installation.

Python package

Install the stable version of the Python package from PyPI:

$ pip install abess

or from conda-forge:

$ conda install -c conda-forge abess

Best subset selection for linear regression on a simulated dataset in Python:

from abess.linear import LinearRegression
from abess.datasets import make_glm_data
sim_dat = make_glm_data(n = 300, p = 1000, k = 10, family = "gaussian")
model = LinearRegression()
model.fit(sim_dat.x, sim_dat.y)

See more examples analyzed with Python in the Python tutorials.

R package

Install the stable version of the R package from CRAN with:

install.packages("abess")

Best subset selection for linear regression on a simulated dataset in R:

library(abess)
sim_dat <- generate.data(n = 300, p = 1000)
abess(x = sim_dat[["x"]], y = sim_dat[["y"]])

See more examples analyzed with R in the R tutorials.

Runtime Performance

To demonstrate the computational power of abess, we measure its CPU execution time (in seconds) on synthetic datasets and compare it with state-of-the-art variable selection methods. The variable selection and estimation results are reported separately in Python performance and R performance. All computations are conducted on an Ubuntu platform with an Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz and 48 GB of RAM.
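The helper below is a minimal sketch of how CPU time, as opposed to wall-clock time, can be measured in Python with `time.process_time`. Two things are assumptions made for illustration:
- the helper name;
- reporting the minimum over several repeats.

The repository's `timings.py` scripts may measure time differently.

```python
import time


def cpu_seconds(fn, *args, repeats=3, **kwargs):
    """Smallest CPU time, in seconds, over `repeats` calls of fn(*args, **kwargs).

    time.process_time() counts user + system CPU time of the current process
    (not wall-clock time), so time spent sleeping or waiting is not included.
    """
    timings = []
    for _ in range(repeats):
        start = time.process_time()
        fn(*args, **kwargs)
        timings.append(time.process_time() - start)
    return min(timings)


# Placeholder workload; in a real benchmark this would be a model's fit call.
print(cpu_seconds(sorted, list(range(100_000, 0, -1))))
```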

Python package

We compare the abess Python package with scikit-learn on linear regression and logistic regression. Results are presented in the figure below:

The figure shows that abess uses the least runtime to find the solution. These results can be reproduced by running the following command in a shell:

$ python abess/docs/simulation/Python/timings.py

R package

We compare the abess R package with three widely used R packages: glmnet, ncvreg, and L0Learn. The runtime comparison results are:

Compared with these packages, abess shows competitive computational efficiency, and it is the fastest when the variables are highly correlated.

Running the following command in a shell reproduces the above results in R:

$ Rscript abess/docs/simulation/R/timings.R

Open source software

abess is free software, and its source code is publicly available on GitHub. The core framework is written in C++, with user-friendly R and Python interfaces. You can redistribute it and/or modify it under the terms of the GPL-v3 License. We welcome contributions to abess, especially ones that extend abess to other best subset selection problems.

What's new

New features:

  • abess Python package can be installed via conda.
  • The abess R package is highlighted as one of the core packages in CRAN Task View: Machine Learning & Statistical Learning.
  • On Windows, the recommended C++ compiler has shifted from MinGW to Microsoft Visual Studio.
  • Support for predicting the survival function in abess.linear.CoxPHSurvivalAnalysis.
  • Estimators in Python have been renamed. Please check here.
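Under the Cox proportional hazards model, the survival function of a new sample follows from a baseline survival curve S0(t) and the linear predictor:

S(t | x) = S0(t)^exp(xᵀβ)

The sketch below shows only that final step. It assumes the baseline curve has already been estimated. How `abess.linear.CoxPHSurvivalAnalysis` estimates the baseline and returns its predictions is not shown here, and the function name is hypothetical.

```python
import math


def predict_survival(baseline, x, beta):
    """Survival curve S(t | x) = S0(t) ** exp(x . beta) under proportional hazards.

    baseline: list of (time, S0(time)) pairs, assumed to be already estimated.
    x, beta:  feature vector and fitted coefficients of equal length.
    """
    risk = math.exp(sum(xi * bi for xi, bi in zip(x, beta)))  # relative risk
    return [(t, s0 ** risk) for t, s0 in baseline]


baseline = [(1.0, 0.9), (2.0, 0.8), (3.0, 0.6)]
# Relative risk is exp(log 2) = 2 here, so each S0(t) is (approximately) squared.
print(predict_survival(baseline, x=[math.log(2.0)], beta=[1.0]))
```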

New best subset selection tasks:

  • Generalized linear model for ordinal regression (a.k.a. rank learning in some machine learning literature).
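One standard formulation of ordinal regression is the cumulative-logit (proportional odds) model. In that model:
- all classes share one linear predictor xᵀβ;
- P(y ≤ j | x) = sigmoid(θ_j − xᵀβ), with increasing thresholds θ_j.

This particular parameterization is an assumption for illustration; abess's ordinal model may be parameterized or estimated differently. The sketch only shows how such a model turns a linear predictor into class probabilities, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(x, beta, thresholds):
    """Class probabilities under a cumulative-logit (proportional odds) model.

    P(y <= j | x) = sigmoid(theta_j - x . beta), with increasing thresholds
    theta_1 < ... < theta_{J-1}; the last class absorbs the remaining mass.
    """
    eta = sum(xi * bi for xi, bi in zip(x, beta))  # shared linear predictor
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(y = j) = P(y <= j) - P(y <= j - 1)
        previous = c
    return probs


print(ordinal_probs(x=[0.5, -1.0], beta=[2.0, 0.5], thresholds=[-1.0, 1.0]))
```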

Citation

If you use abess or reference our tutorials in a presentation or publication, we would appreciate citations of our library.

Jin Zhu, Liyuan Hu, Junhao Huang, Kangkang Jiang, Yanhang Zhang, Shiyun Lin, Junxian Zhu, Xueqin Wang (2021). “abess: A Fast Best Subset Selection Library in Python and R.” arXiv:2110.09697.

The corresponding BibTeX entry:

@article{zhu-abess-arxiv,
  author    = {Jin Zhu and Liyuan Hu and Junhao Huang and Kangkang Jiang and Yanhang Zhang and Shiyun Lin and Junxian Zhu and Xueqin Wang},
  title     = {abess: A Fast Best Subset Selection Library in Python and R},
  journal   = {arXiv:2110.09697},
  year      = {2021},
}

References

  • Junxian Zhu, Canhong Wen, Jin Zhu, Heping Zhang, and Xueqin Wang (2020). A polynomial algorithm for best-subset selection problem. Proceedings of the National Academy of Sciences, 117(52):33117-33123.

  • Sebastian Pölsterl (2020). scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn. Journal of Machine Learning Research, 21(212):1-6.

  • Yanhang Zhang, Junxian Zhu, Jin Zhu, and Xueqin Wang (2021). Certifiably Polynomial Algorithm for Best Group Subset Selection. arXiv preprint arXiv:2104.12576.

  • Qiang Sun and Heping Zhang (2020). Targeted Inference Involving High-Dimensional Data Using Nuisance Penalized Regression. Journal of the American Statistical Association, DOI: 10.1080/01621459.2020.1737079.

  • Jin Zhu, Liyuan Hu, Junhao Huang, Kangkang Jiang, Yanhang Zhang, Shiyun Lin, Junxian Zhu, and Xueqin Wang (2021). abess: A Fast Best Subset Selection Library in Python and R. arXiv:2110.09697.