NorskRegnesentral / shapr

Explaining the output of machine learning models with more accurately estimated Shapley values

Projects that are alternatives of or similar to shapr

fastshap
Fast approximate Shapley values in R
Stars: ✭ 79 (-16.84%)
Mutual labels:  explainable-ai, explainable-ml, shapley
Mindsdb
Predictive AI layer for existing databases.
Stars: ✭ 4,199 (+4320%)
Mutual labels:  explainable-ai, explainable-ml
Interpret
Fit interpretable models. Explain blackbox machine learning.
Stars: ✭ 4,352 (+4481.05%)
Mutual labels:  explainable-ai, explainable-ml
DataScience ArtificialIntelligence Utils
Examples of Data Science projects and Artificial Intelligence use cases
Stars: ✭ 302 (+217.89%)
Mutual labels:  explainable-ai, explainable-ml
global-attribution-mapping
GAM (Global Attribution Mapping) explains the landscape of neural network predictions across subpopulations
Stars: ✭ 18 (-81.05%)
Mutual labels:  explainable-ai, explainable-ml
ml-fairness-framework
FairPut - Machine Learning Fairness Framework with LightGBM — Explainability, Robustness, Fairness (by @firmai)
Stars: ✭ 59 (-37.89%)
Mutual labels:  explainable-ai, explainable-ml
rcppensmallen
Rcpp integration for the Ensmallen templated C++ mathematical optimization library
Stars: ✭ 28 (-70.53%)
Mutual labels:  rcpp, rcpparmadillo
scanstatistics
An R package for space-time anomaly detection using scan statistics.
Stars: ✭ 41 (-56.84%)
Mutual labels:  rcpp, rcpparmadillo
dlime experiments
In this work, we propose a deterministic version of Local Interpretable Model-Agnostic Explanations (LIME); experimental results on three different medical datasets show the superiority of the Deterministic Local Interpretable Model-Agnostic Explanations (DLIME).
Stars: ✭ 21 (-77.89%)
Mutual labels:  explainable-ai, explainable-ml
responsible-ai-toolbox
This project provides responsible AI user interfaces for Fairlearn, interpret-community, and Error Analysis, as well as foundational building blocks that they rely on.
Stars: ✭ 615 (+547.37%)
Mutual labels:  explainable-ai, explainable-ml
mllp
The code of AAAI 2020 paper "Transparent Classification with Multilayer Logical Perceptrons and Random Binarization".
Stars: ✭ 15 (-84.21%)
Mutual labels:  explainable-ai, explainable-ml
SHAP FOLD
(Explainable AI) - Learning Non-Monotonic Logic Programs From Statistical Models Using High-Utility Itemset Mining
Stars: ✭ 35 (-63.16%)
Mutual labels:  explainable-ai, explainable-ml
ProtoTree
ProtoTrees: Neural Prototype Trees for Interpretable Fine-grained Image Recognition, published at CVPR2021
Stars: ✭ 47 (-50.53%)
Mutual labels:  explainable-ai, explainable-ml
Tensorwatch
Debugging, monitoring and visualization for Python Machine Learning and Data Science
Stars: ✭ 3,191 (+3258.95%)
Mutual labels:  explainable-ai, explainable-ml
textTinyR
Text Processing for Small or Big Data Files in R
Stars: ✭ 32 (-66.32%)
Mutual labels:  rcpp, rcpparmadillo
URT
Fast Unit Root Tests and OLS regression in C++ with wrappers for R and Python
Stars: ✭ 70 (-26.32%)
Mutual labels:  rcpp, rcpparmadillo
CARLA
CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms
Stars: ✭ 166 (+74.74%)
Mutual labels:  explainable-ai, explainable-ml
path explain
A repository for explaining feature attributions and feature interactions in deep neural networks.
Stars: ✭ 151 (+58.95%)
Mutual labels:  explainable-ai
travis
⛔ ARCHIVED ⛔ Set Up 'Travis' for Testing and Deployment
Stars: ✭ 61 (-35.79%)
Mutual labels:  rstats
ddsm-visual-primitives
Using deep learning to discover interpretable representations for mammogram classification and explanation
Stars: ✭ 25 (-73.68%)
Mutual labels:  explainable-ai

shapr

Badges: CRAN status | CRAN downloads | R build status | Lifecycle: experimental | License: MIT | DOI

The most common machine learning task is to train a model which is able to predict an unknown outcome (response variable) based on a set of known input variables/features. When using such models for real-life applications, it is often crucial to understand why a certain set of features leads to exactly that prediction. However, explaining predictions from complex, or seemingly simple, machine learning models is a practical and ethical question, as well as a legal issue. Can I trust the model? Is it biased? Can I explain it to others? We want to explain individual predictions from a complex machine learning model by learning simple, interpretable explanations.

Shapley values constitute the only prediction explanation framework with a solid theoretical foundation (Lundberg and Lee (2017)). Unless the true distribution of the features is known, and there are fewer than, say, 10-15 features, these Shapley values need to be estimated/approximated. Popular methods like Shapley Sampling Values (Štrumbelj and Kononenko (2014)), SHAP/Kernel SHAP (Lundberg and Lee (2017)), and to some extent TreeSHAP (Lundberg, Erion, and Lee (2018)), assume that the features are independent when approximating the Shapley values for prediction explanation. This may lead to very inaccurate Shapley values, and consequently wrong interpretations of the predictions. Aas, Jullum, and Løland (2021) extend and improve the Kernel SHAP method of Lundberg and Lee (2017) to account for the dependence between the features, resulting in significantly more accurate approximations to the Shapley values. See the paper for details.

This package implements the methodology of Aas, Jullum, and Løland (2021).
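
For reference, the quantity being estimated is the Shapley value of feature j for a single prediction f(x*). With M denoting the full feature set, and using the conditional-expectation contribution function v(S) of Aas, Jullum, and Løland (2021),

\phi_j = \sum_{S \subseteq \mathcal{M} \setminus \{j\}} \frac{|S|!\,(|\mathcal{M}|-|S|-1)!}{|\mathcal{M}|!} \bigl( v(S \cup \{j\}) - v(S) \bigr),
\qquad
v(S) = \mathrm{E}\bigl[ f(\boldsymbol{x}) \mid \boldsymbol{x}_S = \boldsymbol{x}_S^{*} \bigr].

Independence-based methods approximate v(S) by averaging f over the marginal distribution of the features outside S, whereas shapr estimates the conditional expectation itself. That is what requires the Gaussian, copula, empirical or ctree modelling of the feature dependence listed below.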

The following methodology/features are currently implemented:

  • Native support for explaining predictions from models fitted with the following functions: stats::glm, stats::lm, ranger::ranger, xgboost::xgboost/xgboost::xgb.train and mgcv::gam.
  • Accounting for feature dependence assuming the features are Gaussian (Aas, Jullum, and Løland (2021)).
  • Accounting for feature dependence with a Gaussian copula (Gaussian dependence structure, any marginal) (Aas, Jullum, and Løland (2021)).
  • Accounting for feature dependence using the Mahalanobis-distance-based empirical (conditional) distribution approach of Aas, Jullum, and Løland (2021).
  • Accounting for feature dependence using conditional inference trees (Redelmeier, Jullum, and Aas (2020)).
  • Combining any of the four methods.
  • Optional use of the AICc criterion of Hurvich, Simonoff, and Tsai (1998) when optimizing the bandwidth parameter in the empirical (conditional) approach of Aas, Jullum, and Løland (2021).
  • Functionality for visualizing the explanations.
  • Support for models not supported natively (see the sketch right after this list).
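
For the last point, the following is a minimal sketch of how an unsupported model class might be hooked in. It assumes the S3-based custom-model mechanism described in the vignette (a predict_model method for the model class, and optionally a get_model_specs method); the my_lm class below is purely illustrative.

library(shapr)

data("Boston", package = "MASS")
x_var <- c("lstat", "rm", "dis", "indus")
x_train <- Boston[-(1:6), x_var]
y_train <- Boston[-(1:6), "medv"]

# A model object whose class shapr does not recognise (illustrative only)
model <- lm(y_train ~ ., data = x_train)
class(model) <- c("my_lm", class(model))

# Tell shapr how to compute predictions for this class
predict_model.my_lm <- function(x, newdata) {
  predict.lm(x, as.data.frame(newdata))
}

# shapr() and explain() can then be used as in the Example section below
explainer <- shapr(x_train, model)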

Future releases will include:

  • Support for parallelization over explanations, Monte Carlo sampling and feature subsets for non-parallelizable prediction functions.
  • Computational improvements of the AICc optimization approach.
  • Adaptive selection of the method used to account for the feature dependence.

Note that both the features and the prediction must be numeric. The approach is constructed for continuous features. Discrete features may also work just fine with the empirical (conditional) distribution approach. Unlike SHAP and TreeSHAP, we decompose probability predictions directly to ease interpretability, i.e., not via log-odds transformations. The application programming interface (API) of shapr is inspired by Pedersen and Benesty (2019).
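
As an illustration of the probability point above, here is a hedged sketch of explaining a binary classifier. It assumes that shapr's native xgboost support also covers a model fitted with objective = "binary:logistic", so that the predictions being decomposed are probabilities; the high/low-value response is constructed purely for illustration.

library(xgboost)
library(shapr)

data("Boston", package = "MASS")
x_var <- c("lstat", "rm", "dis", "indus")

# Illustrative binary response: is the median home value above the overall median?
y_bin <- as.numeric(Boston$medv > median(Boston$medv))

ind_x_test <- 1:6
x_train <- as.matrix(Boston[-ind_x_test, x_var])
x_test <- as.matrix(Boston[ind_x_test, x_var])

model_bin <- xgboost(
  data = x_train,
  label = y_bin[-ind_x_test],
  nrounds = 20,
  verbose = FALSE,
  objective = "binary:logistic"
)

explainer_bin <- shapr(x_train, model_bin)

# phi_0 is here the average predicted probability without any features, and the
# Shapley values decompose each predicted probability directly (no log-odds scale)
explanation_bin <- explain(
  x_test,
  approach = "empirical",
  explainer = explainer_bin,
  prediction_zero = mean(y_bin[-ind_x_test])
)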

Installation

To install the current stable release from CRAN, use

install.packages("shapr")

To install the current development version, use

remotes::install_github("NorskRegnesentral/shapr")

If you would like to install all packages of the models we currently support, use

remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE)

If you would also like to build and view the vignette locally, use

remotes::install_github("NorskRegnesentral/shapr", dependencies = TRUE, build_vignettes = TRUE)
vignette("understanding_shapr", "shapr")

You can always check out the latest version of the vignette here.

Example

shapr supports computation of Shapley values with any predictive model which takes a set of numeric features and produces a numeric outcome.

The following example shows how a simple xgboost model is trained using the Boston Housing Data, and how shapr explains the individual predictions.

library(xgboost)
library(shapr)

data("Boston", package = "MASS")

x_var <- c("lstat", "rm", "dis", "indus")
y_var <- "medv"

ind_x_test <- 1:6
x_train <- as.matrix(Boston[-ind_x_test, x_var])
y_train <- Boston[-ind_x_test, y_var]
x_test <- as.matrix(Boston[ind_x_test, x_var])

# Looking at the dependence between the features
cor(x_train)
#>            lstat         rm        dis      indus
#> lstat  1.0000000 -0.6108040 -0.4928126  0.5986263
#> rm    -0.6108040  1.0000000  0.1999130 -0.3870571
#> dis   -0.4928126  0.1999130  1.0000000 -0.7060903
#> indus  0.5986263 -0.3870571 -0.7060903  1.0000000

# Fitting a basic xgboost model to the training data
model <- xgboost(
  data = x_train,
  label = y_train,
  nrounds = 20,
  verbose = FALSE
)

# Prepare the data for explanation
explainer <- shapr(x_train, model)
#> The specified model provides feature classes that are NA. The classes of data are taken as the truth.

# Specifying the phi_0, i.e. the expected prediction without any features
p <- mean(y_train)

# Computing the actual Shapley values with kernelSHAP accounting for feature dependence using
# the empirical (conditional) distribution approach with bandwidth parameter sigma = 0.1 (default)
explanation <- explain(
  x_test,
  approach = "empirical",
  explainer = explainer,
  prediction_zero = p
)

# Printing the Shapley values for the test data.
# For more information about the interpretation of the values in the table, see ?shapr::explain.
print(explanation$dt)
#>      none     lstat         rm       dis      indus
#> 1: 22.446 5.2632030 -1.2526613 0.2920444  4.5528644
#> 2: 22.446 0.1671901 -0.7088401 0.9689005  0.3786871
#> 3: 22.446 5.9888022  5.5450858 0.5660134 -1.4304351
#> 4: 22.446 8.2142204  0.7507572 0.1893366  1.8298304
#> 5: 22.446 0.5059898  5.6875103 0.8432238  2.2471150
#> 6: 22.446 1.9929673 -3.6001958 0.8601984  3.1510531

# Finally we plot the resulting explanations
plot(explanation)
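
The same explainer can be reused with the other dependence-modelling approaches. The sketch below assumes that the approach argument accepts the method names listed earlier and, as described in the vignette, a vector of methods for the combined approach; it ends with a quick sanity check that the contributions sum to the model predictions.

# Gaussian and conditional inference tree alternatives (same explainer and phi_0)
explanation_gaussian <- explain(
  x_test,
  approach = "gaussian",
  explainer = explainer,
  prediction_zero = p
)

explanation_ctree <- explain(
  x_test,
  approach = "ctree",
  explainer = explainer,
  prediction_zero = p
)

# Combined approach: a vector of methods (see ?shapr::explain and the vignette
# for the exact semantics of how the methods are assigned)
explanation_combined <- explain(
  x_test,
  approach = c("empirical", "copula", "gaussian", "gaussian"),
  explainer = explainer,
  prediction_zero = p
)

# Sanity check: for each test observation, phi_0 ("none") plus the feature
# contributions should reproduce the model prediction (up to numerical tolerance)
all.equal(
  unname(rowSums(explanation$dt)),
  as.numeric(predict(model, x_test))
)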

Contribution

All feedback and suggestions are very welcome. Details on how to contribute can be found here. If you have any questions or comments, feel free to open an issue here.

Please note that the ‘shapr’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

References

Aas, Kjersti, Martin Jullum, and Anders Løland. 2021. “Explaining Individual Predictions When Features Are Dependent: More Accurate Approximations to Shapley Values.” Artificial Intelligence 298.

Hurvich, Clifford M, Jeffrey S Simonoff, and Chih-Ling Tsai. 1998. “Smoothing Parameter Selection in Nonparametric Regression Using an Improved Akaike Information Criterion.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60 (2): 271–93.

Lundberg, Scott M, Gabriel G Erion, and Su-In Lee. 2018. “Consistent Individualized Feature Attribution for Tree Ensembles.” arXiv Preprint arXiv:1802.03888.

Lundberg, Scott M, and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” In Advances in Neural Information Processing Systems, 4765–74.

Pedersen, Thomas Lin, and Michaël Benesty. 2019. Lime: Local Interpretable Model-Agnostic Explanations. https://CRAN.R-project.org/package=lime.

Redelmeier, Annabelle, Martin Jullum, and Kjersti Aas. 2020. “Explaining Predictive Models with Mixed Features Using Shapley Values and Conditional Inference Trees.” In International Cross-Domain Conference for Machine Learning and Knowledge Extraction, 117–37. Springer.

Štrumbelj, Erik, and Igor Kononenko. 2014. “Explaining Prediction Models and Individual Predictions with Feature Contributions.” Knowledge and Information Systems 41 (3): 647–65.
