
forestError: A Unified Framework for Random Forest Prediction Error Estimation

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Version 1.1.0 Update

Version 1.1.0 makes two changes. First, it enables estimation of the conditional misclassification rate of predictions by classification random forests, as proposed by Lu and Hardin (2021). Second, it compartmentalizes a costly step in the quantForestError algorithm: the identification of each training observation's out-of-bag terminal nodes.

Conditional Misclassification Rate Estimation

The conditional misclassification rate of predictions by classification random forests can now be estimated. To estimate it, set the what argument of the quantForestError function to "mcr"; what defaults to this value when the provided forest is a classification random forest. See the example code below for a toy demonstration of this estimator's performance.
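
In short, the call looks like the following (a minimal sketch; rf is assumed to be a classification random forest trained with keep.inbag = TRUE, and Xtrain and Xtest are the training and test predictor matrices):

```r
library(forestError)

# request the conditional misclassification rate explicitly;
# "mcr" is also the default when rf is a classification forest
output <- quantForestError(rf, Xtrain, Xtest, what = "mcr")

# the estimated misclassification rates are in output$mcr
head(output$mcr)
```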

Compartmentalization

The identification of each training observation's out-of-bag terminal nodes is now compartmentalized from the main quantForestError function. By isolating this step from the main quantForestError function, Version 1.1.0 allows users to more efficiently iterate the algorithm. Users may wish to feed quantForestError batches of test observations iteratively if they have streaming data or a large test set that cannot be processed in one go due to memory constraints. In previous versions of this package, doing so would require the algorithm to recompute each training observation's out-of-bag terminal nodes in each iteration. This was redundant and costly. By separating this computation from the rest of the quantForestError algorithm, Version 1.1.0 allows the user to perform this computation only once.

As part of this modularization, the quantForestError function gains two optional arguments. If return_train_nodes is set to TRUE, the function also returns a data.table identifying each training observation's out-of-bag terminal nodes. That data.table can then be fed back into quantForestError via the train_nodes argument to avoid the redundant recomputation.

Version 1.1.0 also exports findOOBErrors, the function that produces the data.table identifying each training observation's out-of-bag terminal nodes. Given the same inputs, findOOBErrors produces the same output that quantForestError returns when return_train_nodes is set to TRUE.
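
The batched workflow can be sketched as follows (argument names follow the package documentation; check ?findOOBErrors and ?quantForestError, and note that rf, Xtrain, Ytrain, and Xtest are assumed to be defined as in the examples below):

```r
library(forestError)

# compute each training observation's out-of-bag terminal
# nodes once, up front
train_nodes <- findOOBErrors(rf, Xtrain, Ytrain)

# process a large test set in batches of 1,000 rows,
# reusing the precomputed out-of-bag terminal nodes
batches <- split(seq_len(nrow(Xtest)),
                 ceiling(seq_len(nrow(Xtest)) / 1000))
results <- lapply(batches, function(idx) {
  quantForestError(rf, Xtrain, Xtest[idx, , drop = FALSE],
                   train_nodes = train_nodes)
})
```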

See the documentation on quantForestError and findOOBErrors for examples.

Neither of these changes breaks code written against Version 1.0.0 of this package: they consist solely of a newly exported function, two optional arguments to quantForestError that do nothing by default, and a new possible value for the what argument.

Overview

The forestError package estimates conditional misclassification rates, conditional mean squared prediction errors, conditional biases, conditional prediction intervals, and conditional error distributions for random forest predictions using the plug-in method introduced in Lu and Hardin (2021). These estimates are conditional on the test observations' predictor values, accounting for possible response heterogeneity, random forest prediction bias, and random forest prediction variability across the predictor space.

In its current state, the main function in this package accepts classification and regression random forests built using any of the following packages:

  • randomForest,
  • randomForestSRC,
  • ranger, and
  • quantregForest.
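
For instance, a forest built with ranger can be used in the same way (a sketch assuming a training/test split Xtrain, Ytrain, Xtest like the one in the examples below; argument names follow the package documentation):

```r
library(ranger)
library(forestError)

# ranger must be told to keep in-bag counts so that quantForestError
# can identify each training observation's out-of-bag trees
rf <- ranger(x = Xtrain, y = Ytrain, num.trees = 500,
             keep.inbag = TRUE)

# the training responses are passed explicitly here, since ranger
# objects do not store them in the form quantForestError expects
output <- quantForestError(rf, Xtrain, Xtest, Y.train = Ytrain)
```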

Installation

Running the following line of code in R will install a stable version of this package from CRAN:

install.packages("forestError")

To install the developer version of this package from Github, run the following lines of code in R:

library(devtools)
devtools::install_github(repo = "benjilu/forestError")

Instructions

See the documentation for detailed information on how to use this package. A regression example and a classification example are given below.

######################## REGRESSION ########################
# load package and data
library(forestError)
data(airquality)

# remove observations with missing predictor variable values
airquality <- airquality[complete.cases(airquality), ]

# get number of observations and the response column index
n <- nrow(airquality)
response.col <- 1

# split data into training and test sets
train.ind <- sample(1:n, n * 0.9, replace = FALSE)
Xtrain <- airquality[train.ind, -response.col]
Ytrain <- airquality[train.ind, response.col]
Xtest <- airquality[-train.ind, -response.col]
Ytest <- airquality[-train.ind, response.col]

# fit random forest to the training data
rf <- randomForest::randomForest(Xtrain, Ytrain, nodesize = 5,
                                 ntree = 500, keep.inbag = TRUE)

# estimate conditional mean squared prediction errors, conditional
# biases, conditional prediction intervals, and conditional error
# distribution functions for the test observations
output <- quantForestError(rf, Xtrain, Xtest)
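
In the regression case, the returned object bundles point estimates with estimated error distribution functions: output$estimates holds the predictions, MSPE and bias estimates, and prediction interval bounds, while output$perror and output$qerror are the estimated error CDF and quantile function (names per the package documentation; check ?quantForestError):

```r
# point estimates: predictions, MSPE, bias, and interval bounds
head(output$estimates)

# estimated error distribution for the first test observation:
# P(error <= 0) and the median of the error distribution
output$perror(0, 1)
output$qerror(0.5, 1)
```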

######################## CLASSIFICATION ########################
# data-generating parameters
train_samp_size <- 10000
test_samp_size <- 5000
p <- 5

# generate binary data where the probability of success is a
# linear function of the first predictor variable
Xtrain <- data.frame(matrix(runif(train_samp_size * p),
                             ncol = p))
Xtest <- data.frame(matrix(runif(test_samp_size * p),
                            ncol = p))
Ytrain <- as.factor(rbinom(train_samp_size, 1, Xtrain$X1))

# fit random forest to training data
rf <- randomForest::randomForest(Xtrain, Ytrain, nodesize = 3,
                                 ntree = 1000, keep.inbag = TRUE)

# estimate conditional misclassification rate
output <- quantForestError(rf, Xtrain, Xtest)

# plot conditional misclassification rate against the signal
plot(Xtest$X1, output$mcr, xlab = "X1",
     ylab = "Estimated Misclassification Rate")

License

See DESCRIPTION for information.

Authors

Benjamin Lu and Johanna Hardin

References

  • Benjamin Lu and Johanna Hardin. A Unified Framework for Random Forest Prediction Error Estimation. Journal of Machine Learning Research, 22(8):1-41, 2021. [Link]