tidymodels / Tidypredict

Run predictions inside the database

tidypredict

The main goal of tidypredict is to enable running predictions inside databases. It reads the model, extracts the components needed to calculate the prediction, and then creates an R formula that can be translated into SQL. In other words, it is able to parse a model such as this one:

model <- lm(mpg ~ wt + cyl, data = mtcars)

tidypredict can return a SQL statement that is ready to run inside the database. Because it uses dplyr’s database interface, it works with several database back-ends, such as MS SQL:

library(tidypredict)

tidypredict_sql(model, dbplyr::simulate_mssql())
## <SQL> 39.6862614802529 + (`wt` * -3.19097213898374) + (`cyl` * -1.5077949682598)
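As a rough check of what the generated statement represents, the sketch below evaluates the R formula behind it against a local data frame and compares the result with the model's own predict(). The variable names fit_expr, manual and native are illustrative, not part of the package.

```r
library(tidypredict)

model <- lm(mpg ~ wt + cyl, data = mtcars)

# tidypredict_fit() returns an unevaluated R expression built from the coefficients
fit_expr <- tidypredict_fit(model)

# Evaluate the expression with the columns of mtcars in scope
manual <- eval(fit_expr, envir = mtcars)

# Predictions from the model's native predict() method, with row names dropped
native <- unname(predict(model, newdata = mtcars))

# TRUE when the two sets of predictions agree within numerical tolerance
all.equal(manual, native)
```

The same expression is what dplyr later translates to SQL, so agreement here suggests the SQL statement computes the same values.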

Installation

Install tidypredict from CRAN using:

# install.packages("tidypredict")

Or install the development version using remotes as follows:

# install.packages("remotes")
# remotes::install_github("tidymodels/tidypredict")

Functions

tidypredict has only a few functions, and that number is not expected to grow much. The main focus at this time is to add support for more models.

Function                      Description
tidypredict_fit()             Returns an R formula that calculates the prediction
tidypredict_sql()             Returns a SQL query based on the formula from tidypredict_fit()
tidypredict_to_column()       Adds a new column using the formula from tidypredict_fit()
tidypredict_test()            Tests tidypredict predictions against the model’s native predict() function
tidypredict_interval()        Same as tidypredict_fit() but for intervals (only works with lm and glm)
tidypredict_sql_interval()    Same as tidypredict_sql() but for intervals (only works with lm and glm)
parse_model()                 Creates a list spec based on the R model
as_parsed_model()             Prepares an object to be recognized as a parsed model
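A minimal sketch of tidypredict_test(), assuming the same mtcars linear model used earlier. It relies on the default arguments, which compare against the data the model was fitted on; the name test_result is illustrative.

```r
library(tidypredict)

model <- lm(mpg ~ wt + cyl, data = mtcars)

# Compare the predictions from the tidypredict formula with those from predict()
test_result <- tidypredict_test(model)

# Printing the result summarises whether the differences stay within the threshold
print(test_result)
```

Running this check before sending a formula to a database is a cheap way to catch a model type or option that the parser does not handle as expected.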

How it works

Instead of translating directly to a SQL statement, tidypredict creates an R formula. That formula can then be used inside dplyr. The overall workflow is:

  1. Fit the model using a base R model, or one from the packages listed in Supported models
  2. tidypredict reads the model and creates a list object with the components needed to run predictions
  3. tidypredict builds an R formula based on the list object
  4. dplyr evaluates the formula created by tidypredict
  5. dplyr translates the formula into a SQL statement, or into the language of another supported interface
  6. The database executes the SQL statement(s) created by dplyr
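The steps above can be sketched end to end as follows. This is an illustration under some assumptions: the in-memory SQLite database (through the DBI and RSQLite packages) stands in for a real database server, and the names parsed, fit_expr, local_preds, db_mtcars and db_preds are illustrative.

```r
library(dplyr)
library(tidypredict)

# 1. Fit the model
model <- lm(mpg ~ wt + cyl, data = mtcars)

# 2. Read the model into a list object (the parsed model spec)
parsed <- parse_model(model)

# 3. Build an R formula from the list object
fit_expr <- tidypredict_fit(parsed)

# 4. dplyr evaluates the formula, here against a local data frame
local_preds <- mtcars %>% mutate(fit = !!fit_expr)

# 5. and 6. Against a database table, dplyr translates the formula to SQL
#    and the database executes it (SQLite is an assumption made for this sketch)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
db_mtcars <- copy_to(con, mtcars)

db_preds <- db_mtcars %>%
  tidypredict_to_column(model) %>%  # adds the prediction as a new column
  collect()                         # bring the results back into R

DBI::dbDisconnect(con)
```

Only the last step touches the database, so the formula can be inspected or tested locally before any SQL is run.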

Parsed model spec

tidypredict writes and reads a spec based on a model. Instead of simply writing the R formula directly, splitting the spec from the formula adds the following capabilities:

  1. No more saving models as .rds - specifically for cases when the model needs to be used for predictions in a Shiny app.
  2. Beyond R models - technically, anything that can write a proper spec can be read into tidypredict. It also means that the parsed model spec can become a good alternative to PMML.
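One way to use the spec without an .rds file is to write it to a plain-text format and read it back. The sketch below assumes the yaml package is installed and uses a temporary file; the names spec_file and loaded are illustrative.

```r
library(tidypredict)

model <- lm(mpg ~ wt + cyl, data = mtcars)

# Write the parsed model spec to a YAML file
spec_file <- tempfile(fileext = ".yml")
yaml::write_yaml(parse_model(model), spec_file)

# Later, possibly in another session or a Shiny app: read the spec back
# and mark it as a parsed model so that tidypredict recognises it
loaded <- as_parsed_model(yaml::read_yaml(spec_file))

# The reloaded spec can be used in place of the original model object
reloaded_expr <- tidypredict_fit(loaded)
reloaded_expr
```

Because the file is plain text, coefficients may be stored with slightly less precision than in the fitted model object, so predictions from the reloaded spec can differ in the last few digits.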

Supported models

The following models are supported by tidypredict:

  • Linear Regression - lm()
  • Generalized Linear Model - glm()
  • Random Forest models - randomForest::randomForest()
  • Random Forest models, via ranger - ranger::ranger()
  • MARS models - earth::earth()
  • XGBoost models - xgboost::xgb.Booster.complete()
  • Cubist models - Cubist::cubist()
  • Tree models, via partykit - partykit::ctree()

parsnip

tidypredict supports models fitted via the parsnip interface. The ones confirmed to currently work in tidypredict are:

  • lm() - parsnip: linear_reg() with “lm” as the engine.
  • randomForest::randomForest() - parsnip: rand_forest() with “randomForest” as the engine.
  • ranger::ranger() - parsnip: rand_forest() with “ranger” as the engine.
  • earth::earth() - parsnip: mars() with “earth” as the engine.

broom

The tidy() function from broom works with linear models parsed via tidypredict:

pm <- parse_model(lm(wt ~ ., mtcars))
tidy(pm)
## # A tibble: 11 x 2
##    term        estimate
##    <chr>          <dbl>
##  1 (Intercept) -0.231  
##  2 mpg         -0.0417 
##  3 cyl         -0.0573 
##  4 disp         0.00669
##  5 hp          -0.00323
##  6 drat        -0.0901 
##  7 qsec         0.200  
##  8 vs          -0.0664 
##  9 am           0.0184 
## 10 gear        -0.0935 
## 11 carb         0.249

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.