
klainfo / Scottknottesd

The Scott-Knott Effect Size Difference (ESD) test

Programming Languages

7636 projects

Labels

cran

Projects that are alternatives of or similar to Scottknottesd

modmarg

Calculating Marginal Effects and Levels with Errors Using the Delta Method

Stars: ✭ 15 (+87.5%)

Mutual labels: cran

Fivethirtyeight

R package of data and code behind the stories and interactives at FiveThirtyEight

Stars: ✭ 422 (+5175%)

Mutual labels: cran

Future

🚀 R package: future: Unified Parallel and Distributed Processing in R for Everyone

Stars: ✭ 735 (+9087.5%)

Mutual labels: cran

Bayesab

🐢 bayesAB: Fast Bayesian Methods for A/B Testing

Stars: ✭ 273 (+3312.5%)

Mutual labels: cran

Dataexplorer

Automate Data Exploration and Treatment

Stars: ✭ 362 (+4425%)

Mutual labels: cran

Rmdformats

HTML output formats for RMarkdown documents

Stars: ✭ 492 (+6050%)

Mutual labels: cran

digest

R package to create compact hash digests of R objects

Stars: ✭ 94 (+1075%)

Mutual labels: cran

Bgdata

A Suite of Packages for Analysis of Big Genomic Data

Stars: ✭ 19 (+137.5%)

Mutual labels: cran

Officer

👮 officer: office documents from R

Stars: ✭ 405 (+4962.5%)

Mutual labels: cran

Highcharter

R wrapper for highcharts

Stars: ✭ 583 (+7187.5%)

Mutual labels: cran

Heatmaply

Interactive Heat Maps for R Using plotly

Stars: ✭ 275 (+3337.5%)

Mutual labels: cran

Flextable

table farming

Stars: ✭ 288 (+3500%)

Mutual labels: cran

Mindr

an R package which converts markdown files (.md, .Rmd) into mindmaps (brainstorms)

Stars: ✭ 513 (+6312.5%)

Mutual labels: cran

backports

Reimplementations of Functions Introduced Since R-3.0.0

Stars: ✭ 56 (+600%)

Mutual labels: cran

Googlesheets

Google Spreadsheets R API

Stars: ✭ 771 (+9537.5%)

Mutual labels: cran

GDINA

Stars: ✭ 23 (+187.5%)

Mutual labels: cran

Rio

A Swiss-Army Knife for Data I/O

Stars: ✭ 467 (+5737.5%)

Mutual labels: cran

Stream

A framework for data stream modeling and associated data mining tasks such as clustering and classification. - R Package

Stars: ✭ 23 (+187.5%)

Mutual labels: cran

Forecast

forecast package for R

Stars: ✭ 893 (+11062.5%)

Mutual labels: cran

Rcpp

Seamless R and C++ Integration

Stars: ✭ 572 (+7050%)

Mutual labels: cran


ScottKnottESD (v2.0.3)

The Scott-Knott Effect Size Difference (ESD) test is a mean comparison approach that uses hierarchical clustering to partition a set of treatment means (e.g., means of variable importance scores, means of model performance) into statistically distinct groups with non-negligible differences [Tantithamthavorn et al., (2018) http://dx.doi.org/10.1109/TSE.2018.2794977]. It is an alternative to the Scott-Knott test that considers the magnitude of the difference (i.e., the effect size) of treatment means within a group and between groups. Therefore, the Scott-Knott ESD test (v2.x) produces a ranking of treatment means while ensuring that (1) the magnitude of the difference among all of the treatments within each group is negligible; and (2) the magnitude of the difference between treatments in different groups is non-negligible.

The mechanism of the Scott-Knott ESD test (v2.0.3) consists of 2 steps:

(Step 1) Find a partition that maximizes the difference in treatment means between groups. We begin by sorting the treatment means. Then, following the original Scott-Knott test, we compute the sum of squares between groups (i.e., a dispersion measure of data points) to identify the partition that maximizes the difference in treatment means between groups.
(Step 2) Split into two groups or merge into one group. Instead of using a likelihood ratio test and a Chi-square distribution as the splitting and merging criterion (i.e., a hypothesis test of the equality of all treatment means), we analyze the magnitude of the difference for each pair of treatment means across the two groups. If the difference for any pair of treatment means from the two groups is non-negligible, we split into two groups. Otherwise, we merge into one group. We use Cohen's d, an effect size estimate based on the difference between the two means divided by the standard deviation of the two treatment means (d = (mean(x_1) - mean(x_2))/s.d.).
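
The two steps above can be sketched in base R as follows. This is an illustrative sketch, not the package's source code: the function names (`cohen_d`, `best_split`, `should_split`, `rank_treatments`) are hypothetical, the pooled standard deviation is one common way to compute the "s.d." term, and the 0.2 cut-off for a negligible effect follows Cohen's conventional thresholds rather than anything stated in this README.

```r
# Illustrative sketch only. `x` is a data frame with one column per treatment
# and one row per observation.

# Cohen's d using a pooled standard deviation (an assumption of this sketch).
cohen_d <- function(a, b) {
  pooled_sd <- sqrt(((length(a) - 1) * var(a) + (length(b) - 1) * var(b)) /
                      (length(a) + length(b) - 2))
  (mean(a) - mean(b)) / pooled_sd
}

# Step 1: position k such that cutting the sorted means after k maximizes
# the between-group sum of squares.
best_split <- function(sorted_means) {
  n <- length(sorted_means)
  grand <- mean(sorted_means)
  between_ss <- sapply(seq_len(n - 1), function(k) {
    left <- sorted_means[1:k]
    right <- sorted_means[(k + 1):n]
    length(left) * (mean(left) - grand)^2 +
      length(right) * (mean(right) - grand)^2
  })
  which.max(between_ss)
}

# Step 2: split only if at least one cross-group pair is non-negligible.
should_split <- function(x, left, right, threshold = 0.2) {
  for (l in left) for (r in right) {
    if (abs(cohen_d(x[[l]], x[[r]])) >= threshold) return(TRUE)
  }
  FALSE
}

# Apply both steps recursively and return one rank per treatment.
rank_treatments <- function(x, threshold = 0.2) {
  means <- sort(colMeans(x), decreasing = TRUE)
  groups <- list()
  recurse <- function(nms) {
    if (length(nms) > 1) {
      k <- best_split(means[nms])
      left <- nms[1:k]
      right <- nms[(k + 1):length(nms)]
      if (should_split(x, left, right, threshold)) {
        recurse(left)
        recurse(right)
        return(invisible(NULL))
      }
    }
    # Either a single treatment or a merge: record the group as final.
    groups[[length(groups) + 1]] <<- nms
  }
  recurse(names(means))
  ranks <- rep(seq_along(groups), lengths(groups))
  names(ranks) <- unlist(groups)
  ranks
}

# Deterministic toy data: a and b differ negligibly, c is far lower.
base <- 10 + seq(-1, 1, length.out = 50)
x <- data.frame(a = base, b = base + 0.01, c = base - 5)
ranks <- rank_treatments(x)
print(ranks)  # b and a share rank 1; c gets rank 2
```

In the toy data, a plain comparison of means would order b ahead of a, but the effect size between them is far below the threshold, so the sketch merges them into one group.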

Unlike the earlier version of the Scott-Knott ESD test (v1.x) that post-processes the groups that are produced by the Scott-Knott test, the Scott-Knott ESD test (v2.x) pre-processes the groups by merging pairs of statistically distinct groups that have a negligible difference.

Example usage scenarios in the software engineering domain:

(1) Ranking and identifying the most influential variables that are produced by random forests models or regression models.

Kabinna et al. "Examining the stability of logging statements." Proceedings of the International Conference on Software Analysis, Evolution, and Reengineering (SANER), 2016.
Li et al. "Towards just-in-time suggestions for log changes." Empirical Software Engineering (2016): 1-35.
Tian et al. "What are the characteristics of high-rated apps? a case study on free android applications." Proceedings of the International Conference on Software Maintenance and Evolution (ICSME), 2015.
Tantithamthavorn et al. "The impact of mislabelling on the performance and interpretation of defect prediction models." Proceedings of the International Conference on Software Engineering (ICSE), 2015.

(2) Ranking and identifying the top-performing feature selection, classification, and model validation techniques for defect prediction models.

Rajbahadur et al. "The Impact Of Using Regression Models to Build Defect Classifiers." Proceedings of the International Conference on Mining Software Repositories (MSR), 2017.
Ghotra et al. "A Large-Scale Study of the Impact of Feature Selection Techniques on Defect Classification Models." Proceedings of the International Conference on Mining Software Repositories (MSR), 2017.
Tantithamthavorn et al. "An Empirical Comparison of Model Validation Techniques for Defect Prediction Models." IEEE Transactions on Software Engineering (TSE), 2017.
Tantithamthavorn et al. "Automated parameter optimization of classification techniques for defect prediction models." Proceedings of the 38th International Conference on Software Engineering (ICSE), 2016.
Ghotra et al. "Revisiting the impact of classification techniques on the performance of defect prediction models." Proceedings of the International Conference on Software Engineering (ICSE), 2015.

(3) Ranking and identifying the most frequent developer search tasks.

Xia et al. "What do developers search for on the web?" Empirical Software Engineering (2017): 1-37.

Installation

Install the current release from CRAN:

install.packages("ScottKnottESD")

Install the development version from GitHub:

install.packages("devtools")
devtools::install_github("klainfo/ScottKnottESD", ref="development")

Example Usage

library(ScottKnottESD)

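# An example dataset: the 1,000 variable importance scores of 9 software metrics.
# The scores are generated by the Random Forests technique using 1,000 out-of-sample bootstrap iterations.
example

# Run the Scott-Knott ESD test on the example dataset and plot the result.
sk <- sk_esd(example)
plot(sk)

# The same workflow applied to the `maven` dataset.
sk <- sk_esd(maven)
plot(sk)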
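
To apply the same workflow to your own data, the sketch below builds a data frame in what appears to be the layout of `example`: one column per treatment and one row per observation. The metric names and score values are made up for illustration, and the assumption that `sk_esd` accepts any data frame shaped this way should be checked against the package documentation.

```r
# Hypothetical data: 100 bootstrap scores for three made-up metrics.
set.seed(42)
scores <- data.frame(
  loc        = rnorm(100, mean = 0.80, sd = 0.05),
  churn      = rnorm(100, mean = 0.79, sd = 0.05),
  complexity = rnorm(100, mean = 0.40, sd = 0.05)
)
str(scores)

# Run the test only when the package is installed, so the script still
# executes on machines without ScottKnottESD.
if (requireNamespace("ScottKnottESD", quietly = TRUE)) {
  sk <- ScottKnottESD::sk_esd(scores)
  plot(sk)
}
```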

Referencing ScottKnottESD

ScottKnottESD can be referenced as:

@article{tantithamthavorn2017mvt,
    Author = {Tantithamthavorn, Chakkrit and McIntosh, Shane and Hassan, Ahmed E. and Matsumoto, Kenichi},
    Title = {An Empirical Comparison of Model Validation Techniques for Defect Prediction Models},
    Journal = {IEEE Transactions on Software Engineering (TSE)},
    Volume = {43},
    Number = {1},
    Pages = {1--18},
    Year = {2017}
}
@article{tantithamthavorn2018optimization,
    Author = {Tantithamthavorn, Chakkrit and McIntosh, Shane and Hassan, Ahmed E. and Matsumoto, Kenichi},
    Title = {The Impact of Automated Parameter Optimization for Defect Prediction Models},
    Journal = {IEEE Transactions on Software Engineering (TSE)},
    Pages = {Early Access},
    Year = {2018}
}


Stars: ✭ 8
