Tirgit / missCompare

Licence: other
missCompare R package - intuitive missing data imputation framework


Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Overview of missCompare

missCompare is a missing data imputation pipeline that guides you through your missing data problem. Its functions help you select the algorithm best suited to your data and provide an easy way to impute the missing data points in your dataset.

The missCompare pipeline

You will find a detailed manual in the missCompare vignette:

install.packages("missCompare")
library(missCompare)
vignette("misscompare")
  1. Cleaning your data using missCompare::clean()
  2. Extracting information on dimensions, missingness, correlations and variables, plotting missing data using missCompare::get_data()
  3. Imputation - simulated data:
  • simulating full data with no missingness using metadata from the previous step (resembling your original data) using missCompare::simulate()
  • spiking in missing data in distinct missing data patterns using missCompare::all_patterns(). These patterns are:
    • missing completely at random (MCAR) - missCompare::MCAR() - missing data occur at random
    • missing at random (MAR) - missCompare::MAR() - missing data occurrence correlates with other variables' values (univariate solution in missCompare)
    • missing not at random (MNAR) - missCompare::MNAR() - missing data occurrence correlates with variables' own values
    • missing in assumed pattern (MAP) - missCompare::MAP() - a combination of the previous three, where the user can define a pattern per variable
  • imputing missing data, obtaining imputation metrics per method (root mean squared error, RMSE; mean absolute error, MAE; Kolmogorov-Smirnov test statistic D for equal distributions; computation time) and plotting the results using missCompare::impute_simulated()
  4. Imputing your data - After the previous step, you will have a general idea of which algorithms perform best for your data structure (size, degree of correlation between variables). In this step, you can impute your original data with your chosen algorithm(s) using missCompare::impute_data()
  5. Post imputation diagnostics give an informative assessment of how the imputation changed your data structure (e.g. variable means, distributions, clusters, correlations). The function here is missCompare::post_imp_diag()
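
The three basic missingness patterns listed above can be illustrated with a few lines of base R. This is only a sketch of the concepts, not the code missCompare uses internally; the variables x and y, the 20% missingness fraction and the threshold-style selection are all assumptions made for the example.

```r
# Illustrative base R sketch; NOT the missCompare implementation.
set.seed(42)
n <- 1000
x <- rnorm(n)                 # fully observed covariate
y <- 0.6 * x + rnorm(n)       # variable that will receive missing values
n_miss <- round(0.2 * n)      # assumed: remove 20% of the y values

# MCAR: every value of y has the same chance of being removed.
y_mcar <- y
y_mcar[sample(n, size = n_miss)] <- NA

# MAR: missingness in y depends on another, observed variable
# (here the rows with the largest x lose their y value).
y_mar <- y
y_mar[order(x, decreasing = TRUE)[seq_len(n_miss)]] <- NA

# MNAR: missingness in y depends on the value of y itself
# (here the largest y values are removed).
y_mnar <- y
y_mnar[order(y, decreasing = TRUE)[seq_len(n_miss)]] <- NA

# Compare the mean of the observed values under each pattern.
means <- c(full = mean(y),
           MCAR = mean(y_mcar, na.rm = TRUE),
           MAR  = mean(y_mar,  na.rm = TRUE),
           MNAR = mean(y_mnar, na.rm = TRUE))
print(round(means, 3))
```

Under MCAR the observed mean should stay close to the full-data mean, while the MAR and MNAR versions are shifted downwards because the removed rows are not a random subset.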

Installation

You can install the released version of missCompare from CRAN with:

install.packages("missCompare")

Usage

Loading library and sandbox data

library(missCompare)
data("clindata_miss")

Cleaning

cleaned <- missCompare::clean(clindata_miss,
                              var_removal_threshold = 0.5, 
                              ind_removal_threshold = 1,
                              missingness_coding = -9)
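
To give a feel for what a cleaning step of this kind involves, here is a rough base R approximation on a hypothetical toy data frame. It assumes that the thresholds are fractions of missing values and that a variable is dropped when it exceeds its threshold while an individual is dropped when it reaches its threshold; check the package documentation for the exact rules that missCompare::clean() applies.

```r
# Hypothetical toy data; -9 is the missing-value code, as in the call above.
toy <- data.frame(
  age    = c(34, -9, 51, 47, -9, 29),
  bmi    = c(-9, -9, -9, 27.1, -9, 22.4),  # 4 of 6 values are missing
  smoker = c(0, 1, -9, 0, -9, 1)
)

# Recode the missing-value code to NA.
toy[toy == -9] <- NA

# Assumed rule: drop variables whose fraction of missing values exceeds 0.5.
var_missing <- colMeans(is.na(toy))
toy <- toy[, var_missing <= 0.5, drop = FALSE]

# Assumed rule: drop individuals whose fraction of missing values reaches 1,
# i.e. rows without a single observed value.
ind_missing <- rowMeans(is.na(toy))
toy <- toy[ind_missing < 1, , drop = FALSE]

print(toy)
```

In this toy example the bmi column and the fifth row are removed, leaving five individuals and two variables.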

Extracting data

# Use TRUE rather than T: T is an ordinary variable that can be overwritten
metadata <- missCompare::get_data(cleaned,
                                  matrixplot_sort = TRUE,
                                  plot_transform = TRUE)
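
The kind of information collected in this step (dimensions, missingness, correlations) can also be computed by hand. The sketch below uses base R on hypothetical toy data; the element names in summary_info are made up for the example and are not the names returned by missCompare::get_data().

```r
# Hypothetical toy data with a few missing values.
set.seed(1)
toy <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
toy$b <- toy$b + 0.5 * toy$a       # make a and b correlated
toy$a[c(3, 17, 40)] <- NA
toy$c[c(5, 17)] <- NA

summary_info <- list(
  rows                 = nrow(toy),
  columns              = ncol(toy),
  fraction_missing     = mean(is.na(toy)),      # over all cells
  missing_per_variable = colMeans(is.na(toy)),  # per column
  # Pairwise deletion keeps as many observations as possible per variable pair.
  corr_matrix          = cor(toy, use = "pairwise.complete.obs")
)
str(summary_info)
```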

Imputation - simulation framework

missCompare::impute_simulated(rownum = metadata$Rows,
                              colnum = metadata$Columns, 
                              cormat = metadata$Corr_matrix,
                              MD_pattern = metadata$MD_Pattern,
                              NA_fraction = metadata$Fraction_missingness,
                              min_PDM = 10,
                              n.iter = 50, 
                              assumed_pattern = NA)
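
The simulation framework starts from complete data that resemble the original dataset in size and correlation structure. One common way to generate such data, shown here only as an assumption about the general idea and not as the package's own method, is to multiply independent normal draws by the Cholesky factor of the target correlation matrix. The values of rownum and cormat below are hypothetical.

```r
# Sketch: simulate complete data with a given correlation structure (base R only).
set.seed(11)
rownum <- 2000
cormat <- matrix(c(1.0, 0.5, 0.2,
                   0.5, 1.0, 0.3,
                   0.2, 0.3, 1.0), nrow = 3, byrow = TRUE)
colnum <- ncol(cormat)

# Independent standard normal draws, one column per variable.
z <- matrix(rnorm(rownum * colnum), nrow = rownum, ncol = colnum)

# chol() returns the upper triangular factor R with t(R) %*% R == cormat,
# so the columns of z %*% R have correlation matrix cormat in expectation.
sim <- z %*% chol(cormat)

print(round(cor(sim), 2))
```

The printed sample correlations should be close to the entries of cormat, with the usual sampling noise.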

Computation time comparison

RMSE comparison

KS comparison
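
The metrics compared here can be reproduced by hand on a small simulated example. The sketch below spikes MCAR missingness into a known variable, fills it in with two stand-in imputers (mean and linear regression, chosen for brevity and not taken from the package's method list), and scores each against the known truth. How missCompare computes the KS statistic in detail is not stated here, so comparing the full imputed variable with the full true variable is an assumption.

```r
# Sketch: RMSE, MAE and KS statistic D for two simple imputers (base R + stats).
set.seed(7)
n <- 500
x <- rnorm(n)
truth <- 2 + 0.8 * x + rnorm(n, sd = 0.5)

# Spike in 20% MCAR missingness.
miss_idx <- sample(n, size = 100)
observed <- truth
observed[miss_idx] <- NA

# Stand-in imputer 1: mean of the observed values.
mean_imp <- observed
mean_imp[miss_idx] <- mean(observed, na.rm = TRUE)

# Stand-in imputer 2: linear regression on x (lm drops rows with NA by default).
fit <- lm(observed ~ x)
reg_imp <- observed
reg_imp[miss_idx] <- predict(fit, newdata = data.frame(x = x[miss_idx]))

score <- function(imputed) {
  err <- imputed[miss_idx] - truth[miss_idx]
  c(RMSE = sqrt(mean(err^2)),
    MAE  = mean(abs(err)),
    # Only the statistic D is used; warnings about ties affect the p-value only.
    KS_D = unname(suppressWarnings(ks.test(imputed, truth))$statistic))
}

results <- rbind(mean = score(mean_imp), regression = score(reg_imp))
print(round(results, 3))
```

Lower values are better for all three metrics. Because y depends strongly on x in this toy setup, the regression imputer should come out ahead of mean imputation.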

Imputation of data

# sel_method = 1:16 selects all methods; pass a subset to run only your chosen ones
imputed <- missCompare::impute_data(cleaned,
                                    scale = TRUE,
                                    n.iter = 10,
                                    sel_method = 1:16)

Post imputation diagnostics

# Compare the cleaned original data with the first mean-imputed dataset
diag <- missCompare::post_imp_diag(cleaned,
                                   imputed$mean_imputation[[1]],
                                   scale = TRUE,
                                   n.boot = 100)

Post imputation diagnostics - distributions of original and imputed values for a random variable

Post imputation diagnostics - variable clusters in the original and imputed datasets

Post imputation diagnostics - comparison of variable-pair correlations
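
The idea behind these diagnostics can be sketched with base R: compare summary statistics and correlations before and after imputation. The example below uses hypothetical toy data and mean imputation as a stand-in; it does not reproduce the plots or the bootstrap procedure of missCompare::post_imp_diag().

```r
# Sketch: how imputation can change means, spread and correlations (base R only).
set.seed(3)
n <- 300
orig <- data.frame(a = rnorm(n))
orig$b <- 0.7 * orig$a + rnorm(n, sd = 0.6)
orig$b[sample(n, 60)] <- NA            # 20% of b is missing

# Stand-in imputation: replace missing b with the mean of the observed b.
imp <- orig
imp$b[is.na(imp$b)] <- mean(orig$b, na.rm = TRUE)

diagnostics <- data.frame(
  statistic = c("mean(b)", "sd(b)", "cor(a, b)"),
  original  = c(mean(orig$b, na.rm = TRUE),
                sd(orig$b, na.rm = TRUE),
                cor(orig$a, orig$b, use = "complete.obs")),
  imputed   = c(mean(imp$b),
                sd(imp$b),
                cor(imp$a, imp$b))
)
print(diagnostics, digits = 3)
```

Mean imputation leaves the variable mean unchanged but shrinks the standard deviation, which is exactly the kind of distortion that post imputation diagnostics are meant to reveal.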

Issues, questions

If you need help or advice on your missing data problem, or help with the missCompare package, please e-mail the authors. If you would like to report an issue, please include a reproducible example and submit it at the missCompare GitHub page.