Tirgit / missCompare

Licence: other

missCompare R package - intuitive missing data imputation framework

Projects that are alternatives to or similar to missCompare

TotalLeastSquares.jl

Solve many kinds of least-squares and matrix-recovery problems

Stars: ✭ 23 (-25.81%)

Mutual labels: imputation, missing-data, missing-data-imputation

rMIDAS

R package for missing-data imputation with deep learning

Stars: ✭ 20 (-35.48%)

Mutual labels: imputation-methods

xdem

Analysis of digital elevation models (DEMs)

Stars: ✭ 50 (+61.29%)

Mutual labels: comparison

version-compare

↔️ Rust library to easily compare version strings. Mirror from https://gitlab.com/timvisee/version-compare

Stars: ✭ 32 (+3.23%)

Mutual labels: comparison

neptune-client

📒 Experiment tracking tool and model registry

Stars: ✭ 348 (+1022.58%)

Mutual labels: comparison

Seiyuu.moe

A webpage for searching collaborative works between seiyuu.

Stars: ✭ 15 (-51.61%)

Mutual labels: comparison

json-path-comparison

Comparison of the different implementations of JSONPath and language agnostic test suite.

Stars: ✭ 64 (+106.45%)

Mutual labels: comparison

arabic-text-diacritization

Benchmark Arabic text diacritization dataset

Stars: ✭ 41 (+32.26%)

Mutual labels: comparison

hood

The plugin to manage benchmarks on your CI

Stars: ✭ 17 (-45.16%)

Mutual labels: comparison

BetaML.jl

Beta Machine Learning Toolkit

Stars: ✭ 64 (+106.45%)

Mutual labels: imputation

language-benchmarks

A simple benchmark system for compiled and interpreted languages.

Stars: ✭ 21 (-32.26%)

Mutual labels: comparison

microdiff

A fast, zero dependency object and array comparison library. Significantly faster than most other deep comparison libraries and has full TypeScript support.

Stars: ✭ 3,138 (+10022.58%)

Mutual labels: comparison

CarND-Extended-Kalman-Filter-P6

Self Driving Car Project 6 - Sensor Fusion(Extended Kalman Filter)

Stars: ✭ 24 (-22.58%)

Mutual labels: rmse

elm-javascript-haskell-equivalents

Comparison of similar functions across Elm, Javascript, and Haskell

Stars: ✭ 31 (+0%)

Mutual labels: comparison

py ml utils

Python utilities for Machine Learning competitions

Stars: ✭ 29 (-6.45%)

Mutual labels: missing-data

octoclairvoyant-webapp

Compare GitHub changelogs across multiple releases in a single view.

Stars: ✭ 45 (+45.16%)

Mutual labels: comparison

grids

A grid comparison standard

Stars: ✭ 74 (+138.71%)

Mutual labels: comparison

npm-vs-yarn

Compare npm vs yarn

Stars: ✭ 36 (+16.13%)

Mutual labels: comparison

stringosim

String similarity functions, string distances, Jaccard, Levenshtein, Hamming, Jaro-Winkler, Q-grams, N-grams, LCS - Longest Common Subsequence, Cosine similarity...

Stars: ✭ 47 (+51.61%)

Mutual labels: comparison

ncdu-diff

ncdu fork that can compare and diff results

Stars: ✭ 21 (-32.26%)

Mutual labels: comparison

Overview of missCompare

missCompare is a missing data imputation pipeline that guides you through your missing data problem. A range of functions helps you select the algorithm that is likely to suit your data best and provides an easy way to impute the missing data points in your dataset.

The missCompare pipeline

You will find a detailed manual in the missCompare vignette:

install.packages("missCompare")
library(missCompare)
vignette("misscompare")

1. Cleaning your data using missCompare::clean()
2. Extracting information on dimensions, missingness, correlations and variables, and plotting missing data using missCompare::get_data()
3. Imputation - simulated data:
   - simulating full data with no missingness, using the metadata from the previous step so that it resembles your original data, with missCompare::simulated()
   - spiking in missing data in distinct missing data patterns using missCompare::all_patterns(). These patterns are:
     - missing completely at random (MCAR) - missCompare::MCAR() - missing data occurrence is random
     - missing at random (MAR) - missCompare::MAR() - missing data occurrence correlates with other variables' values (univariate solution in missCompare)
     - missing not at random (MNAR) - missCompare::MNAR() - missing data occurrence correlates with the variable's own values
     - missing in assumed pattern (MAP) - missCompare::MAP() - a combination of the previous three, where the user can define a pattern per variable
   - imputing missing data, obtaining imputation metrics per method (root mean squared error (RMSE), mean absolute error (MAE), the Kolmogorov-Smirnov test statistic D for equal distributions, and computation time) and plotting the results using missCompare::impute_simulated()
4. Imputing your data - after the previous step, you will have a general idea of which algorithms perform best for your data structure (size, degree of correlation between variables). In this step, you can impute your original data with your chosen algorithm(s) using missCompare::impute_data()
5. Post imputation diagnostics - an informative assessment of how the imputation changed your data structure (e.g. variable means, distributions, clusters, correlations), using missCompare::post_imp_diag()
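To make the MCAR / MAR / MNAR distinction concrete, here is a minimal base-R sketch. The helpers spike_mcar(), spike_mar() and spike_mnar() are hypothetical and are not missCompare's MCAR(), MAR() or MNAR() implementations; for simplicity they use a deterministic "largest values go missing" rule, whereas the package may well use a different (e.g. probabilistic) rule.

```r
# Hypothetical helpers illustrating the three classic missingness mechanisms.
# NOT the missCompare::MCAR()/MAR()/MNAR() implementations.
set.seed(42)

# MCAR: every cell of `x` has the same chance of being set to NA.
spike_mcar <- function(x, frac) {
  n_miss <- round(length(x) * frac)
  x[sample(length(x), n_miss)] <- NA
  x
}

# MAR: missingness in `x` depends on another, fully observed variable
# `driver` (here: the cells with the largest `driver` values go missing).
spike_mar <- function(x, driver, frac) {
  n_miss <- round(length(x) * frac)
  x[order(driver, decreasing = TRUE)[seq_len(n_miss)]] <- NA
  x
}

# MNAR: missingness in `x` depends on the values of `x` itself
# (here: the largest values of `x` go missing).
spike_mnar <- function(x, frac) {
  spike_mar(x, driver = x, frac = frac)
}

df <- data.frame(a = rnorm(100), b = rnorm(100))
a_mcar <- spike_mcar(df$a, 0.2)        # 20 random cells removed
a_mar  <- spike_mar(df$a, df$b, 0.2)   # removal driven by column b
a_mnar <- spike_mnar(df$a, 0.2)        # removal driven by a itself
```

Under MNAR the removed values are systematically the largest ones, so an imputation method that only looks at the observed values of the variable will tend to underestimate them; this is why comparing methods across patterns is informative.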

Installation

You can install the released version of missCompare from CRAN with:

install.packages("missCompare")

Usage

Loading library and sandbox data

library(missCompare)
data("clindata_miss")

Cleaning

cleaned <- missCompare::clean(clindata_miss,
                              var_removal_threshold = 0.5, 
                              ind_removal_threshold = 1,
                              missingness_coding = -9)
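The cleaning step recodes a placeholder such as -9 to NA and then drops variables and individuals with too much missingness. The sketch below illustrates that idea in base R. clean_sketch() is a hypothetical helper, not the package source, and the exact comparisons it uses (a variable is kept if its missing fraction is at most var_removal_threshold; an individual is dropped if its missing fraction reaches ind_removal_threshold, so 1 removes only fully empty rows) are assumptions.

```r
# Hypothetical sketch of the idea behind missCompare::clean(); the threshold
# comparisons below are assumptions, not the package's documented behaviour.
clean_sketch <- function(x, var_removal_threshold = 0.5,
                         ind_removal_threshold = 1,
                         missingness_coding = NA) {
  # Recode the placeholder value (e.g. -9) as a real NA, column by column.
  # %in% never returns NA, so existing NAs are left untouched.
  x[] <- lapply(x, function(col) {
    col[col %in% missingness_coding] <- NA
    col
  })
  # Keep variables whose missing fraction does not exceed the threshold
  var_miss <- colMeans(is.na(x))
  x <- x[, var_miss <= var_removal_threshold, drop = FALSE]
  # Keep individuals whose missing fraction stays below the threshold
  ind_miss <- rowMeans(is.na(x))
  x[ind_miss < ind_removal_threshold, , drop = FALSE]
}

raw <- data.frame(a = c(1, -9, 3, -9),
                  b = c(-9, -9, -9, 4),
                  c = c(5, -9, 7, 8))
out <- clean_sketch(raw, missingness_coding = -9)
```

Here column b (75% missing) is dropped, and the second individual, now missing every remaining value, is removed.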

Extracting data

metadata <- missCompare::get_data(cleaned,
                                  matrixplot_sort = TRUE,
                                  plot_transform = TRUE)
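The metadata returned here (row and column counts, correlation matrix, fraction of missingness) is what the simulation step reuses. As a rough illustration, the following hypothetical summarise_missingness() computes comparable quantities in base R. The list names echo the fields used in the next step, but this is an assumption-laden sketch, not get_data()'s implementation.

```r
# Hypothetical sketch of the kind of metadata get_data() summarises;
# not the package's implementation.
summarise_missingness <- function(x) {
  list(
    Rows = nrow(x),
    Columns = ncol(x),
    # Share of all cells that are missing
    Fraction_missingness = mean(is.na(x)),
    # Missing fraction per variable
    Fraction_missingness_per_variable = colMeans(is.na(x)),
    # Correlations computed from all pairwise-complete observations
    Corr_matrix = cor(x, use = "pairwise.complete.obs")
  )
}

toy <- data.frame(a = c(1, 2, NA, 4), b = c(2, 4, 6, NA))
meta <- summarise_missingness(toy)
```

Using pairwise-complete observations keeps as much information as possible for each correlation, at the cost of each entry being estimated from a slightly different subset of rows.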

Imputation - simulation framework

missCompare::impute_simulated(rownum = metadata$Rows,
                              colnum = metadata$Columns, 
                              cormat = metadata$Corr_matrix,
                              MD_pattern = metadata$MD_Pattern,
                              NA_fraction = metadata$Fraction_missingness,
                              min_PDM = 10,
                              n.iter = 50, 
                              assumed_pattern = NA)
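impute_simulated() scores each method by comparing imputed values against the known values that were spiked out. The sketch below shows how RMSE, MAE and the Kolmogorov-Smirnov D statistic can be computed in base R for one variable. imputation_metrics() and its arguments are hypothetical, and restricting the KS test to the masked cells is an assumption; the package may compute it differently.

```r
# Hypothetical sketch of per-method imputation metrics, computed only on the
# cells that were artificially removed (mask == TRUE).
imputation_metrics <- function(truth, imputed, mask) {
  err <- imputed[mask] - truth[mask]
  list(
    RMSE = sqrt(mean(err^2)),   # root mean squared error
    MAE = mean(abs(err)),       # mean absolute error
    # Two-sample Kolmogorov-Smirnov statistic D (max ECDF distance)
    KS_D = unname(ks.test(truth[mask], imputed[mask])$statistic)
  )
}

m <- imputation_metrics(truth   = c(1, 2, 3, 4),
                        imputed = c(1, 2.5, 3.5, 4),
                        mask    = c(FALSE, TRUE, TRUE, FALSE))
```

RMSE penalises large errors more heavily than MAE, while D ignores pointwise accuracy and asks only whether imputed values look like they come from the same distribution as the true ones, so the three metrics can rank methods differently.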

Computation time comparison

RMSE comparison

KS comparison

Imputation of data

imputed <- missCompare::impute_data(cleaned,
                                    scale = TRUE,
                                    n.iter = 10,
                                    sel_method = 1:16)
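The diagnostics below use imputed$mean_imputation, i.e. the result of mean imputation. As an illustration of what that method does with scale = TRUE, here is a minimal base-R sketch of column-mean imputation on z-scored data. mean_impute_scaled() is a hypothetical helper, not the code behind impute_data().

```r
# Hypothetical sketch: column-mean imputation, optionally on z-scored data.
# Illustrative only; not missCompare's implementation.
mean_impute_scaled <- function(x, scale = TRUE) {
  if (scale) {
    # scale() centres/scales each column using its observed (non-NA) values
    x <- as.data.frame(scale(x))
  }
  # Replace each NA with the mean of the observed values in its column
  x[] <- lapply(x, function(col) {
    col[is.na(col)] <- mean(col, na.rm = TRUE)
    col
  })
  x
}

toy <- data.frame(a = c(1, NA, 3), b = c(2, 4, NA))
imp_scaled <- mean_impute_scaled(toy)
imp_raw <- mean_impute_scaled(toy, scale = FALSE)
```

On scaled data every imputed value is (approximately) 0, the column mean. Mean imputation is fast but shrinks variances and weakens correlations, which is exactly what the post-imputation diagnostics are meant to reveal.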

Post imputation diagnostics

diag <- missCompare::post_imp_diag(cleaned,
                                   imputed$mean_imputation[[1]],
                                   scale = TRUE,
                                   n.boot = 100)
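One of the diagnostics compares variable-pair correlations before and after imputation. A minimal base-R version of that comparison might look like the following. compare_pair_correlations() is a hypothetical helper, and using pairwise-complete observations for the original data is an assumption rather than the package's documented behaviour.

```r
# Hypothetical helper: list each variable pair once with its correlation in
# the original (incomplete) data and in the imputed data.
compare_pair_correlations <- function(original, imputed) {
  r_orig <- cor(original, use = "pairwise.complete.obs")
  r_imp  <- cor(imputed)
  keep <- upper.tri(r_orig)             # each pair once, no diagonal
  idx <- which(keep, arr.ind = TRUE)    # same column-major order as r[keep]
  data.frame(
    var1 = rownames(r_orig)[idx[, "row"]],
    var2 = colnames(r_orig)[idx[, "col"]],
    r_original = r_orig[keep],
    r_imputed = r_imp[keep],
    stringsAsFactors = FALSE
  )
}

original <- data.frame(a = c(1, 2, 3, 4, NA),
                       b = c(2, 4, 6, 8, 10),
                       c = c(5, 3, 4, 1, 2))
imputed_toy <- original
imputed_toy$a[5] <- 5
res <- compare_pair_correlations(original, imputed_toy)
```

Plotting r_imputed against r_original gives a quick visual check: points far from the identity line mark pairs whose relationship the imputation has distorted.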

Post imputation diagnostics - distributions of original and imputed values for a random variable

Post imputation diagnostics - variable clusters in the original and imputed datasets

Post imputation diagnostics - comparison of variable-pair correlations

Issues, questions

If you need help or advice on your missing data problem or with the missCompare package, please e-mail the authors. To report an issue, please open one on the missCompare GitHub page and include a reproducible example.
