ekstroem / Datamaid

An R package for data screening

Labels

html reproducible-research data-cleaning

dataMaid

dataMaid is an R package for documenting and creating reports on data cleanliness.

dataMaid has become dataReporter

dataMaid has been renamed to dataReporter. All future updates and development will be made for dataReporter. Install the new package from CRAN like this:

install.packages("dataReporter")

or install the development version from GitHub:

devtools::install_github("ekstroem/dataReporter")

Please report bugs at our new repository.

Installation

This GitHub page contains the development version of dataMaid. For the latest stable version, install the package directly from CRAN using:

install.packages("dataMaid")

To install the development version of dataMaid, run the following command from within R (this requires that the devtools package is already installed):

devtools::install_github("ekstroem/dataMaid")

Package overview

The simplest way to get started is to load the package and call the makeDataReport() function on a data frame. If you generate several reports for the same data, it may be necessary to add the replace=TRUE argument to overwrite the existing report.

library("dataMaid")
data(trees)
makeDataReport(trees)

This will create a report with summaries and error checks for each variable in the trees data frame. The format of the report depends on your OS and on whether you have a LaTeX installation on your computer, which is needed for creating pdf reports.
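The output format can also be requested explicitly instead of being left to the defaults. The following is a minimal sketch, assuming that the output, replace and openResult arguments of makeDataReport() behave as described on the package help page (?makeDataReport) and that pandoc is available for rendering html. Check the help page for your installed version before relying on it.

```r
library("dataMaid")
data(trees)

# Request an html report (no LaTeX installation needed), overwrite any
# earlier report for the same data, and do not open the result automatically.
makeDataReport(trees, output = "html", replace = TRUE, openResult = FALSE)
```

The report files are written to the current working directory, so it can be convenient to set the working directory to a project folder first.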

Using dataMaid interactively

The dataMaid package can also be used interactively by running checks for individual variables or for all variables in the dataset:

data(toyData)
check(toyData$events)  # Individual check of events
check(toyData) # Check all variables at once

By default, the standard battery of checks is run, depending on the variable type. If we want just one specific check for, say, a numeric variable, we can specify that. All available checks can be viewed by calling allCheckFunctions(). See the documentation for an overview of the available checks and for how to create and include your own.

check(toyData$events, checks = setChecks(numeric = "identifyMissing"))
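The same mechanism can be applied to a whole dataset. As a rough sketch, the example below lists the available checks and then runs a hand-picked selection for numeric variables only. The particular combination of checks is an assumption chosen for illustration, and variables of other classes keep their default checks.

```r
library("dataMaid")
data(toyData)

# List the check functions that are currently available.
allCheckFunctions()

# Illustrative selection: two checks for numeric variables.
# Other variable classes keep their default checks.
myChecks <- setChecks(numeric = c("identifyMissing", "identifyOutliers"))

# Apply the selection to every variable in the dataset.
res <- check(toyData, checks = myChecks)
res$events
```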

We can also access the graphics or summary tables that are produced for a variable by calling the visualize or summarize functions. One can visualize a single variable or a full dataset:

#Visualize a variable
visualize(toyData$events)

#Visualize a dataset
visualize(toyData)

The same is true for summaries. Note also that the choice of checks, visualizations and summaries is customizable:

#Summarize a variable with default settings:
summarize(toyData$events) 

#Summarize a variable with user-specified settings:
summarize(toyData$events, summaries = setSummaries(all = c("centralValue", "minMax")))

Detailed documentation

You can read the main paper accompanying the package at the Journal of Statistical Software. It provides a detailed introduction to the dataMaid package.

We also have two blog posts that provide an introduction to the package. They can be found here (the primary one) and here.

Moreover, we have created a vignette that describes how to extend dataMaid to include user-defined data screening checks, summaries and visualizations. This vignette is called extending_dataMaid:

vignette("extending_dataMaid")
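To give a flavour of what such an extension can look like, here is a minimal sketch of a user-defined check. It assumes the checkFunction() and checkResult() helpers work as laid out in the vignette. The name identifyNegative and its logic are hypothetical and not part of the package, so treat this as an illustration rather than the recommended implementation.

```r
library("dataMaid")

# Hypothetical check (not part of dataMaid): flag negative values.
identifyNegative <- function(v, nMax = 10, ...) {
  v <- v[!is.na(v)]
  problemValues <- sort(unique(v[v < 0]))
  problem <- length(problemValues) > 0
  message <- ""
  if (problem) {
    message <- paste("Note that the following negative values were found:",
                     paste(problemValues, collapse = ", "))
  }
  # Wrap the result in the structure that dataMaid expects from a check.
  checkResult(list(problem = problem, message = message,
                   problemValues = problemValues))
}

# Register the function as a check for numeric and integer variables.
identifyNegative <- checkFunction(identifyNegative,
                                  description = "Identify negative values",
                                  classes = c("numeric", "integer"))

# Use the new check on its own for a numeric vector.
check(c(-1, 2, 3, NA), checks = setChecks(numeric = "identifyNegative"))
```

Once registered this way, the check can be passed by name to setChecks() like the built-in ones.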

Online app

We are currently working on an online version of the tool, where users can upload their data and get a report. A prototype is already up and running - we just need to configure the R server correctly.

Until we have set it up online, you can try it out on your own machine:

library(shiny)
runUrl("https://github.com/ekstroem/dataMaid/raw/master/app/app.zip")
