PredictiveEcology / reproducible

Licence: other

A set of tools for R that enhance reproducibility beyond package management

Projects that are alternatives of or similar to reproducible

Drake

An R-focused pipeline toolkit for reproducibility and high-performance computing

Stars: ✭ 1,301 (+3842.42%)

Mutual labels: reproducible-research, r-package, reproducibility

Targets

Function-oriented Make-like declarative workflows for R

Stars: ✭ 293 (+787.88%)

Mutual labels: reproducible-research, r-package, reproducibility

Gtsummary

Presentation-Ready Data Summary and Analytic Result Tables

Stars: ✭ 450 (+1263.64%)

Mutual labels: reproducible-research, r-package, reproducibility

fertile

creating optimal conditions for reproducibility

Stars: ✭ 52 (+57.58%)

Mutual labels: reproducible-research, reproducibility

Reproducibility Guide

project page for creating a guide to reproducible research

Stars: ✭ 116 (+251.52%)

Mutual labels: reproducible-research, reproducibility

Steppy

Lightweight, Python library for fast and reproducible experimentation 🔬

Stars: ✭ 119 (+260.61%)

Mutual labels: reproducible-research, reproducibility

Git2rdata

An R package for storing and retrieving data.frames in git repositories.

Stars: ✭ 84 (+154.55%)

Mutual labels: reproducible-research, r-package

reproducibility-guide

⛔ ARCHIVED ⛔

Stars: ✭ 119 (+260.61%)

Mutual labels: reproducible-research, reproducibility

targets-tutorial

Short course on the targets R package

Stars: ✭ 87 (+163.64%)

Mutual labels: reproducible-research, reproducibility

software-dev

Coding Standards for the USC Biostats group

Stars: ✭ 33 (+0%)

Mutual labels: reproducible-research, reproducibility

Reproducibilty-Challenge-ECANET

Unofficial Implementation of ECANets (CVPR 2020) for the Reproducibility Challenge 2020.

Stars: ✭ 27 (-18.18%)

Mutual labels: reproducible-research, reproducibility

Awesome Reproducible Research

A curated list of reproducible research case studies, projects, tutorials, and media

Stars: ✭ 106 (+221.21%)

Mutual labels: reproducible-research, reproducibility

Enmf

This is our implementation of ENMF: Efficient Neural Matrix Factorization (TOIS. 38, 2020). This also provides a fair evaluation of existing state-of-the-art recommendation models.

Stars: ✭ 96 (+190.91%)

Mutual labels: reproducible-research, reproducibility

Reprozip

ReproZip is a tool that simplifies the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science.

Stars: ✭ 231 (+600%)

Mutual labels: reproducible-research, reproducibility

ukbrest

ukbREST: efficient and streamlined data access for reproducible research of large biobanks

Stars: ✭ 32 (-3.03%)

Mutual labels: reproducible-research, reproducibility

targets-minimal

A minimal example data analysis project with the targets R package

Stars: ✭ 50 (+51.52%)

Mutual labels: reproducible-research, reproducibility

benchmark VAE

Unifying Variational Autoencoder (VAE) implementations in Pytorch (NeurIPS 2022)

Stars: ✭ 1,211 (+3569.7%)

Mutual labels: reproducible-research, reproducibility

ReproducibleScience

Short course on reproducible science: what, why, how

Stars: ✭ 23 (-30.3%)

Mutual labels: reproducible-research, reproducibility

Evalai

☁️ 🚀 📊 📈 Evaluating state of the art in AI

Stars: ✭ 1,087 (+3193.94%)

Mutual labels: reproducible-research, reproducibility

Drake Examples

Example workflows for the drake R package

Stars: ✭ 57 (+72.73%)

Mutual labels: reproducible-research, reproducibility

reproducible

A set of tools for R that enhance reproducibility for data analytics and forecasting. This package aims to provide high-level, robust, machine- and OS-independent tools for creating deeply reproducible and reusable content in R.

News

See updates from the latest CRAN and development versions. Note that versions 1.0.0 and later are not compatible with previous versions. The current version can be much faster, creates smaller repository files (each with specific options set using Suggests packages), and allows for different database backends (e.g., RPostgres). The backend choice applies only to the database, not to the saved files, which are still saved locally.

Reproducible workflows

A reproducible workflow is a series of code steps (e.g., in a script) that, when run, produce the same output from the same inputs every time. The big challenge with such a workflow is that many steps are so time-consuming that a scientist tends not to re-run each step every time. After many months of work, it is often unclear whether the code will actually run from the start. Is the original dataset still there? Have the packages that were used been updated? Are some of the steps missing because there was some "pointing and clicking"?

The best way to maintain reproducibility is to have all the code re-run all the time. That way, errors are detected early and can be fixed. The challenge is how to make all the steps fast enough that it becomes convenient to re-run everything from scratch each time.

`Cache`

Caching is the principal tool for achieving this reproducible workflow. There are many existing tools that support some notion of caching. The main tool here, `Cache`, can be nested hierarchically, which makes it very powerful for the data science developer who regularly works at many levels of an analysis.

library(reproducible)

rnorm(1)        # draws a new random number on every call
Cache(rnorm, 1) # first call: generates a random number and stores it
Cache(rnorm, 1) # recovers the previous random number because the call is identical
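
To make the idea of hierarchical caching concrete, here is a minimal base-R sketch. It is not how `Cache` is implemented; `toy_cache`, `toy_store`, `slow_square`, and `analysis` are hypothetical names used only for illustration, and results are kept in memory rather than in a repository on disk.

```r
# Toy memoizer (hypothetical, illustration only): NOT reproducible::Cache.
toy_store <- new.env()
toy_store$keys <- character(0)
toy_store$values <- list()

toy_cache <- function(fun, ...) {
  # Build a key from the function definition and its arguments.
  key <- paste(deparse(list(fun, ...)), collapse = "\n")
  hit <- match(key, toy_store$keys)
  if (!is.na(hit)) {
    return(toy_store$values[[hit]]) # identical call seen before: reuse result
  }
  result <- fun(...)
  toy_store$keys <- c(toy_store$keys, key)
  toy_store$values <- c(toy_store$values, list(result))
  result
}

calls <- 0
slow_square <- function(x) {
  calls <<- calls + 1 # count how often the real work runs
  x^2
}

# An outer step that itself relies on a cached inner step.
analysis <- function(x) toy_cache(slow_square, x) + 1

toy_cache(analysis, 3)    # runs analysis() and slow_square()
toy_cache(analysis, 3)    # outer hit: nothing is recomputed
toy_cache(slow_square, 3) # inner hit: stored when analysis() first ran
calls                     # 1
```

Because both the outer and the inner call are stored, a developer can work at either level of the analysis without recomputing the other.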

`prepInputs`

A common data problem is starting from a raw (spatial) dataset and getting it into shape for an analysis. Often, copies of a dataset are haphazardly placed in ad hoc local file systems. This makes it particularly difficult to share the workflow. The solution is to use a canonical location (e.g., cloud storage, permalink to original data provider, etc.) and to use tools that are smart enough to download only once.

Get a geospatial dataset. It will be checksummed (locally), meaning that if the file is already in place locally, it will not be downloaded again.

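# Download directory; any writable path works (this location is an assumption)
dPath <- file.path(tempdir(), "GADM")

# Using dlFun -- a custom download function -- passed to preProcess
test1 <- prepInputs(targetFile = "GADM_2.8_LUX_adm0.rds", # must specify currently
                    dlFun = "raster::getData", name = "GADM", country = "LUX", level = 0,
                    path = dPath)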
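
The "download only once" behaviour can be sketched in base R as follows. This is an illustration of the idea under simplifying assumptions, not the logic `prepInputs` actually uses; `fetch_once` and `fake_download` are hypothetical helpers, and the fake downloader stands in for a real download so the sketch runs offline.

```r
# Hypothetical helper: fetch a file only when a valid local copy is missing.
# Returns TRUE (invisibly) if a download happened, FALSE otherwise.
fetch_once <- function(dest, download, expected_md5 = NULL) {
  if (file.exists(dest)) {
    local_md5 <- unname(tools::md5sum(dest))
    # With no expected checksum, an existing file is accepted as-is.
    if (is.null(expected_md5) || identical(local_md5, expected_md5)) {
      return(invisible(FALSE))
    }
  }
  download(dest)
  invisible(TRUE)
}

# Stand-in for a real download function (assumption for this sketch).
fake_download <- function(dest) writeLines("pretend this is a dataset", dest)

dest <- tempfile(fileext = ".txt")
fetch_once(dest, fake_download) # file absent: downloads
fetch_once(dest, fake_download) # file present: skipped
```

Passing `expected_md5` makes the check stricter: a local file whose checksum does not match is fetched again.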

`Cache` with `prepInputs`

Putting these tools together allows for very rich data flows. For example, with `prepInputs` and using the `fun` argument or passing a `studyArea`, a raw dataset can be downloaded, loaded into R, and post-processed. These are all potentially very time-consuming steps that result in a clean, often much smaller dataset. Wrapping the whole call in `Cache` makes repeated runs very quick.

test1 <- Cache(prepInputs, targetFile = "GADM_2.8_LUX_adm0.rds", # must specify currently
               dlFun = "raster::getData", name = "GADM", country = "LUX", level = 0,
               path = dPath)

See vignettes and help files for many more real-world examples.

Installation

Current release (on CRAN)

Install from CRAN:

install.packages("reproducible")

Install from GitHub:

#install.packages("devtools")
library("devtools")
install_github("PredictiveEcology/reproducible", dependencies = TRUE)

Development version

Install from GitHub:

#install.packages("devtools")
library("devtools")
install_github("PredictiveEcology/reproducible", ref = "development", dependencies = TRUE)

Contributions

Please see CONTRIBUTING.md for information on how to contribute to this project.
