aaronpeikert / Reproducible Research

Licence: cc-by-4.0
A Reproducible Data Analysis Workflow with R Markdown, Git, Make, and Docker

This is the accompanying GitHub repository for a work-in-progress paper by Aaron Peikert and Andreas M. Brandmaier.

Abstract

In this tutorial, we describe a workflow to ensure long-term reproducibility of R-based data analyses. The workflow leverages established tools and practices from software engineering. It combines the benefits of various open-source software tools, including R Markdown, Git, Make, and Docker, whose interplay ensures seamless integration of version management, dynamic report generation conforming to various journal styles, and full cross-platform and long-term computational reproducibility. The workflow meets three primary goals: 1) the reported statistical results are consistent with the actual statistical results (dynamic report generation), 2) the analysis reproduces exactly at a later point in time, even if the computing platform or software has changed (computational reproducibility), and 3) changes at any time (during development and post-publication) are tracked, tagged, and documented, while earlier versions of both data and code remain accessible. While the research community increasingly recognizes dynamic document generation and version management as tools to ensure reproducibility, we demonstrate with practical examples that these alone are not sufficient to ensure long-term computational reproducibility. By combining containerization, dependency management, version management, and dynamic document generation, the proposed workflow increases scientific productivity by facilitating later reproducibility and reuse of code and data.

Resources

Chocolatey (Windows only)
How to install: Visit chocolatey.org. Chocolatey installs software for you; it is itself installed and called from the terminal (command prompt). To open the command prompt, press Windows+X and then click “Command Prompt” or “Command Prompt (Admin).”

Homebrew (OS X only)
How to install: Visit brew.sh. Homebrew installs software for you; it is itself installed and called from the terminal. To open the terminal, press Command+Space to open Spotlight, type “Terminal,” and double-click the top search result.

R
How to install (Windows): Use Chocolatey (from the terminal).
choco install -y r.project
How to install (OS X): Use Homebrew (from the terminal).
brew install r
How to learn: Read R for Data Science.

RStudio
How to install (Windows): Use Chocolatey (from the terminal).
choco install -y r.studio
How to install (OS X): Use Homebrew (from the terminal).
brew install --cask rstudio
How to learn: Skim the cheatsheet.

rmarkdown
How to install: Within RStudio, type into the R console:
install.packages("rmarkdown")
How to learn: Read the cheatsheet. Skim R Markdown: The Definitive Guide.

Git
How to install (Windows): Use Chocolatey (from the terminal).
choco install -y git
How to install (OS X): Git is installed along with Homebrew; nothing to do.
How to learn: Read Part IV, Git fundamentals, and skim the rest of Happy Git and GitHub for the useR.

GitHub
How to install: Create an account on github.com and apply for the Student/Researcher benefits.
How to learn: Read Part II, Connect Git, GitHub, RStudio, and Part III, Early GitHub Wins.

Make
How to install (Windows): Use Chocolatey (from the terminal).
choco install -y make
How to install (OS X): Make is preinstalled on OS X; nothing to do.
How to learn: Read Minimal Make.

Docker
How to install (Windows): Use Chocolatey (from the terminal).
choco install -y docker-desktop
How to install (OS X): Use Homebrew (from the terminal).
brew install --cask docker
How to install (Linux): Follow the steps described in Post-installation steps for Linux.
How to learn: Read An Introduction to Rocker: Docker Containers for R.
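Before compiling, it can help to confirm that the command-line tools installed above are actually on your PATH. The following POSIX shell function is a minimal sketch of such a check; the name check_tools is our own and is not part of the repository. It relies only on the standard `command -v` built-in.

```shell
# Hypothetical helper (not part of the repository): report which of the
# given command-line tools cannot be found on the PATH.
check_tools() {
  missing=""
  for tool in "$@"; do
    # `command -v` succeeds (and prints a path) only if the tool exists
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1   # non-zero status so scripts can stop early
  fi
  echo "all tools found"
}
```

For the Docker-based workflow described below, run `check_tools git make docker`; it lists any missing tools and returns a non-zero exit status if one or more are absent.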

Compile

The following paragraphs describe how you can obtain a copy of the source files of our manuscript on reproducible workflows and create the PDF. You can either take the ‘standard’ route of downloading a local copy of the repository and knitting the manuscript file in R, or follow the reproducible workflow we suggest and use Make to build a Docker image and render the final PDF inside a container, that is, in exactly the same virtual computational environment that we used.

Standard Way

Requires: Git, RStudio, pandoc, pandoc-citeproc & rmarkdown.

Open RStudio -> File -> New Project -> Version Control -> Git

In the Repository URL field, enter:

https://github.com/aaronpeikert/reproducible-research.git

Open manuscript.Rmd and click Knit.
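If you prefer the terminal to RStudio's menus, the same two steps (clone, then knit) can be sketched as a shell function. This is our illustration, not part of the repository: it assumes Git, R, pandoc, and the rmarkdown package are installed and that `Rscript` is on the PATH, and `render_manuscript` is a hypothetical name. Calling `rmarkdown::render()` via `Rscript` is roughly what RStudio's Knit button does in a fresh R session.

```shell
# Hypothetical helper: clone the repository (if needed) and knit the
# manuscript from the command line instead of RStudio's Knit button.
# Assumes git, R, pandoc and the rmarkdown package are installed.
render_manuscript() {
  repo_dir="${1:-reproducible-research}"
  # Clone only once; later runs reuse the existing checkout.
  if [ ! -d "$repo_dir" ]; then
    git clone https://github.com/aaronpeikert/reproducible-research.git "$repo_dir" || return 1
  fi
  # rmarkdown::render() knits manuscript.Rmd to the first output format
  # declared in its YAML header; the subshell keeps our working directory.
  (cd "$repo_dir" && Rscript -e 'rmarkdown::render("manuscript.Rmd")')
}
```

Note that this route renders with whatever R and package versions happen to be installed locally, which is exactly the source of drift that the Docker-based workflow below avoids.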

Using a Reproducible Workflow

Requires Git, Make, and Docker, but not R or RStudio.

Execute in a terminal:

git clone https://github.com/aaronpeikert/reproducible-research.git
cd reproducible-research
make build
make all DOCKER=TRUE

Note: Windows users need to manually edit the Makefile to set current_path to the current directory and then run make all DOCKER=TRUE WINDOWS=TRUE. We hope that future releases of Docker for Windows will not require this workaround.
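The DOCKER=TRUE switch is what moves the build into the container. We do not reproduce the repository's Makefile here; the shell sketch below only illustrates the general idea under our own assumptions. When `DOCKER` is `TRUE`, a build command is wrapped in `docker run` so that it executes inside the image created by make build, with the project directory mounted into the container. The image name `reproducible-research`, the mount point `/home/rstudio`, the function name `run_step`, and the `DRY_RUN` switch are all hypothetical.

```shell
# Illustrative sketch only; this is NOT the repository's Makefile logic.
# IMAGE, the mount point and DRY_RUN are hypothetical placeholders.
IMAGE="${IMAGE:-reproducible-research}"

run_step() {
  if [ "${DOCKER:-FALSE}" = "TRUE" ]; then
    # --rm deletes the container when the command finishes,
    # -v mounts the project directory so outputs land on the host,
    # -w makes that mounted directory the working directory.
    set -- docker run --rm -v "$(pwd)":/home/rstudio -w /home/rstudio "$IMAGE" "$@"
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # show the command instead of running it
  else
    "$@"
  fi
}
```

For example, `DOCKER=TRUE run_step Rscript -e 'rmarkdown::render("manuscript.Rmd")'` would knit inside the container, and adding `DRY_RUN=1` prints the full `docker run` command instead. Mounting the project directory, rather than copying files into the image, is what lets the rendered PDF appear on the host.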

Rebuild Everything

If you experience unexpected behavior with this workflow, make sure you have the most recent version (git pull), rebuild the Docker image (make rebuild), and force a rebuild of all targets (make -B DOCKER=TRUE):

git pull && make rebuild && make -B DOCKER=TRUE
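The one-liner above chains the three steps with `&&`, so it already stops at the first failure. A slightly more verbose sketch (our own; `rebuild_everything` is a hypothetical name) does the same but also reports which step failed, and its comments spell out what each step is for.

```shell
# Hypothetical wrapper around the three "Rebuild Everything" steps:
# run them in order, stop at the first failure, and name the failed step.
rebuild_everything() {
  # 1. Fetch the most recent version of the repository.
  git pull || { echo "failed: git pull" >&2; return 1; }
  # 2. Rebuild the Docker image (the rebuild target from the command above).
  make rebuild || { echo "failed: make rebuild" >&2; return 1; }
  # 3. -B (--always-make) treats every target as out of date, so all
  #    outputs are regenerated, here inside the container.
  make -B DOCKER=TRUE || { echo "failed: make -B DOCKER=TRUE" >&2; return 1; }
  echo "rebuild complete"
}
```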

Session Info

sessioninfo::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.6.1 (2019-07-05)
##  os       Debian GNU/Linux 9 (stretch)
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       Etc/UTC                     
##  date     2021-03-19                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date       lib source        
##  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.1)
##  backports     1.1.5   2019-10-02 [1] CRAN (R 3.6.1)
##  cli           2.0.0   2019-12-09 [1] CRAN (R 3.6.1)
##  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.1)
##  digest        0.6.23  2019-11-23 [1] CRAN (R 3.6.1)
##  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.1)
##  fansi         0.4.0   2018-10-05 [1] CRAN (R 3.6.1)
##  glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.1)
##  here        * 0.1     2017-05-28 [1] CRAN (R 3.6.1)
##  hms           0.5.2   2019-10-30 [1] CRAN (R 3.6.1)
##  htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.1)
##  knitr         1.26    2019-11-12 [1] CRAN (R 3.6.1)
##  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.1)
##  pander      * 0.6.3   2018-11-06 [1] CRAN (R 3.6.1)
##  pillar        1.4.3   2019-12-20 [1] CRAN (R 3.6.1)
##  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.1)
##  R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.1)
##  Rcpp          1.0.3   2019-11-08 [1] CRAN (R 3.6.1)
##  readr       * 1.3.1   2018-12-21 [1] CRAN (R 3.6.1)
##  rlang         0.4.2   2019-11-23 [1] CRAN (R 3.6.1)
##  rmarkdown     2.0     2019-12-12 [1] CRAN (R 3.6.1)
##  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.1)
##  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.1)
##  stringi       1.4.3   2019-03-12 [1] CRAN (R 3.6.1)
##  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.1)
##  tibble        2.1.3   2019-06-06 [1] CRAN (R 3.6.1)
##  vctrs         0.2.1   2019-12-17 [1] CRAN (R 3.6.1)
##  withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.1)
##  xfun          0.11    2019-11-12 [1] CRAN (R 3.6.1)
##  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.6.1)
##  zeallot       0.1.0   2018-01-28 [1] CRAN (R 3.6.1)
## 
## [1] /usr/local/lib/R/site-library
## [2] /usr/local/lib/R/library
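Because the session info above records the exact package versions used to render the PDF, it can be turned into a simple list of name/version pairs, for example to compare against another machine. The awk sketch below is our illustration, not part of the repository. It assumes the layout shown above: it strips knitr's `##` prefix, reads only the Packages table, and skips the `*` column that marks attached packages. The function name `pkg_versions` and any file you feed it are hypothetical.

```shell
# Hypothetical helper: turn the "Packages" table printed by
# sessioninfo::session_info() into "name version" lines.
# Assumes the layout shown above; reads files given as arguments or stdin.
pkg_versions() {
  awk '
    { sub(/^##/, "") }                  # drop the knitr output prefix, if present
    /Packages/ { in_pkgs = 1; next }    # the package table starts after this line
    in_pkgs && NF == 0 { in_pkgs = 0 }  # an empty line ends the table
    in_pkgs && $1 != "package" {        # skip the column header row
      version = ($2 == "*") ? $3 : $2   # "*" marks attached packages
      print $1, version
    }
  ' "$@"
}
```

Applied to the output above, the first lines would read `assertthat 0.2.1`, `backports 1.1.5`, and so on, with attached packages such as `here 0.1` handled the same way.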