wlandau / Drake Examples

Example workflows for the drake R package

Consider targets

drake is superseded. The targets R package is the long-term successor of drake, and it is more robust and easier to use. Please visit https://books.ropensci.org/targets/drake.html for full context and advice on transitioning.
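
For orientation, a targets pipeline is defined in a _targets.R script at the project root. The sketch below is not taken from this repository or from the transition guide: the target names (raw_data, fit, coefs) and the model are made up for illustration, and it assumes the targets package is installed.

```r
# _targets.R (hypothetical example, not part of this repository)
library(targets)

# The script must end with a list of targets.
list(
  tar_target(raw_data, airquality),
  tar_target(fit, lm(Ozone ~ Wind + Temp, data = raw_data)),
  tar_target(coefs, coef(fit))
)
```

With that file in place, tar_make() runs the pipeline and tar_read(coefs) retrieves a finished result.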

drake examples

This repository is part of a community effort to collect, curate, and share publicly available examples of data analysis projects powered by the drake R package. Each folder is its own example with a self-sufficient set of code and data files.

Run in a browser

Click this badge to open the examples in RStudio through your browser: Launch Rstudio Binder

Run locally

You can download example files and run them locally with drake itself.

# Install and load drake (requires the devtools package).
devtools::install_github("ropensci/drake")
library(drake)
# List the available examples.
drake_examples()
# Download an example into a new folder in the working directory.
drake_example("main")
# Confirm that the new 'main' folder exists.
list.files()

Contributing

Please read the top-level CONTRIBUTING.md and CONDUCT.md for rules and instructions.

Introductory examples

  • customer-churn: based on an RStudio Solutions Engineering example of how to use Keras with R. The motivation comes from a blog post by Matt Dancho, and the code is based on a notebook by Edgar Ruiz.
  • stan: validating a small Bayesian hierarchical model with rstan.
  • main: drake's main bare-bones introductory example, originally written by Kirill Müller and modified by Will Landau. Now based on R's built-in airquality dataset.
  • gsp: a concrete example using real econometrics data. It explores the relationships between gross state product and other quantities, and it shows how drake can generate many reproducibly tracked tasks.
  • packages: a concrete example using data on R package downloads. It demonstrates how drake can refresh a project based on new incoming data without restarting everything from scratch.
  • mtcars: a legacy example with the mtcars dataset. Use load_mtcars_example() to set up the project in your workspace.
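
As a rough illustration of what these introductory projects look like, the sketch below builds a tiny plan around R's built-in airquality dataset. It is not copied from the main example: the helper fit_model() and the target names are hypothetical, and it assumes drake is installed and the working directory is writable, since make() creates a .drake/ cache there.

```r
library(drake)

# Hypothetical helper: fit a simple linear model to the data.
fit_model <- function(data) {
  lm(Ozone ~ Wind + Temp, data = data)
}

# A plan is a data frame with one row per target.
plan <- drake_plan(
  raw_data = airquality,
  fit = fit_model(raw_data),
  coefs = coef(fit)
)

make(plan)      # Build every target in dependency order.
readd(coefs)    # Read a finished target back from the cache.
outdated(plan)  # List targets that would be rebuilt by the next make().
```

Editing fit_model() and calling make(plan) again should rebuild fit and coefs while leaving raw_data alone, which is the behavior the gsp and packages examples rely on at larger scale.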

High-performance computing examples

  • mlr-slurm: an example machine learning workflow rigged to deploy to a SLURM cluster.
  • Docker-psock: demonstrates how to deploy targets to a Docker container using a specialized PSOCK cluster.
  • sge: uses drake's high-performance computing functionality to send work to a Grid Engine cluster.
  • slurm: similar to sge, but for SLURM.
  • torque: similar to sge, but for TORQUE.
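
The cluster examples above differ mainly in the scheduler they target. One way drake can hand work to a scheduler is its clustermq backend. The sketch below assumes that clustermq is installed, that SLURM's sbatch command is on the PATH, and that drake's bundled starter template is an acceptable starting point. The examples in this repository may use a different backend or a customized template, and the plan here is only a placeholder.

```r
library(drake)

# Placeholder plan; a real project would have many more targets.
plan <- drake_plan(
  raw_data = airquality,
  fit = lm(Ozone ~ Wind + Temp, data = raw_data)
)

if (nzchar(Sys.which("sbatch"))) {
  # Assumption: clustermq is installed and a SLURM scheduler is available.
  options(
    clustermq.scheduler = "slurm",
    clustermq.template = "slurm_clustermq.tmpl"
  )
  # Copy drake's starter template into the working directory for editing.
  drake_hpc_template_file("slurm_clustermq.tmpl")
  make(plan, parallelism = "clustermq", jobs = 2)
} else {
  # No scheduler found: build locally, one target at a time.
  make(plan)
}
```

The local fallback keeps the script usable on a laptop, so the plan itself can be debugged before any jobs are submitted.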

Examples for developing drake

  • hpc-profiling: an example with a small number of medium-sized datasets. The goal is to assess how long, in relative terms, it takes to shuffle data around HPC workers.
  • overhead: an example explicitly designed to maximize strain on drake's internals. The purpose is to support profiling studies to speed up drake.

Demonstrations of specific features

  • script-based-workflows: demonstrates how to adapt drake to an imperative script-based project.
  • code_to_plan: this example has a lifecycle status of questioning. Refer to script-based-workflows instead.

Real-world examples outside this repo

The official rOpenSci use cases and associated discussion threads describe applications of drake in action. Here are some more real-world sightings of drake in the wild.
