
HenrikBengtsson / future.batchtools

Licence: other
🚀 R package future.batchtools: A Future API for Parallel and Distributed Processing using batchtools



future.batchtools: A Future API for Parallel and Distributed Processing using 'batchtools'

Introduction

The future package provides a generic API for using futures in R. A future is a simple yet powerful mechanism to evaluate an R expression and retrieve its value at some point in time. Futures can be resolved in many different ways depending on which strategy is used. There are various types of synchronous and asynchronous futures to choose from in the future package.

This package, future.batchtools, provides a type of future that uses the batchtools package. This means that any backend supported by batchtools can be used to resolve futures. More specifically, future.batchtools allows you, or users of your package, to leverage the compute power of high-performance computing (HPC) clusters via a simple switch in settings, without having to change any code at all.

For instance, if batchtools is properly configured, the two expressions below for futures x and y will be processed on two different compute nodes:

> library("future.batchtools")
> plan(batchtools_torque)
>
> x %<-% { Sys.sleep(5); 3.14 }
> y %<-% { Sys.sleep(5); 2.71 }
> x + y
[1] 5.85

This is obviously a toy example to illustrate what futures look like and how to work with them.
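
The %<-% operator is shorthand for creating a future and later collecting its value. As a minimal sketch, the same toy example can be written with explicit future() and value() calls from the future package. The sequential plan used here is an assumption made so that the snippet runs on any machine; on a cluster it would be replaced by plan(batchtools_torque).

```r
library("future")

## Assumption: resolve futures in the current R session so the sketch
## runs anywhere; on a cluster, use plan(batchtools_torque) instead.
plan(sequential)

## Create the two futures explicitly ...
f_x <- future({ Sys.sleep(1); 3.14 })
f_y <- future({ Sys.sleep(1); 2.71 })

## ... and collect their values; value() blocks until a future is resolved
total <- value(f_x) + value(f_y)
print(total)
```

With an asynchronous backend, both futures are launched before either value is requested, which is what allows them to run at the same time.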

A more realistic example comes from the field of cancer research, where very large FASTQ data files, which hold a large number of short DNA sequence reads, are produced. The first step toward a biological interpretation of these data is to align the reads in each sample (one FASTQ file) to the human genome. To speed this up, we can have each file processed by a separate compute node and, on each node, use 24 parallel processes such that each process aligns a separate chromosome. Here is an outline of how this nested parallelism could be implemented using futures.

library("future.batchtools")  ## provides batchtools_torque; also attaches future
library("listenv")
## The first level of futures should be submitted to the
## cluster using batchtools.  The second level of futures
## should be using multisession, where the number of
## parallel processes is automatically decided based on
## what the cluster grants to each compute node.
plan(list(batchtools_torque, multisession))

## Find all samples (one FASTQ file per sample)
fqs <- dir(pattern = "[.]fastq$")

## The aligned results are stored in BAM files
bams <- listenv()

## For all samples (FASTQ files) ...
for (ss in seq_along(fqs)) {
  fq <- fqs[ss]

  ## ... use futures to align them ...
  bams[[ss]] %<-% {
    bams_ss <- listenv()
    ## ... and for each FASTQ file use a second layer
    ## of futures to align the individual chromosomes
    for (cc in 1:24) {
      bams_ss[[cc]] %<-% htseq::align(fq, chr = cc)
    }
    ## Resolve the "chromosome" futures and return as a list
    as.list(bams_ss)
  }
}
## Resolve the "sample" futures and return as a list
bams <- as.list(bams)

Note that a user who does not have access to a cluster can run the same script on a single machine, processing samples sequentially and chromosomes in parallel, using:

plan(list(sequential, multisession))

or samples in parallel and chromosomes sequentially using:

plan(list(multisession, sequential))
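
Because the plan is the only thing that changes, the structure of the script can be dry-run before any real alignment is attempted. The sketch below is one way to do that under stated assumptions: align_stub() is a hypothetical stand-in for the aligner, the sample names are made up, only three "chromosomes" are used, and both levels run sequentially so that it works on any machine.

```r
library("future")
library("listenv")

## Assumption: both levels run sequentially so the sketch works anywhere;
## on a cluster, use plan(list(batchtools_torque, multisession)) instead.
plan(list(sequential, sequential))

## Hypothetical stand-in for the aligner: returns the name of the BAM
## file that would be produced instead of doing any alignment
align_stub <- function(fq, chr) sprintf("%s.chr%02d.bam", fq, chr)

## Made-up sample names; no files are read
fqs <- c("sampleA.fastq", "sampleB.fastq")

bams <- listenv()
for (ss in seq_along(fqs)) {
  fq <- fqs[ss]
  bams[[ss]] %<-% {
    bams_ss <- listenv()
    for (cc in 1:3) {  ## three "chromosomes" keep the output short
      bams_ss[[cc]] %<-% align_stub(fq, chr = cc)
    }
    as.list(bams_ss)
  }
}
bams <- as.list(bams)
str(bams)
```

The result is a list with one element per sample, each holding one entry per chromosome, which is the same shape the full script produces.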

For an introduction as well as full details on how to use futures, please consult the package vignettes of the future package.

Choosing batchtools backend

The future.batchtools package implements a generic future wrapper for all batchtools backends. Below are the most common types of batchtools backends.

Backend | Description | Alternative in future package
--- | --- | ---
batchtools_torque | Futures are evaluated via a TORQUE / PBS job scheduler | N/A
batchtools_slurm | Futures are evaluated via a Slurm job scheduler | N/A
batchtools_sge | Futures are evaluated via a Sun/Oracle Grid Engine (SGE) job scheduler | N/A
batchtools_lsf | Futures are evaluated via a Load Sharing Facility (LSF) job scheduler | N/A
batchtools_openlava | Futures are evaluated via an OpenLava job scheduler | N/A
batchtools_custom | Futures are evaluated via a custom batchtools configuration R script or via a set of cluster functions | N/A
batchtools_interactive | sequential evaluation in the calling R environment | plan(transparent)
batchtools_multicore | parallel evaluation by forking the current R process | plan(multicore)
batchtools_local | sequential evaluation in a separate R process (on current machine) | plan(cluster, workers = "localhost")
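
As a rough illustration of how a backend is selected, the sketch below uses batchtools_local, which needs no job scheduler, and checks that the future was evaluated in another R process. Comparing process IDs is only an illustrative device chosen here, not something the package prescribes.

```r
library("future.batchtools")

## Evaluate futures in a separate R process on the current machine
plan(batchtools_local)

## Ask the process that resolves the future for its process ID
f <- future(Sys.getpid())
worker_pid <- value(f)

## The two IDs should differ because the future ran in another R process
print(c(main = Sys.getpid(), worker = worker_pid))
```

Once this works locally, the plan() call is the only line that needs to change in order to target one of the scheduler backends.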

Examples

Below is an example illustrating how to use batchtools_torque to configure the batchtools backend. For further details and examples on how to configure batchtools, see the batchtools configuration wiki page.

To configure batchtools for a job scheduler, we need to set up a *.tmpl template file that is used to generate the script submitted to the scheduler. This is what a template file for TORQUE / PBS may look like:

#!/bin/bash

## Job name:
#PBS -N <%= if (exists("job.name", mode = "character")) job.name else job.hash %>

## Direct streams to logfile:
#PBS -o <%= log.file %>

## Merge standard error and output:
#PBS -j oe

## Email on abort (a) and termination (e), but not when starting (b)
#PBS -m ae

## Resources needed:
<% if (length(resources) > 0) {
  opts <- unlist(resources, use.names = TRUE)
  opts <- sprintf("%s=%s", names(opts), opts)
  opts <- paste(opts, collapse = ",") %>
#PBS -l <%= opts %>
<% } %>

## Launch R and evaluate the batchtools R job
Rscript -e 'batchtools::doJobCollection("<%= uri %>")'

If this template is saved to file batchtools.torque.tmpl (without a leading period) in the working directory, or as .batchtools.torque.tmpl (with a leading period) in the user's home directory, then it will be automatically located by the batchtools framework and loaded when doing:

> plan(batchtools_torque)

Resource parameters can be specified via argument resources, which should be a named list and is passed as is to the template file. For example, to request that each job is allotted 12 cores (on a single machine) and up to 5 GiB of memory, use:

> plan(batchtools_torque, resources = list(nodes = "1:ppn=12", vmem = "5gb"))
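
To see what the scheduler receives, the resource-formatting steps from the template can be run by hand in plain R. This is only a sketch that mirrors the three lines in the template's "Resources needed" block; the variable pbs_line is a name introduced here for illustration.

```r
## The same resources list as passed to plan() above
resources <- list(nodes = "1:ppn=12", vmem = "5gb")

## The three steps from the template's "Resources needed" block
opts <- unlist(resources, use.names = TRUE)
opts <- sprintf("%s=%s", names(opts), opts)
opts <- paste(opts, collapse = ",")

## The directive line that the generated job script would contain
pbs_line <- paste("#PBS -l", opts)
cat(pbs_line, "\n", sep = "")
## prints: #PBS -l nodes=1:ppn=12,vmem=5gb
```

Each element of the list becomes one name=value pair, so any option that the scheduler accepts after -l can be passed this way.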

To specify the resources argument at the same time as using nested future strategies, one can use tweak() to tweak the default arguments. For instance,

plan(list(
  tweak(batchtools_torque, resources = list(nodes = "1:ppn=12", vmem = "5gb")),
  multisession
))

causes the first level of futures to be submitted via the TORQUE job scheduler, requesting 12 cores and 5 GiB of memory per job. The second level of futures will be evaluated with multisession on the 12 cores given to each job by the scheduler.
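
The multisession backend takes its default number of workers from availableCores(), which is provided by the parallelly package. As far as the parallelly documentation describes, this function consults scheduler settings in addition to the hardware core count. The sketch below simply reports what it detects, which can serve as a quick check when run inside a job; n_cores is a name introduced here for illustration.

```r
library("parallelly")

## Number of cores this R process is allowed to use
n_cores <- availableCores()
print(n_cores)

## All the sources that were consulted (system, scheduler settings, ...)
print(availableCores(which = "all"))
```

If the reported number does not match what was requested from the scheduler, the resources argument or the template is the first place to look.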

A similar filename format is used for the other types of job schedulers supported. For instance, for Slurm the template file should be named ./batchtools.slurm.tmpl or ~/.batchtools.slurm.tmpl in order for

> plan(batchtools_slurm)

to locate the file automatically. To specify this template file explicitly, use argument template, e.g.

> plan(batchtools_slurm, template = "/path/to/batchtools.slurm.tmpl")

For further details and examples on how to configure batchtools per se, see the batchtools configuration wiki page.

Demos

The future package provides a demo using futures for calculating a set of Mandelbrot planes. The demo does not assume anything about what type of futures are used. The user has full control of how futures are evaluated. For instance, to use local batchtools futures, run the demo as:

library("future.batchtools")
plan(batchtools_local)
demo("mandelbrot", package = "future", ask = FALSE)

Installation

R package future.batchtools is available on CRAN and can be installed in R as:

install.packages("future.batchtools")

Pre-release version

To install the pre-release version that is available in Git branch develop on GitHub, use:

remotes::install_github("HenrikBengtsson/future.batchtools", ref="develop")

This will install the package from source.

Contributing

To contribute to this package, please see CONTRIBUTING.md.
