
JoeriHermans / awflow

License: BSD-3-Clause
Reproducible research and reusable acyclic workflows in Python. Execute code on HPC systems as if you executed it on your personal computer!

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to awflow

Nextflow
A DSL for data-driven computational pipelines
Stars: ✭ 1,337 (+8813.33%)
Mutual labels:  hpc, reproducible-research, workflow-engine, reproducible-science
Jug
Parallel programming with Python
Stars: ✭ 337 (+2146.67%)
Mutual labels:  hpc, workflow-engine
ck-env
CK repository with components and automation actions to enable portable workflows across diverse platforms including Linux, Windows, macOS and Android. It includes software detection plugins and meta packages (code, data sets, models, scripts, etc.), with the possibility for multiple versions to co-exist in a user or system environment.
Stars: ✭ 67 (+346.67%)
Mutual labels:  hpc, reproducible-research
Singularity
Singularity: Application containers for Linux
Stars: ✭ 2,290 (+15166.67%)
Mutual labels:  hpc, reproducible-science
showyourwork
Fully reproducible, open source scientific articles in LaTeX.
Stars: ✭ 361 (+2306.67%)
Mutual labels:  reproducible-research, reproducible-science
omnia
An open-source toolkit for deploying and managing high performance clusters for HPC, AI, and data analytics workloads.
Stars: ✭ 128 (+753.33%)
Mutual labels:  hpc, slurm
ngs-preprocess
A pipeline for preprocessing NGS data from Illumina, Nanopore and PacBio technologies
Stars: ✭ 22 (+46.67%)
Mutual labels:  reproducible-research, reproducible-science
Liftr
🐳 Containerize R Markdown documents for continuous reproducibility
Stars: ✭ 155 (+933.33%)
Mutual labels:  reproducible-research, reproducible-science
software-dev
Coding Standards for the USC Biostats group
Stars: ✭ 33 (+120%)
Mutual labels:  hpc, reproducible-research
slurmR
slurmR: A Lightweight Wrapper for Slurm
Stars: ✭ 43 (+186.67%)
Mutual labels:  hpc, slurm
HPC
A collection of various resources, examples, and executables for the general NREL HPC user community's benefit.
Stars: ✭ 64 (+326.67%)
Mutual labels:  hpc, slurm
Containerit
Package an R workspace and all dependencies as a Docker container
Stars: ✭ 248 (+1553.33%)
Mutual labels:  reproducible-research, reproducible-science
Reprozip
ReproZip is a tool that simplifies the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science.
Stars: ✭ 231 (+1440%)
Mutual labels:  reproducible-research, reproducible-science
future.batchtools
🚀 R package future.batchtools: A Future API for Parallel and Distributed Processing using batchtools
Stars: ✭ 77 (+413.33%)
Mutual labels:  hpc, slurm
Fglab
Future Gadget Laboratory
Stars: ✭ 218 (+1353.33%)
Mutual labels:  reproducible-research, reproducible-science
ukbrest
ukbREST: efficient and streamlined data access for reproducible research of large biobanks
Stars: ✭ 32 (+113.33%)
Mutual labels:  reproducible-research, reproducible-science
Everware
Everware is about reusable science; it allows people to jump right into your research code.
Stars: ✭ 112 (+646.67%)
Mutual labels:  reproducible-research, reproducible-science
Sarek
Detect germline or somatic variants from normal or tumour/normal whole-genome or targeted sequencing
Stars: ✭ 124 (+726.67%)
Mutual labels:  reproducible-research, reproducible-science
launcher-scripts
(DEPRECATED) A set of launcher scripts to be used with OAR and Slurm for running jobs on the UL HPC platform
Stars: ✭ 14 (-6.67%)
Mutual labels:  hpc, slurm
openscience
Empirical Software Engineering journal (EMSE) open science and reproducible research initiative
Stars: ✭ 28 (+86.67%)
Mutual labels:  reproducible-research, reproducible-science

Reproducible research and reusable acyclic workflows in Python. Execute code on HPC systems as if you executed it on your machine!

Motivation

Would you like fully reproducible research and reusable workflows that seamlessly run on HPC clusters? Tired of writing and managing large Slurm submission scripts? Do you have to comment out large parts of your pipeline whenever their results have already been generated? Hate YAML? Don't waste your precious time! awflow lets you describe complex pipelines directly in Python and run them both on your personal computer and on large HPC clusters.

import glob
import numpy as np
import os

from awflow import after, ensure, job, schedule

n = 10000
tasks = 10

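# @ensure declares a postcondition; jobs whose postconditions already hold
# can be pruned from the workflow graph (see merge.prune() below).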
@ensure(lambda i: os.path.exists(f'pi-{i}.npy'))
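# array=tasks expands `estimate` into a job array of `tasks` parallel tasks.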
@job(cpus='4', memory='4GB', array=tasks)
def estimate(i: int):
    print(f'Executing task {i + 1} / {tasks}.')
    x = np.random.random(n)
    y = np.random.random(n)
    pi_estimate = (x**2 + y**2 <= 1)
    np.save(f'pi-{i}.npy', pi_estimate)

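# @after makes `merge` depend on `estimate`: it only runs once every
# estimate task has completed.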
@after(estimate)
@ensure(lambda: os.path.exists('pi.npy'))
@job(cpus='4')
def merge():
    files = glob.glob('pi-*.npy')
    stack = np.vstack([np.load(f) for f in files])
    pi_estimate = stack.sum() / (n * tasks) * 4
    print('π ≅', pi_estimate)
    np.save('pi.npy', pi_estimate)

merge.prune()  # Prune jobs whose postconditions have been satisfied

schedule(merge, backend='local')  # Executes merge and its dependencies

Executing this Python program (python examples/pi.py --backend slurm) on a Slurm HPC cluster will launch the following jobs:

           1803299       all    merge username PD       0:00      1 (Dependency)
     1803298_[6-9]       all estimate username PD       0:00      1 (Resources)
         1803298_3       all estimate username  R       0:01      1 compute-xx
         1803298_4       all estimate username  R       0:01      1 compute-xx
         1803298_5       all estimate username  R       0:01      1 compute-xx

The following example shows how workflow graphs can be constructed dynamically:

from awflow import after, job, schedule, terminal_nodes

@job(cpus='2', memory='4GB', array=5)
def generate(i: int):
    print(f'Generating data block {i}.')

@after(generate)
@job(cpus='1', memory='2GB', array=5)
def postprocess(i: int):
    print(f'Postprocessing data block {i}.')

def do_experiment(parameter):
    r"""This function allocates a `fit` and a `make_plot` job
    based on the specified parameter."""

    @after(postprocess)
    @job(name=f'fit_{parameter}')  # By default, the name is equal to the function name
    def fit():
        print(f'Fit {parameter}.')

    @after(fit)
    @job(name=f'plt_{parameter}')  # Simplifies the identification of the logfile
    def make_plot():
        print(f'Plot {parameter}.')

# Programmatically build workflow
for parameter in [0.1, 0.2, 0.3, 0.4, 0.5]:
    do_experiment(parameter)

leaves = terminal_nodes(generate, prune=True)  # Find the terminal nodes of the workflow graph
schedule(*leaves, backend='local')

Check the examples directory to explore the functionality.

Available backends

Currently, awflow.schedule supports only the local and slurm backends.
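For example, dispatching the pi workflow from above to Slurm instead of executing it locally is a one-argument change (a sketch; the resource hints given to @job, such as cpus and memory, are then handed to the cluster's scheduler):

schedule(merge, backend='slurm')  # Submit the workflow to Slurm instead of running it locally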

Installation

The awflow package is available on PyPI, which means it is installable via pip.

you@local:~ $ pip install awflow

If you would like the latest features, you can install it directly from the Git repository.

you@local:~ $ pip install git+https://github.com/JoeriHermans/awflow

If you would like to run the examples as well, be sure to install the optional example dependencies.

you@local:~ $ pip install 'awflow[examples]'

License

As described in the LICENSE file.
