pegasus-isi / Pegasus

License: Apache-2.0
Pegasus Workflow Management System - Automate, recover, and debug scientific computations.

Programming Languages

java
68154 projects - #9 most used programming language

Projects that are alternatives of or similar to Pegasus

Cromwell
Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments
Stars: ✭ 655 (+495.45%)
Mutual labels:  bioinformatics, workflow, hpc
Galaxy
Data intensive science for everyone.
Stars: ✭ 812 (+638.18%)
Mutual labels:  bioinformatics, workflow
Titanoboa
Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.
Stars: ✭ 787 (+615.45%)
Mutual labels:  distributed-systems, workflow
Sv Callers
Snakemake-based workflow for detecting structural variants in WGS data
Stars: ✭ 28 (-74.55%)
Mutual labels:  bioinformatics, workflow
Jug
Parallel programming with Python
Stars: ✭ 337 (+206.36%)
Mutual labels:  workflow, hpc
Wdl
Workflow Description Language - Specification and Implementations
Stars: ✭ 438 (+298.18%)
Mutual labels:  bioinformatics, workflow
Cookiecutter
DEPRECATED! Please use nf-core/tools instead
Stars: ✭ 18 (-83.64%)
Mutual labels:  bioinformatics, workflow
Cuneiform
Cuneiform distributed programming language
Stars: ✭ 175 (+59.09%)
Mutual labels:  bioinformatics, workflow
Xene
A distributed workflow runner focusing on performance and simplicity.
Stars: ✭ 56 (-49.09%)
Mutual labels:  distributed-systems, workflow
Cwl Svg
A library for generating an interactive SVG visualization of CWL workflows
Stars: ✭ 57 (-48.18%)
Mutual labels:  bioinformatics, workflow
Maestrowf
A tool to easily orchestrate general computational workflows both locally and on supercomputers
Stars: ✭ 72 (-34.55%)
Mutual labels:  workflow, hpc
Arvados
An open source platform for managing and analyzing biomedical big data
Stars: ✭ 274 (+149.09%)
Mutual labels:  bioinformatics, workflow
sapporo
A standard implementation conforming to the Global Alliance for Genomics and Health (GA4GH) Workflow Execution Service (WES) API specification and a web application for managing and executing those WES services.
Stars: ✭ 17 (-84.55%)
Mutual labels:  workflow, bioinformatics
Nextflow
A DSL for data-driven computational pipelines
Stars: ✭ 1,337 (+1115.45%)
Mutual labels:  bioinformatics, hpc
bistro
A library to build and execute typed scientific workflows
Stars: ✭ 43 (-60.91%)
Mutual labels:  workflow, bioinformatics
Scipipe
Robust, flexible and resource-efficient pipelines using Go and the commandline
Stars: ✭ 826 (+650.91%)
Mutual labels:  bioinformatics, workflow
Sarek
Detect germline or somatic variants from normal or tumour/normal whole-genome or targeted sequencing
Stars: ✭ 124 (+12.73%)
Mutual labels:  bioinformatics, workflow
Rnaseq Workflow
A repository for setting up a RNAseq workflow
Stars: ✭ 170 (+54.55%)
Mutual labels:  bioinformatics, workflow
Wfl
A Simple Way of Creating Job Workflows in Go running in Processes, Containers, Tasks, Pods, or Jobs
Stars: ✭ 30 (-72.73%)
Mutual labels:  workflow, hpc
Flowr
Robust and efficient workflows using a simple language agnostic approach
Stars: ✭ 73 (-33.64%)
Mutual labels:  bioinformatics, workflow

Pegasus

Pegasus Workflow Management System

Pegasus WMS is a configurable system for mapping and executing scientific workflows over a wide range of computational infrastructures including laptops, campus clusters, supercomputers, grids, and commercial and academic clouds. Pegasus has been used to run workflows with up to 1 million tasks that process tens of terabytes of data at a time.

Pegasus WMS bridges the scientific domain and the execution environment by automatically mapping high-level workflow descriptions onto distributed resources. It automatically locates the input data and computational resources a workflow needs, and plans all of the data transfer and job submission operations required to execute it. Pegasus enables scientists to construct workflows in abstract terms without worrying about the details of the underlying execution environment or the particulars of the low-level specifications required by the middleware (Condor, Globus, Amazon EC2, etc.). In the process, Pegasus can plan and optimize the workflow to enable efficient, high-performance execution of large workflows on complex, distributed infrastructures.
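The mapping step can be pictured with a small self-contained sketch (plain Python, not Pegasus's actual planner or data structures): abstract tasks reference logical files, a replica catalog maps logical names to physical locations, and the planner prepends stage-in jobs for any input that is not already at the execution site.

```python
# Illustrative sketch of abstract-to-executable workflow planning.
# This is NOT Pegasus's real planner; the names and structures are
# invented to mirror the concepts described above.

def plan(abstract_tasks, replica_catalog, site):
    """Expand abstract tasks into an executable job list, adding
    stage-in jobs for inputs that live at another site."""
    executable = []
    for task in abstract_tasks:
        for lfn in task["inputs"]:
            location = replica_catalog[lfn]      # logical name -> physical location
            if location["site"] != site:         # input not local: stage it in first
                executable.append({
                    "type": "stage-in",
                    "file": lfn,
                    "from": location["url"],
                })
        executable.append({"type": "compute", "name": task["name"]})
    return executable

# Tiny example: one task whose input sits on a remote storage site.
tasks = [{"name": "preprocess", "inputs": ["f.a"]}]
catalog = {"f.a": {"site": "storage", "url": "gsiftp://storage.example.org/f.a"}}
jobs = plan(tasks, catalog, site="cluster")
print([j["type"] for j in jobs])   # a stage-in job is inserted before the compute job
```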

Pegasus has a number of features that contribute to its usability and effectiveness:

  • Portability / Reuse – User-created workflows can easily be run in different environments without alteration. Pegasus currently runs workflows on top of Condor pools, Grid infrastructures such as Open Science Grid and XSEDE, Amazon EC2, Google Cloud, and HPC clusters. The same workflow can run on a single system or across a heterogeneous set of resources.
  • Performance – The Pegasus mapper can reorder, group, and prioritize tasks in order to increase overall workflow performance.
  • Scalability – Pegasus can easily scale both the size of the workflow and the resources that the workflow is distributed over. Pegasus runs workflows ranging from just a few computational tasks up to 1 million. The number of resources involved in executing a workflow can scale as needed without any impediments to performance.
  • Provenance – By default, all jobs in Pegasus are launched using the Kickstart wrapper that captures runtime provenance of the job and helps in debugging. Provenance data is collected in a database, and the data can be queried with tools such as pegasus-statistics, pegasus-plots, or directly using SQL.
  • Data Management – Pegasus handles replica selection, data transfers, and output registration in data catalogs. These tasks are added to a workflow as auxiliary jobs by the Pegasus planner.
  • Reliability – Jobs and data transfers are automatically retried in case of failures. Debugging tools such as pegasus-analyzer help the user to debug the workflow in case of non-recoverable failures.
  • Error Recovery – When errors occur, Pegasus tries to recover when possible by retrying tasks, by retrying the entire workflow, by providing workflow-level checkpointing, by re-mapping portions of the workflow, by trying alternative data sources for staging data, and, when all else fails, by providing a rescue workflow containing a description of only the work that remains to be done. It cleans up storage as the workflow is executed so that data-intensive workflows have enough space to execute on storage-constrained resources. Pegasus keeps track of what has been done (provenance) including the locations of data used and produced, and which software was used with which parameters.
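The retry-then-rescue behaviour described above can be sketched conceptually (invented code, not Pegasus internals): each job is retried a few times, and any jobs that still fail are collected into a "rescue" list describing only the work that remains to be done.

```python
# Conceptual sketch of retry-then-rescue error recovery.
# Function names and structure are illustrative, not Pegasus's.

def run_with_recovery(jobs, execute, retries=3):
    """Run each job with up to `retries` attempts; jobs that never
    succeed go into a rescue list for later re-planning."""
    done, rescue = [], []
    for job in jobs:
        for attempt in range(retries):
            if execute(job):
                done.append(job)
                break
        else:                      # all retries exhausted
            rescue.append(job)     # only this remaining work needs re-submission
    return done, rescue

# Example: "bad" fails on every attempt, "ok" succeeds immediately.
attempts = {}
def execute(job):
    attempts[job] = attempts.get(job, 0) + 1
    return job != "bad"

done, rescue = run_with_recovery(["ok", "bad"], execute)
print(done, rescue)       # ['ok'] ['bad']
print(attempts["bad"])    # 3 attempts before giving up
```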

Getting Started

You can find more information about Pegasus on the Pegasus Website.

Pegasus has an extensive User Guide that documents how to create, plan, and monitor workflows.

We recommend you start by completing the Pegasus Tutorial from Chapter 3 of the Pegasus User Guide.

The easiest way to install Pegasus is to use one of the binary packages available on the Pegasus downloads page. Consult Chapter 2 of the Pegasus User Guide for more information about installing Pegasus from binary packages.

There is documentation on the Pegasus website for the Python, Java, and R Abstract Workflow Generator APIs. We strongly recommend using the Python API, which is feature-complete and also allows you to invoke all of the Pegasus command-line tools.

You can use the pegasus-init command-line tool to run several examples on your local machine. Consult Chapter 4 of the Pegasus User Guide for more information.

There are also examples of how to Configure Pegasus for Different Execution Environments in the Pegasus User Guide.

If you need help using Pegasus, please contact us. See the [contact page](http://pegasus.isi.edu/contact) on the Pegasus website for more information.

Building from Source

Pegasus can be compiled on any recent Linux or Mac OS X system.

Source Dependencies

In order to build Pegasus from source, make sure you have the following installed:

  • Git
  • Java 8 or higher
  • Python 3.5 or higher
  • R
  • Ant
  • gcc
  • g++
  • make
  • tox 3.14.5 or higher
  • mysql (optional, required to access MySQL databases)
  • postgresql (optional, required to access PostgreSQL databases)
  • Python pyyaml
  • Python GitPython

Other packages may be required to run unit tests and to build the MPI tools.

Compiling

Ant is used to compile Pegasus.

To get a list of build targets run:

$ ant -p

The targets that begin with "dist" are what you want to use.

To build a basic binary tarball (excluding documentation), run:

$ ant dist

To build the release tarball (including documentation), run:

$ ant dist-release

The resulting packages will be created in the dist subdirectory.