ewels / Clusterflow

Licence: gpl-3.0

A pipelining tool to automate and standardise bioinformatics analyses on cluster environments.

A user-friendly bioinformatics workflow tool

Find Cluster Flow documentation with information and examples at http://clusterflow.io

Cluster Flow is a pipelining tool to automate and standardise bioinformatics analyses on high-performance cluster environments. It is designed to be easy to use, quick to set up and flexible to configure.

Cluster Flow is written in Perl and works by submitting jobs to a cluster (it can also run locally). Each job is a stand-alone Perl executable that wraps a bioinformatics tool of interest.

Modules collect extensive logging information and Cluster Flow e-mails the user with a summary of the pipeline commands and exit codes upon completion.

Installation

You can find stable versions to download on the releases page.

You can get the development version of the code by cloning this repository:

git clone https://github.com/ewels/clusterflow.git

Once downloaded and extracted, create a clusterflow.config file in the script directory, based on clusterflow.config.example.

Next, add the main cf executable to your PATH. You can do this with an environment module, with a symlink in a bin directory that is already on your PATH, or by extending PATH in your ~/.bashrc file.

Finally, run the setup wizard (cf --setup) and genomes wizard (cf --add_genome) and you're ready to go! See the installation docs for more information.
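The installation steps above could be sketched roughly as follows. This is a minimal sketch, not the official procedure: `CF_DIR` and its default location are assumptions made for illustration, and the script assumes the cf executable and clusterflow.config.example sit in the same directory.

```shell
# Assumption: CF_DIR is the directory holding the cf executable.
# The default below is hypothetical; point it at your own clone or release.
CF_DIR="${CF_DIR:-$HOME/clusterflow}"

# Create clusterflow.config from the bundled example, but never
# overwrite a config file that already exists.
if [ -f "$CF_DIR/clusterflow.config.example" ] && [ ! -f "$CF_DIR/clusterflow.config" ]; then
    cp "$CF_DIR/clusterflow.config.example" "$CF_DIR/clusterflow.config"
fi

# Put cf on the PATH for the current session, unless it is already there.
case ":$PATH:" in
    *":$CF_DIR:"*) ;;
    *) export PATH="$CF_DIR:$PATH" ;;
esac

# The wizards are interactive, so they are left commented out here:
# cf --setup
# cf --add_genome
```

To keep the change across sessions, the same `export PATH=...` line can be added to ~/.bashrc. Afterwards, `command -v cf` should print the path to the executable if the directory is correct.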

Usage

Pipelines are launched by naming a pipeline or module and the input files. A simple example could look like this:

cf sra_trim *.fastq.gz

Most pipelines need reference genomes, and Cluster Flow has built-in reference genome management. Parameters can be passed to modify tool behaviour.

For example, to run the fastq_bowtie pipeline (FastQC, TrimGalore! and Bowtie) with human data, trimming the first 6 bp of read 1, the command would be:

cf --genome GRCh37 --params "clip_r1=6" fastq_bowtie *.fastq.gz
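If the same launch is repeated often, it may help to wrap it in a small shell function that refuses to run when no input files are found. The sketch below only uses the `--genome` and `--params` options shown above. The function name `launch_cf`, the variable names, and the `CF_BIN` override are hypothetical helpers introduced for illustration and are not part of Cluster Flow.

```shell
# Settings taken from the fastq_bowtie example above.
genome="GRCh37"
params="clip_r1=6"
pipeline="fastq_bowtie"

launch_cf() {
    # "$@" holds the input files supplied by the caller. An unmatched
    # glob is passed through literally, so also check the first file exists.
    if [ "$#" -eq 0 ] || [ ! -e "$1" ]; then
        echo "No input files found; nothing to launch." >&2
        return 1
    fi
    # CF_BIN is a hypothetical override used to preview the command:
    # CF_BIN="echo cf" prints it instead of running Cluster Flow.
    # It is left unquoted on purpose so "echo cf" splits into two words.
    ${CF_BIN:-cf} --genome "$genome" --params "$params" "$pipeline" "$@"
}
```

Calling `launch_cf *.fastq.gz` in a directory of reads runs the command from the example, while `CF_BIN="echo cf" launch_cf *.fastq.gz` only prints it.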

Additional common Cluster Flow commands are as follows:

cf --genomes     # List available reference genomes
cf --pipelines   # List available pipelines
cf --modules     # List available modules
cf --qstat       # List running pipelines
cf --qdel [id]   # Cancel jobs for a running pipeline

Supported Tools

Cluster Flow comes with modules and pipelines for the following tools:

| Read QC & pre-processing | Aligners / quantifiers | Post-alignment processing | Post-alignment QC |
|---|---|---|---|
| FastQ Screen | Bismark | bedtools (`bamToBed`, `intersectNeg`) | deepTools (`bamCoverage`, `bamFingerprint`) |
| FastQC | Bowtie 1 | subread featureCounts | MultiQC |
| TrimGalore! | Bowtie 2 | HTSeq Count | phantompeaktools (`runSpp`) |
| SRA Toolkit | BWA | Picard (`MarkDuplicates`) | Preseq |
| | HiCUP | Samtools (`bam2sam`, `dedup`, `sort_index`) | RSeQC (`geneBody_coverage`, `inner_distance`, `junction_annotation`, `junction_saturation`, `read_GC`) |
| | HISAT2 | | |
| | Kallisto | | |
| | STAR | | |
| | TopHat | | |

Citation

Please consider citing Cluster Flow if you use it in your analysis.

Cluster Flow: A user-friendly bioinformatics workflow tool [version 2; referees: 3 approved].
Philip Ewels, Felix Krueger, Max Käller, Simon Andrews
F1000Research 2016, 5:2824
doi: 10.12688/f1000research.10335.2

@article{Ewels2016,
author = {Ewels, Philip and Krueger, Felix and K{\"{a}}ller, Max and Andrews, Simon},
title = {Cluster Flow: A user-friendly bioinformatics workflow tool [version 2; referees: 3 approved].},
journal = {F1000Research},
volume = {5},
pages = {2824},
year = {2016},
doi = {10.12688/f1000research.10335.2},
URL = {http://dx.doi.org/10.12688/f1000research.10335.2}
}

Contributions & Support

Contributions and suggestions for new features are welcome, as are bug reports! Please create a new issue. Cluster Flow has extensive documentation describing how to write new modules and pipelines.

There is a chat room for the package hosted on Gitter where you can discuss things with the package author and other developers: https://gitter.im/ewels/clusterflow

If in doubt, feel free to get in touch with the author directly: @ewels

Contributors

Project lead and main author: @ewels

Code contributions from: @s-andrews, @FelixKrueger, @stu2, @orzechoj, @darogan and others. Thanks for your support!

License

Cluster Flow is released with a GPL v3 licence. Cluster Flow is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. For more information, see the licence that comes bundled with Cluster Flow.
