seandavi / wdlRunR

Licence: other
Elastic, reproducible, and reusable genomic data science tools from R backed by cloud resources


The wdlRunR package -- DEPRECATED AND NO LONGER MAINTAINED

Follow development on GitHub.

This package executes Workflow Description Language (WDL) files from within R. Execution is handled by the Broad Institute's Cromwell workflow engine, which currently supports these compute platforms:

  • Local execution (good for testing)
  • Sun Grid Engine clusters (and probably other HPC schedulers)
  • HTCondor
  • Google Compute Engine
  • Apache Spark

Install

# Requires the devtools package: install.packages("devtools")
devtools::install_github('seandavi/wdlRunR')

Features

This package builds on the usual data munging and analysis capabilities of R and Bioconductor and adds the ability to orchestrate large, complex workflows described in WDL. The workflows themselves are portable and are written outside of this package.

Features of this package include:

  • With an appropriate backend (Google, for example), scale to very large computational capacity
  • Submit single jobs or batches of jobs
  • Monitor jobs
  • Retrieve metadata from submitted, completed, and running jobs
  • Review log files from completed and failed jobs
  • Track inputs and outputs of jobs
  • Optional "caching" of jobs to avoid costly recomputation
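
The monitoring, metadata, log, and output features listed above correspond to per-workflow endpoints in Cromwell's REST API. The following is a minimal shell sketch of querying those endpoints directly, not the wdlRunR interface itself. The helper names and the CROMWELL_URL variable are illustrative assumptions, and a Cromwell server is assumed to be listening on localhost:8000.

```shell
#!/bin/sh
# Sketch only: CROMWELL_URL and the helper names are illustrative assumptions,
# not part of wdlRunR. The paths are Cromwell's per-workflow REST endpoints.
CROMWELL_URL="${CROMWELL_URL:-http://localhost:8000}"

# Build the URL for one per-workflow endpoint.
cromwell_endpoint() {
  # $1 = workflow id, $2 = endpoint name (status, metadata, logs, outputs)
  printf '%s/api/workflows/v1/%s/%s\n' "$CROMWELL_URL" "$1" "$2"
}

# Fetch an endpoint as JSON. -f fails on HTTP errors, -sS hides the
# progress meter but still prints error messages.
cromwell_get() {
  curl -fsS --header "Accept: application/json" "$(cromwell_endpoint "$1" "$2")"
}
```

For example, `cromwell_get <workflow-id> metadata` would return the metadata for one job, and replacing `metadata` with `status`, `logs`, or `outputs` queries the other endpoints.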

Working with AWS

Make a custom AMI with Cromwell additions

python create-genomics-ami.py \
       --user-data cromwell-genomics-ami.cloud-init.yaml \
       --key-pair-name EveryDay \
       --scratch-mount-point /cromwell_root \
       --profile default \
       --ami-description "AMI for use with Cromwell"

TODO: Do this with Packer.

Set up Cromwell config file

// aws.conf
include required(classpath("application"))

aws {
  application-name = "cromwell"
  auths = [{
      name = "default"
      scheme = "default"
  }]
  #
  # be sure to set this!!
  #
  region = "us-east-1"
}

engine {
  filesystems {
    s3 { auth = "default" }
  }
}

backend {
  default = "AWSBATCH"
  providers {
    AWSBATCH {
      actor-factory = "cromwell.backend.impl.aws.AwsBatchBackendLifecycleActorFactory"
      config {
        #
        # Change this to an EXISTING bucket.
        # Cromwell does not create the bucket for you.
        #
        root = "s3://<your-s3-bucket-name>/cromwell-execution"
        auth = "default"

        numSubmitAttempts = 3
        numCreateDefinitionAttempts = 3

        concurrent-job-limit = 16

        default-runtime-attributes {
          #
          # You need to set up your AWS Batch
          # queues and compute environments.
          # Then, paste in the queue ARN,
          # available from the AWS Batch console
          # under the queue details.
          #
          queueArn: "<your-queue-arn>"
        }

        filesystems {
          s3 {
            auth = "default"
          }
        }
      }
    }
  }
}
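
Once the placeholders in aws.conf are filled in, Cromwell can be started in server mode with the config file passed through the `config.file` Java system property. The sketch below first checks that no `<your-...>` placeholder remains. The function names and the jar path are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: function names and the jar location are assumptions.

# Print any lines that still contain a <your-...> placeholder.
# Returns 1 if placeholders remain, 0 otherwise.
check_placeholders() {
  if grep -n '<your-[^>]*>' "$1"; then
    return 1
  fi
  return 0
}

# Start Cromwell in server mode with the given config file and jar.
start_cromwell() {
  # $1 = config file (for example aws.conf), $2 = path to a Cromwell jar
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
  check_placeholders "$1" || { echo "edit $1 before starting Cromwell" >&2; return 1; }
  java -Dconfig.file="$1" -jar "$2" server
}
```

A call such as `start_cromwell aws.conf cromwell.jar` would then leave the server listening on its default port, 8000, which is the port used in the test below.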

Testing Cromwell

curl -X POST --header "Accept: application/json" \
     "localhost:8000/api/workflows/v1" \
     -F workflowSource=@get_ebi_fastq.wdl \
     -F workflowInputs=@get_ebi_fastq.inputs
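
The submission above returns a small JSON reply containing the new workflow's id. As a hedged sketch of a follow-up step, the helpers below extract that id and poll the status endpoint until the workflow reaches a terminal state. The helper names, the 30-second default interval, and the CROMWELL_URL variable are assumptions; a JSON tool such as jq would be more robust than the sed expression used here.

```shell
#!/bin/sh
# Sketch only: helper names, poll interval, and CROMWELL_URL are assumptions.
CROMWELL_URL="${CROMWELL_URL:-http://localhost:8000}"

# Print the value of a string field from a small JSON reply read on stdin.
json_field() {
  # $1 = field name
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Poll the status endpoint until the workflow succeeds, fails, or is aborted.
wait_for_workflow() {
  # $1 = workflow id, $2 = seconds between polls (default 30)
  while :; do
    status=$(curl -fsS "$CROMWELL_URL/api/workflows/v1/$1/status" | json_field status)
    # An empty status means the request or the parse failed; stop polling.
    [ -n "$status" ] || return 1
    echo "$1: $status"
    case "$status" in
      Succeeded) return 0 ;;
      Failed|Aborted) return 1 ;;
    esac
    sleep "${2:-30}"
  done
}

# Usage, reusing the submission command from this section:
#   id=$(curl -fsS -X POST "localhost:8000/api/workflows/v1" \
#          -F workflowSource=@get_ebi_fastq.wdl \
#          -F workflowInputs=@get_ebi_fastq.inputs | json_field id)
#   wait_for_workflow "$id"
```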