
labdave / CloudConductor

Licence: other
CloudConductor is a workflow management system that generates and executes bioinformatics pipelines

Programming Languages

python
Dockerfile


CloudConductor: Simplified Bioinformatics

CloudConductor is a cloud-based workflow engine for defining and executing bioinformatics pipelines. So far the framework has been tested extensively on Google Cloud Platform; support for other platforms, including AWS and Azure, is planned.

Feature Highlights

  • User-friendly
    • Define complex workflows by linking together user-defined modules that can be re-used across pipelines
    • ConfigObj-based configuration files for clean, readable workflows
    • 50+ pre-installed modules for existing bioinformatics tools
  • Portable
    • Docker integration ensures reproducible runtime environment for modules
    • Platform-independent design (currently supports GCP; AWS and Azure support is planned)
  • Modular/Extensible
    • Plug-N-Play with user-defined task modules
    • Easily re-use, re-combine across workflows
      • Eliminates serial copy/paste
    • Easily add or customize task modules as needed
  • Pre-Launch Type-Checking
    • Strongly-typed task modules
      • Catch pipeline errors prior to runtime
    • Pre-launch validation reports pipeline errors before any task is launched
  • Scalable
    • Removes resource limitations imposed by cluster-based HPCCs
  • Elastic
    • VM usage automatically scales to match input file sizes, computational needs
  • Scatter-Gather Parallelism
    • Built-in logic for dividing large tasks into small chunks and recombining the results
  • Economical
    • Preemptible/Spot instances drastically cut workflow costs
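
The scatter-gather idea from the feature list can be illustrated with a short, generic Python sketch. This is not CloudConductor's code or API: the function names (`scatter`, `count_gc`, `scatter_gather`) and the sample reads are hypothetical, and local threads stand in for whatever workers a real pipeline would use.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(items, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def count_gc(reads):
    """Hypothetical per-chunk task: count G and C bases in a chunk of reads."""
    return sum(read.count("G") + read.count("C") for read in reads)


def scatter_gather(items, chunk_size, task, combine):
    """Run task on every chunk in parallel, then merge the partial results."""
    chunks = scatter(items, chunk_size)
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as the input chunks.
        partial_results = list(pool.map(task, chunks))
    return combine(partial_results)


# Made-up example data, used only to exercise the sketch.
reads = ["GATTACA", "CCGG", "ATAT", "GGGC", "TTTT"]
total_gc = scatter_gather(reads, chunk_size=2, task=count_gc, combine=sum)
print(total_gc)
```

The per-chunk task and the combine step are passed in as plain functions, so the same splitting logic can be reused for different tasks.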

Setting up your system

CloudConductor is currently designed only for Linux systems. You will need to install and configure the following tools to run your pipelines on Google Cloud:

  1. Python v3.6+

    You can check your Python version by running the following command in your terminal:

    $ python3 -V
    Python 3.6.8

    To install the correct version of Python, visit the official Python website.

  2. Python packages: configobj, jsonschema, requests

    You will need pip to install the above packages. After installing pip, run the following commands in your terminal:

    # Upgrade pip
    sudo pip3 install -U pip
    
    # Install Python modules
    sudo pip3 install -U configobj jsonschema requests

  3. Clone the CloudConductor repo

    # clone the repo
    git clone https://github.com/labdave/CloudConductor.git

  4. Google Cloud Platform SDK

    Follow the instructions on the official Google Cloud website.
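
Once the steps above are done, a quick check can confirm that the prerequisites are in place. The following is a minimal sketch using only the Python standard library; it is not part of CloudConductor. The function name `find_missing_prerequisites` is hypothetical, and the script assumes the Google Cloud SDK provides a `gcloud` command on your PATH.

```python
import importlib.util
import shutil
import sys

# Requirements taken from the setup steps above.
MIN_PYTHON = (3, 6)
REQUIRED_PACKAGES = ("configobj", "jsonschema", "requests")
# Assumption: git is used for cloning and the Google Cloud SDK installs `gcloud`.
REQUIRED_COMMANDS = ("git", "gcloud")


def find_missing_prerequisites(version_info=None):
    """Return a list of problems; an empty list means every check passed."""
    if version_info is None:
        version_info = sys.version_info
    problems = []

    # Compare only (major, minor) against the minimum supported version.
    found = tuple(version_info[:2])
    if found < MIN_PYTHON:
        problems.append(
            "Python {}.{}+ required, found {}.{}".format(*(MIN_PYTHON + found))
        )

    # find_spec returns None when a top-level package cannot be located.
    for package in REQUIRED_PACKAGES:
        if importlib.util.find_spec(package) is None:
            problems.append("Python package not installed: {}".format(package))

    # shutil.which returns None when a command is not on PATH.
    for command in REQUIRED_COMMANDS:
        if shutil.which(command) is None:
            problems.append("command not found on PATH: {}".format(command))

    return problems


for issue in find_missing_prerequisites():
    print("MISSING:", issue)
```

If the script prints nothing, all of the checks passed on the current machine.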

Documentation

Get started with our full documentation to explore the ways CloudConductor can streamline the development and execution of complex, multi-sample workflows typical in bioinformatics.

Project Status

CloudConductor is actively under development. To get involved or request features, please contact Razvan Panea.
