
otiai10 / hotsub

License: GPL-3.0
Command-line tool to run batch jobs concurrently, with an ETL framework, on AWS and other cloud computing resources

Programming Languages

  • go
  • shell
  • Dockerfile
  • TeX
  • wdl

Projects that are alternatives to or similar to hotsub

tibanna
Tibanna helps you run your genomic pipelines on Amazon cloud (AWS). It is used by the 4DN DCIC (4D Nucleome Data Coordination and Integration Center) to process data. Tibanna supports CWL/WDL (w/ docker), Snakemake (w/ conda) and custom Docker/shell command.
Stars: ✭ 61 (+110.34%)
Mutual labels:  bioinformatics, cwl, wdl-workflow, cwl-workflow
workflows
Bioinformatics workflows developed for and used on the St. Jude Cloud project.
Stars: ✭ 16 (-44.83%)
Mutual labels:  workflow-engine, cwl, wdl-workflow, cwl-workflow
Arvados
An open source platform for managing and analyzing biomedical big data
Stars: ✭ 274 (+844.83%)
Mutual labels:  bioinformatics, workflow-engine, gcp
Galaxy
Data intensive science for everyone.
Stars: ✭ 812 (+2700%)
Mutual labels:  bioinformatics, workflow-engine
Scipipe
Robust, flexible and resource-efficient pipelines using Go and the commandline
Stars: ✭ 826 (+2748.28%)
Mutual labels:  bioinformatics, workflow-engine
Globalbioticinteractions
Global Biotic Interactions provides access to existing species interaction datasets
Stars: ✭ 71 (+144.83%)
Mutual labels:  bioinformatics, etl-framework
Gcp For Bioinformatics
GCP Essentials for Bioinformatics Researchers
Stars: ✭ 95 (+227.59%)
Mutual labels:  bioinformatics, gcp
wdl2cwl
[Experimental] Workflow Definition Language (WDL) to CWL
Stars: ✭ 26 (-10.34%)
Mutual labels:  cwl, wdl-workflow
Cuneiform
Cuneiform distributed programming language
Stars: ✭ 175 (+503.45%)
Mutual labels:  bioinformatics, workflow-engine
Nextflow
A DSL for data-driven computational pipelines
Stars: ✭ 1,337 (+4510.34%)
Mutual labels:  bioinformatics, workflow-engine
etlflow
EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for writing various different tasks, jobs on GCP and AWS.
Stars: ✭ 38 (+31.03%)
Mutual labels:  gcp, etl-framework
TeamTeri
Genomics using open source tools, running on GCP or AWS
Stars: ✭ 30 (+3.45%)
Mutual labels:  bioinformatics, gcp
awesome-phages
A curated list of phage related software and computational resources for phage scientists, bioinformaticians and enthusiasts.
Stars: ✭ 14 (-51.72%)
Mutual labels:  bioinformatics
gcf-packs
Library packs for google cloud functions
Stars: ✭ 48 (+65.52%)
Mutual labels:  gcp
dysgu
dysgu-SV is a collection of tools for calling structural variants using short or long reads
Stars: ✭ 47 (+62.07%)
Mutual labels:  bioinformatics
protwis
Protwis is the backbone of the GPCRdb. The GPCRdb contains reference data, interactive visualisation and experiment design tools for G protein-coupled receptors (GPCRs).
Stars: ✭ 20 (-31.03%)
Mutual labels:  bioinformatics
dtm
A distributed transaction framework that supports multiple languages, supports saga, tcc, xa, 2-phase message, outbox patterns.
Stars: ✭ 6,110 (+20968.97%)
Mutual labels:  workflow-engine
paccmann datasets
pytoda - PaccMann PyTorch Dataset Classes. Read the docs: https://paccmann.github.io/paccmann_datasets/
Stars: ✭ 15 (-48.28%)
Mutual labels:  bioinformatics
ETL-Starter-Kit
📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL related work.
Stars: ✭ 21 (-27.59%)
Mutual labels:  etl-framework
BridgeDb
The BridgeDb Library source code
Stars: ✭ 22 (-24.14%)
Mutual labels:  bioinformatics

hotsub

A simple batch job driver for AWS and GCP. (Azure and OpenStack support is coming soon.)

hotsub run \
  --script ./star-alignment.sh \
  --tasks ./star-alignment-tasks.csv \
  --image friend1ws/star-alignment \
  --aws-ec2-instance-type t2.2xlarge \
  --verbose

It will

  • execute the workflow described in star-alignment.sh
  • for each sample specified in star-alignment-tasks.csv
  • inside friend1ws/star-alignment Docker containers
  • on EC2 instances of type t2.2xlarge

and automatically upload the output files to S3 and clean up the EC2 instances when everything is done. A sketch of what such a tasks CSV might look like is shown below.
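This is a hypothetical sketch only: the tasks CSV is expected to define --env, --input, --input-recursive and --output-recursive columns (see the --tasks option below), but the column names, sample names, and bucket paths here are made up for illustration.

# Hypothetical star-alignment-tasks.csv (column names and S3 paths are illustrative only)
cat > star-alignment-tasks.csv <<'EOF'
--env SAMPLE_NAME,--input FASTQ_1,--input FASTQ_2,--output-recursive OUTPUT_DIR
sampleA,s3://my-bucket/fastq/sampleA_1.fastq,s3://my-bucket/fastq/sampleA_2.fastq,s3://my-bucket/results/sampleA
sampleB,s3://my-bucket/fastq/sampleB_1.fastq,s3://my-bucket/fastq/sampleB_2.fastq,s3://my-bucket/results/sampleB
EOF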

See Documentation for more details.

Why use hotsub

There are three reasons why hotsub was made and why you might want to use it:

  1. No need to set up your cloud on web consoles:
    • Since hotsub uses plain EC2 or GCE instances, you don't have to configure AWS Batch or Dataflow on messy web consoles.
  2. Multiple platforms with the same command-line interface:
    • You can switch between AWS and GCP simply with the --provider option of the run command (of course, you need to have credentials on your local machine); see the example below.
  3. ExTL framework available:
    • In some bioinformatics cases, the problem is how to handle a common and huge reference genome. hotsub proposes and implements the ExTL framework for this; see the example below.
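The following is a minimal sketch of points 2 and 3, using only the --provider, --google-project and --shared options documented under the run command below; the GCP project ID and bucket URL are hypothetical placeholders.

# Switch the same job from AWS to GCP with a single flag, and share a large
# reference genome between jobs via --shared (the ExTL framework).
# "my-project" and the gs:// URL below are hypothetical.
hotsub run \
  --provider gcp \
  --google-project my-project \
  --script ./star-alignment.sh \
  --tasks ./star-alignment-tasks.csv \
  --image friend1ws/star-alignment \
  --shared gs://my-bucket/reference \
  --verbose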

Installation

Check Getting Started on GitHub Pages

Commands

NAME:
   hotsub - command line to run batch computing on AWS and GCP with the same interface

USAGE:
   hotsub [global options] command [command options] [arguments...]

VERSION:
   0.10.0

DESCRIPTION:
   Open-source command-line tool to run batch computing tasks and workflows on backend services such as Amazon Web Services.

COMMANDS:
     run       Run your jobs on cloud with specified input files and any parameters
     init      Initialize CLI environment on which hotsub runs
     template  Create a template project of hotsub
     help, h   Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --help, -h     show help
   --version, -V  print the version
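A rough sketch of a typical first session with these commands (exact arguments may differ; check each command's -h, and the file names below are illustrative):

# Initialize the CLI environment that hotsub runs on (credentials, keys, etc.)
hotsub init

# Create a template project to start from
hotsub template

# Run a job using the options documented below (file names are hypothetical)
hotsub run --script ./main.sh --tasks ./tasks.csv --verbose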

Available options for run command

% hotsub run -h
NAME:
   hotsub run - Run your jobs on cloud with specified input files and any parameters

USAGE:
   hotsub run [command options] [arguments...]

DESCRIPTION:
   Run your jobs on cloud with specified input files and any parameters

OPTIONS:
   --verbose, -v                     Print verbose log for operation.
   --log-dir value                   Path to log directory where stdout/stderr log files will be placed (default: "${cwd}/logs/${time}")
   --concurrency value, -C value     Throttle concurrency number for running jobs (default: 8)
   --provider value, -p value        Job service provider, either of [aws, gcp, vbox, hyperv] (default: "aws")
   --tasks value                     Path to CSV of task parameters, expected to specify --env, --input, --input-recursive and --output-recursive. (required)
   --image value                     Image name from Docker Hub or other Docker image service. (default: "ubuntu:14.04")
   --script value                    Local path to a script to run inside the workflow Docker container. (required)
   --shared value, -S value          Shared data URL on cloud storage bucket. (e.g. s3://~)
   --keep                            Keep instances created for computing even after everything gets done
   --env value, -E value             Environment variables to pass to all the workflow containers
   --disk-size value                 Size of data disk to attach for each job in GB. (default: 64)
   --shareddata-disksize value       Disk size of shared data instance (in GB) (default: 64)
   --aws-region value                AWS region name in which AmazonEC2 instances would be launched (default: "ap-northeast-1")
   --aws-ec2-instance-type value     AWS EC2 instance type. If specified, all --min-cores and --min-ram would be ignored. (default: "t2.micro")
   --aws-shared-instance-type value  Shared Instance Type on AWS (default: "m4.4xlarge")
   --aws-vpc-id value                VPC ID on which computing VMs are launched
   --aws-subnet-id value             Subnet ID in which computing VMs are launched
   --google-project value            Project ID for GCP
   --google-zone value               GCP service zone name (default: "asia-northeast1-a")
   --cwl value                       CWL file to run your workflow
   --cwl-job value                   Parameter files for CWL
   --wdl value                       WDL file to run your workflow
   --wdl-job value                   Parameter files for WDL
   --include value                   Local files to be included onto workflow container
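Besides plain shell scripts, the --cwl/--cwl-job and --wdl/--wdl-job options above let hotsub drive CWL or WDL workflows. A minimal hypothetical invocation (both file names are illustrative only):

# Run a CWL workflow with its parameter file (file names are hypothetical)
hotsub run \
  --cwl ./workflow.cwl \
  --cwl-job ./workflow-params.yml \
  --aws-ec2-instance-type t2.2xlarge \
  --verbose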

Contact

To keep things transparent, please ask any questions through the issue tracker:

https://github.com/otiai10/hotsub/issues
