
easy_qsub

Easily submit multiple PBS jobs, or run local jobs in parallel. Multiple input files are supported.

Submitting PBS jobs

easy_qsub submits PBS jobs from a script template, avoiding the need to repeatedly edit PBS scripts.

Default template (~/.easy_qsub/default.pbs):

#PBS -S /bin/bash
#PBS -N $name
#PBS -q $queue
#PBS -l ncpus=$ncpus
#PBS -l mem=$mem
#PBS -l walltime=$walltime
#PBS -V

cd $$PBS_O_WORKDIR
echo run on node: $$HOSTNAME >&2

$cmd
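The $name-style placeholders and the doubled $$ escapes in the template match Python's string.Template syntax. A minimal sketch of how such a template might be filled in (an illustrative assumption, not easy_qsub's actual code):

```python
from string import Template

# The default template above, as a Python string (note: $$ is
# string.Template's escape for a literal $, so $$PBS_O_WORKDIR
# survives substitution as $PBS_O_WORKDIR for the shell to expand).
template_text = """#PBS -S /bin/bash
#PBS -N $name
#PBS -q $queue
#PBS -l ncpus=$ncpus
#PBS -l mem=$mem
#PBS -l walltime=$walltime
#PBS -V

cd $$PBS_O_WORKDIR
echo run on node: $$HOSTNAME >&2

$cmd
"""

# Fill in the placeholders with job parameters and the command.
script = Template(template_text).substitute(
    name="myjob", queue="batch", ncpus=8, mem="2GB",
    walltime="30:00:00:00", cmd="ls -lh",
)
print(script)
```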

Generated PBS scripts are saved in /tmp/easy_qsub-user. If a job is submitted successfully, its PBS script is moved to the current directory; if not, it is removed.

Support for multiple inputs

Inspired by qtask, multiple inputs are supported (see example 2). If "{}" appears in a command, it is replaced with the current filename. Four formats are supported. For example, for a file named "a/b/read_1.fq.gz":

format       target                         result
{}           full path                      a/b/read_1.fq.gz
{%}          basename                       read_1.fq.gz
{^.fq.gz}    remove suffix from full path   a/b/read_1
{%^.fq.gz}   remove suffix from basename    read_1
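The replacement rules above can be sketched in a few lines of Python (an illustrative reimplementation of the described behavior, not easy_qsub's actual code):

```python
import os
import re

def fill_placeholders(cmd, path):
    """Expand {}, {%}, {^suffix} and {%^suffix} in cmd with path."""
    def repl(match):
        inner = match.group(1)  # '', '%', '^.fq.gz' or '%^.fq.gz'
        result = path
        if inner.startswith('%'):          # basename
            result = os.path.basename(result)
            inner = inner[1:]
        if inner.startswith('^'):          # clip suffix
            suffix = inner[1:]
            if result.endswith(suffix):
                result = result[:-len(suffix)]
        return result
    return re.sub(r'\{([^}]*)\}', repl, cmd)

print(fill_placeholders('echo {} {%} {^.fq.gz} {%^.fq.gz}', 'a/b/read_1.fq.gz'))
# echo a/b/read_1.fq.gz read_1.fq.gz a/b/read_1 read_1
```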

Running local jobs in parallel

It also supports running commands locally with option -lp (in parallel) or -ls (serially). This makes it easy to switch between a cluster and a local machine.
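Conceptually, -lp amounts to running the expanded commands through a local worker pool instead of handing them to qsub. A rough sketch (hypothetical, not the tool's implementation):

```python
import subprocess
from multiprocessing.pool import ThreadPool

def run_local_parallel(commands, workers=4):
    """Run shell commands locally in parallel; return their exit codes."""
    with ThreadPool(workers) as pool:
        return pool.map(lambda cmd: subprocess.call(cmd, shell=True), commands)

# Two trivial jobs running concurrently; each returns exit code 0.
exit_codes = run_local_parallel(['echo job1 > /dev/null', 'echo job2 > /dev/null'])
```

Running serially (-ls) would simply loop over the commands with subprocess.call instead of mapping them over a pool.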

Best partner: cluster_files

To make the best use of the support for multiple inputs, a script cluster_files is included to cluster files into multiple directories by creating symbolic links or moving files (see examples 3 and 4). It is useful for programs that take a directory as input.

Another useful scenario is applying different jobs to the same dataset. A poor directory structure would be:

datasets/
├── A
├── A.stage1
├── A.stage2
├── B
├── B.stage1
└── B.stage2

A more flexible structure can be organized by cluster_files. Instead of changing the original directory structure, using links is clearer and more flexible:

datasets
├── A
└── B
datasets.stage1
├── A
└── B
datasets.stage2
├── A
└── B

Examples

  1. Submit a single job

     easy_qsub 'ls -lh'
    
  2. Submit multiple jobs, running fastqc on many fq.gz files

     easy_qsub -n 8 -m 2GB 'mkdir -p QC/{%^.fq.gz}.fastqc; zcat {} | fastqc -o QC/{%^.fq.gz}.fastqc stdin' *.fq.gz
    

    Executed commands are:

     mkdir -p QC/read_1.fastqc; zcat read_1.fq.gz | fastqc -o QC/read_1.fastqc stdin
     mkdir -p QC/read_2.fastqc; zcat read_2.fq.gz | fastqc -o QC/read_2.fastqc stdin
    

    Dry run with -vv

     easy_qsub -n 8 -m 2GB 'mkdir -p QC/{%^.fq.gz}.fastqc; zcat {} | fastqc -o QC/{%^.fq.gz}.fastqc stdin' *.fq.gz -vv
    
  3. Suppose a directory rawdata contains paired files as below.

     $ tree rawdata
     rawdata
     ├── A2_1.fq.gz
     ├── A2_1.unpaired.fq.gz
     ├── A2_2.fq.gz
     ├── A2_2.unpaired.fq.gz
     ├── A3_1.fq.gz
     ├── A3_1.unpaired.fq.gz
     ├── A3_2.fq.gz
     ├── A3_2.unpaired.fq.gz
     └── README.md
    

    And I have a program script.py, which takes a directory as input and does something with the paired files. The command looks like: script.py dirA.

    Submitting jobs as in example 2 would be slow, handling A2_*.fq.gz and then A3_*.fq.gz in turn. Instead, we can split the rawdata directory into multiple directories (clustering files by prefix) and submit a job for each directory.

     cluster_files -p '(.+?)_\d\.fq\.gz$' rawdata -o rawdata.cluster
    
     tree rawdata.cluster/
     rawdata.cluster/
     ├── A2
     │   ├── A2_1.fq.gz -> ../../rawdata/A2_1.fq.gz
     │   └── A2_2.fq.gz -> ../../rawdata/A2_2.fq.gz
     └── A3
         ├── A3_1.fq.gz -> ../../rawdata/A3_1.fq.gz
         └── A3_2.fq.gz -> ../../rawdata/A3_2.fq.gz
    
     easy_qsub 'script.py {}' rawdata.cluster/*
    

    Another example (e.g. some assembler can handle unpaired reads too):

     cluster_files -p '(.+?)_\d.*\.fq\.gz$' rawdata -o rawdata.cluster2
    
     tree rawdata.cluster2
     rawdata.cluster2
     ├── A2
     │   ├── A2_1.fq.gz -> ../../rawdata/A2_1.fq.gz
     │   ├── A2_1.unpaired.fq.gz -> ../../rawdata/A2_1.unpaired.fq.gz
     │   ├── A2_2.fq.gz -> ../../rawdata/A2_2.fq.gz
     │   └── A2_2.unpaired.fq.gz -> ../../rawdata/A2_2.unpaired.fq.gz
     └── A3
         ├── A3_1.fq.gz -> ../../rawdata/A3_1.fq.gz
         ├── A3_1.unpaired.fq.gz -> ../../rawdata/A3_1.unpaired.fq.gz
         ├── A3_2.fq.gz -> ../../rawdata/A3_2.fq.gz
         └── A3_2.unpaired.fq.gz -> ../../rawdata/A3_2.unpaired.fq.gz
    
  4. Another example (complex directory structure)

     tree rawdata2
     rawdata2
     ├── OtherDir
     │   └── abc.fq.gz.txt
     ├── S1
     │   ├── A2_1.fq.gz
     │   ├── A2_1.unpaired.fq.gz
     │   ├── A2_2.fq.gz
     │   ├── A2_2.unpaired.fq.gz
     │   ├── A4_1.fq.gz
     │   └── A4_2.fq.gz
     └── S2
         ├── A3_1.fq.gz
         ├── A3_1.unpaired.fq.gz
         ├── A3_2.fq.gz
         └── A3_2.unpaired.fq.gz
    
     cluster_files -p '(.+?)_\d\.fq\.gz$' rawdata2/
    
     tree rawdata2.cluster/
     rawdata2.cluster/
     ├── A2
     │   ├── A2_1.fq.gz -> ../../rawdata2/S1/A2_1.fq.gz
     │   └── A2_2.fq.gz -> ../../rawdata2/S1/A2_2.fq.gz
     ├── A3
     │   ├── A3_1.fq.gz -> ../../rawdata2/S2/A3_1.fq.gz
     │   └── A3_2.fq.gz -> ../../rawdata2/S2/A3_2.fq.gz
     └── A4
         ├── A4_1.fq.gz -> ../../rawdata2/S1/A4_1.fq.gz
         └── A4_2.fq.gz -> ../../rawdata2/S1/A4_2.fq.gz
    
     cluster_files -p '(.+?)_\d\.fq\.gz$'  rawdata2/ -k -f  # keep original dir structure
    
     tree rawdata2.cluster/
     rawdata2.cluster/
     ├── S1
     │   ├── A2
     │   │   ├── A2_1.fq.gz -> ../../../rawdata2/S1/A2_1.fq.gz
     │   │   └── A2_2.fq.gz -> ../../../rawdata2/S1/A2_2.fq.gz
     │   └── A4
     │       ├── A4_1.fq.gz -> ../../../rawdata2/S1/A4_1.fq.gz
     │       └── A4_2.fq.gz -> ../../../rawdata2/S1/A4_2.fq.gz
     └── S2
         └── A3
             ├── A3_1.fq.gz -> ../../../rawdata2/S2/A3_1.fq.gz
             └── A3_2.fq.gz -> ../../../rawdata2/S2/A3_2.fq.gz
    

Installation

easy_qsub and cluster_files are single-file scripts written in Python using only the standard library. They are compatible with both Python 2 (2.7 or later) and Python 3.

You can simply save the scripts easy_qsub and cluster_files to a directory in the environment variable PATH, e.g. /usr/local/bin.

Or

git clone https://github.com/shenwei356/easy_qsub.git
cd easy_qsub
sudo cp easy_qsub cluster_files /usr/local/bin

Usage

easy_qsub

usage: easy_qsub [-h] [-lp | -ls] [-N NAME] [-n NCPUS] [-m MEM] [-q QUEUE]
                 [-w WALLTIME] [-t TEMPLATE] [-o OUTFILE] [-v]
                 command [files [files ...]]

Easily submitting PBS jobs with script template. Multiple input files
supported.

positional arguments:
  command               command to submit
  files                 input files

optional arguments:
  -h, --help            show this help message and exit
  -lp, --local_p        run commands locally, parallelly
  -ls, --local_s        run commands locally, serially
  -N NAME, --name NAME  job name
  -n NCPUS, --ncpus NCPUS
                        cpu number [logical cpu number]
  -m MEM, --mem MEM     memory [5gb]
  -q QUEUE, --queue QUEUE
                        queue [batch]
  -w WALLTIME, --walltime WALLTIME
                        walltime [30:00:00:00]
  -t TEMPLATE, --template TEMPLATE
                        script template
  -o OUTFILE, --outfile OUTFILE
                        output script
  -v, --verbose         verbosely print information. -vv for just printing
                        command not creating scripts and submitting jobs

Note: if "{}" appears in a command, it will be replaced with the current
filename. More format supported: "{%}" for basename, "{^suffix}" for clipping
"suffix", "{%^suffix}" for clipping suffix from basename. See more:
https://github.com/shenwei356/easy_qsub

cluster_files

usage: cluster_files [-h] [-o OUTDIR] [-p PATTERN] [-k] [-m] [-f] indir

clustering files by regular expression [V3.0]

positional arguments:
  indir                 source directory

optional arguments:
  -h, --help            show this help message and exit
  -o OUTDIR, --outdir OUTDIR
                        out directory [<indir>.cluster]
  -p PATTERN, --pattern PATTERN
                        pattern (regular expression) of files in indir. if not
                        given, it will be the longest common substring of the
                        files. GROUP (parenthese) should be in the regular
                        expression. Captured group will be the cluster name.
                        e.g. "(.+?)_\d\.fq\.gz"
  -k, --keep            keep original dir structure
  -m, --mv              moving files instead of creating symbolic links
  -f, --force           force file overwriting, i.e. deleting existed out
                        directory
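The grouping rule, i.e. the captured group of the pattern becomes the cluster (sub-directory) name, can be sketched as follows (an illustrative reimplementation of the described behavior, not cluster_files' actual code):

```python
import os
import re
from collections import defaultdict

def cluster_names(filenames, pattern):
    """Group files by the first captured group of pattern,
    matched against each file's basename; non-matching files
    (like README.md) are skipped."""
    clusters = defaultdict(list)
    regex = re.compile(pattern)
    for f in filenames:
        m = regex.search(os.path.basename(f))
        if m:
            clusters[m.group(1)].append(f)
    return dict(clusters)

files = ['rawdata/A2_1.fq.gz', 'rawdata/A2_2.fq.gz',
         'rawdata/A3_1.fq.gz', 'rawdata/README.md']
result = cluster_names(files, r'(.+?)_\d\.fq\.gz$')
print(result)
```

The real tool would then create one sub-directory per key and symlink (or, with -m, move) each group's files into it.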

Copyright

Copyright (c) 2015-2017, Wei Shen ([email protected])

MIT License
