gramtools
TL;DR Genome inference using prior information encoded as a reference graph.
Gramtools builds a population reference genome (PRG) from a set of variants. Given sequence data from an individual, the graph is annotated with coverage and genotyped, producing a VCF and a jVCF of all the variation in the graph.
A personalised reference genome for the sample is also inferred and new variation can be discovered against it (see usage). You can then build a new PRG from the initial and the new variants, and genotype this augmented PRG.
Install
Container
The easiest way to run gramtools is via a container (hosted on quay.io).
To run with Docker:
tag="latest" # or, a specific released version
docker run "quay.io/iqballab/gramtools:${tag}"
To run with Singularity:
tag="latest" # or, a specific released version
URI="docker://quay.io/iqballab/gramtools:${tag}"
singularity exec "$URI" gramtools
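To avoid retyping the Singularity prefix for every subcommand, the call above can be wrapped in a shell function. This is a minimal sketch: `gramtools_sif` is a hypothetical name, not something gramtools provides, and the file names in the commented example are placeholders. Whether your input files are visible inside the container depends on how Singularity bind paths are configured on your system.

```shell
tag="latest" # or, a specific released version
URI="docker://quay.io/iqballab/gramtools:${tag}"

# Hypothetical wrapper: forwards all of its arguments to the gramtools
# executable inside the container.
gramtools_sif() {
    singularity exec "$URI" gramtools "$@"
}

# Example call (requires Singularity; file names are placeholders):
# gramtools_sif build -o gram_dir --ref ref.fa --vcf variants.vcf
```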
Local
Latest release
VERSION="1.7.0"
wget -O - "https://github.com/iqbal-lab-org/gramtools/releases/download/v${VERSION}/gramtools-${VERSION}.tar.gz" | tar xfz -
pip install "./gramtools-${VERSION}"
The latest release includes a precompiled binary for Linux. If that binary does not run on your machine, gramtools is compiled from source during installation instead.
We recommend installing inside a virtual environment:
python -m venv gram_ve && source gram_ve/bin/activate
pip install pip==20.0.2
pip install "./gramtools-${VERSION}"
Latest source
pip install git+https://github.com/iqbal-lab-org/gramtools
This will always compile the binary.
Requirements
- Python >= 3.6
- pip >= 20.0.2
If the binary needs to be compiled:
- C++17 compatible compiler: g++ >=8 (tested), clang >=7 (untested)
For gramtools discover to function, you additionally need at runtime:
- R
- Perl
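A quick way to see which of these requirements are already met is to look for each tool on your PATH. The sketch below assumes the executables are named `python3`, `pip`, `g++`, `R` and `perl`, which may differ on your system; `have_tool` is a hypothetical helper, not part of gramtools.

```shell
# Hypothetical helper: succeeds if the named executable is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# python3 and pip are always needed; g++ only if the binary is compiled;
# R and perl only for `gramtools discover`.
for tool in python3 pip g++ R perl; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done

# Check the Python version requirement (>= 3.6) when python3 is present.
if have_tool python3; then
    python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 6) else 1)' \
        || echo "python3 is older than 3.6"
fi
```

This only checks presence, not the compiler or pip versions; those can be read from `g++ --version` and `pip --version`.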
Usage
Gramtools
Usage:
gramtools [-h] [--debug] [--force] subcommand
Subcommands:
gramtools build -o GRAM_DIR --ref REFERENCE
                (--vcf VCF [VCF ...] | --prg PRG)
                [--kmer_size KMER_SIZE]
gramtools genotype -i GRAM_DIR -o GENO_DIR
                   --reads READS [READS ...] --sample_id SAMPLE_ID
                   [--ploidy {haploid,diploid}]
                   [--max_threads MAX_THREADS] [--seed SEED]
gramtools discover -i GENO_DIR -o DISCO_DIR
                   [--reads READS [READS ...]]
gramtools simulate --prg PRG
                   [--max_num_paths MAX_NUM_PATHS]
                   [--sample_id SAMPLE_ID] [--output_dir OUTPUT_DIR]
Subcommands explained
- build - given a VCF and reference or a prg file, construct the graph.
  - --kmer_size: used for indexing the graph in preparation for genotype. Higher k <=> faster genotype, but build output will consume more disk space.
- genotype - map reads to a graph generated in build and genotype the graph. Produces genotype calls (VCF) and a personalised reference genome (fasta).
  - --reads: 1+ reads files in (fasta/fastq/sam/bam/cram) format
  - --sample_id: displayed in VCF & personalised reference outputs
- discover - discovers new variation against the personalised reference genome from genotype using one or more variant callers (currently: cortex).
- simulate - samples paths through a prg, producing a fasta of the paths and a genotyped JSON of the variant bubbles the path went through.
  - --prg: a prg file as output by build
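The build, genotype and discover subcommands chain together, each reading the output directory of the previous one. The following is a minimal sketch of one round that uses only flags listed in the usage text. The file names, directory names and sample ID are placeholders, and `run` is a hypothetical helper that prints each command and executes it only when `DRY_RUN=0` is set.

```shell
# Placeholder inputs: replace with your own files and sample name.
ref="ref.fa"
vcf="variants.vcf"
reads="reads.fastq.gz"
sample="sample1"

# Hypothetical helper: print the command, and run it only if DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# 1. Build the graph from the reference and the initial variants.
# 2. Map reads to the graph and genotype it.
# 3. Discover new variation against the personalised reference.
# The && chain stops at the first step that fails.
run gramtools build -o gram_dir --ref "$ref" --vcf "$vcf" &&
run gramtools genotype -i gram_dir -o geno_dir \
    --reads "$reads" --sample_id "$sample" --ploidy haploid &&
run gramtools discover -i geno_dir -o disco_dir
```

By default this only prints the three commands, which makes it easy to check the arguments first; run it with `DRY_RUN=0` to execute them. For a second round, build accepts several VCFs after --vcf, so the initial VCF and the one produced by discover can be passed together to a new build, followed by genotype on the resulting directory.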
Documentation
Examples, documentation, and planned future enhancements can be found in the wiki.
For the C++ source code, doxygen formatted documentation can be generated by running doxygen doc/Doxyfile.in from inside the gramtools directory.
The documentation gets generated in doc/html/index.html and provides a useful reference for all files, classes, functions and data structures in gramtools.
Contributing
Please refer to the developers wiki page.
License
MIT