
iqbal-lab-org / Gramtools

Genome inference from a population reference graph

Programming Languages

python

Projects that are alternatives of or similar to Gramtools

Uta
Universal Transcript Archive: comprehensive genome-transcript alignments; multiple transcript sources, versions, and alignment methods; available as a docker image
Stars: ✭ 38 (-41.54%)
Mutual labels:  bioinformatics
Dram
Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
Stars: ✭ 47 (-27.69%)
Mutual labels:  bioinformatics
Pairix
1D/2D indexing and querying on bgzipped text file with a pair of genomic coordinates
Stars: ✭ 57 (-12.31%)
Mutual labels:  bioinformatics
Gatk
Official code repository for GATK versions 4 and up
Stars: ✭ 1,002 (+1441.54%)
Mutual labels:  bioinformatics
Liger
Lightweight Iterative Gene set Enrichment in R
Stars: ✭ 44 (-32.31%)
Mutual labels:  bioinformatics
Emperor
Emperor a tool for the analysis and visualization of large microbial ecology datasets
Stars: ✭ 51 (-21.54%)
Mutual labels:  bioinformatics
Etrf
Exact Tandem Repeat Finder (not a TRF replacement)
Stars: ✭ 35 (-46.15%)
Mutual labels:  bioinformatics
Lambda
LAMBDA – the Local Aligner for Massive Biological DatA
Stars: ✭ 59 (-9.23%)
Mutual labels:  bioinformatics
Sns
Analysis pipelines for sequencing data
Stars: ✭ 43 (-33.85%)
Mutual labels:  bioinformatics
Cwl Svg
A library for generating an interactive SVG visualization of CWL workflows
Stars: ✭ 57 (-12.31%)
Mutual labels:  bioinformatics
Singlecellhaystack
Finding surprising needles (=genes) in haystacks (=single cell transcriptome data).
Stars: ✭ 41 (-36.92%)
Mutual labels:  bioinformatics
Verifybamid
VerifyBamID2: A robust tool for DNA contamination estimation from sequence reads using ancestry-agnostic method.
Stars: ✭ 44 (-32.31%)
Mutual labels:  bioinformatics
Sv2
Support Vector Structural Variation Genotyper
Stars: ✭ 52 (-20%)
Mutual labels:  bioinformatics
Migmap
HTS-compatible wrapper for IgBlast V-(D)-J mapping tool
Stars: ✭ 38 (-41.54%)
Mutual labels:  bioinformatics
Dna Nn
Model and predict short DNA sequence features with neural networks
Stars: ✭ 59 (-9.23%)
Mutual labels:  bioinformatics
Locuszoom Standalone
Create regional association plots from GWAS or meta-analysis
Stars: ✭ 35 (-46.15%)
Mutual labels:  bioinformatics
Yacrd
Yet Another Chimeric Read Detector
Stars: ✭ 49 (-24.62%)
Mutual labels:  bioinformatics
Terpene Profile Parser For Cannabis Strains
Parser and database to index the terpene profile of different strains of Cannabis from online databases
Stars: ✭ 63 (-3.08%)
Mutual labels:  bioinformatics
Qiime16stutorial
A tutorial on methods of 16S analysis with QIIME 1
Stars: ✭ 59 (-9.23%)
Mutual labels:  bioinformatics
Python biologist
Python Programming for Biologists
Stars: ✭ 55 (-15.38%)
Mutual labels:  bioinformatics


gramtools

TL;DR Genome inference using prior information encoded as a reference graph.

Gramtools builds a population reference genome (PRG) from a set of variants. Given sequence data from an individual, gramtools annotates the graph with read coverage and genotypes it, producing a VCF and a jVCF of all the variation in the graph.
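To make the idea of "a graph built from a reference plus variants" concrete, here is a minimal sketch. It is not gramtools' internal PRG encoding: the segment representation and the names `build_toy_prg` and `count_paths` are hypothetical, and it assumes variants are non-overlapping. Invariant stretches stay as plain strings and each variant site becomes a list of alleles (reference allele first), so every combination of allele choices is one path through the graph.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical toy representation: a segment is either an invariant
# stretch of sequence (str) or a variant site given as a list of alleles.
Segment = Union[str, List[str]]
Variant = Tuple[int, str, Sequence[str]]  # (0-based position, ref allele, alt alleles)


def build_toy_prg(ref: str, variants: Sequence[Variant]) -> List[Segment]:
    """Split a linear reference into invariant segments and variant sites.

    Variants are sorted by position here and must not overlap. At each
    site the reference allele is listed first, followed by the alternatives.
    """
    segments: List[Segment] = []
    cursor = 0
    for pos, ref_allele, alts in sorted(variants, key=lambda v: v[0]):
        if pos < cursor:
            raise ValueError("overlapping variant at position {}".format(pos))
        if ref[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError("reference allele mismatch at position {}".format(pos))
        if pos > cursor:
            segments.append(ref[cursor:pos])  # invariant sequence before the site
        segments.append([ref_allele] + list(alts))  # the variant site ("bubble")
        cursor = pos + len(ref_allele)
    if cursor < len(ref):
        segments.append(ref[cursor:])  # trailing invariant sequence
    return segments


def count_paths(prg: Sequence[Segment]) -> int:
    """Number of paths through the graph: the product of allele counts per site."""
    n = 1
    for seg in prg:
        if isinstance(seg, list):
            n *= len(seg)
    return n


prg = build_toy_prg("ACGTACGT", [(1, "C", ["T"]), (5, "CG", ["C", "CGG"])])
print(prg)               # ['A', ['C', 'T'], 'GTA', ['CG', 'C', 'CGG'], 'T']
print(count_paths(prg))  # 6
```

The number of paths grows multiplicatively with the number of sites, which is why a graph can encode far more haplotypes than the VCF has records.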

A personalised reference genome for the sample is also inferred, and new variation can be discovered against it (see Usage). You can then build a new PRG from the initial and the newly discovered variants, and genotype this augmented PRG.
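A personalised reference can be pictured as one path through the graph: at every site, take the allele that was called for the sample. The sketch below assumes the toy segment representation (strings for invariant sequence, lists of alleles for sites) and swaps in a deliberately naive haploid rule, "pick the allele with the most read coverage", in place of whatever model gramtools actually uses. `call_haploid` and `personalised_reference` are hypothetical names.

```python
from typing import List, Sequence, Tuple, Union

Segment = Union[str, List[str]]  # invariant sequence or list of alleles (toy model)


def call_haploid(site_coverage: Sequence[int]) -> int:
    """Naive stand-in genotyper: pick the allele with the most read coverage.

    max() returns the first maximal index, so ties go to the lower index
    (the reference allele, if it is listed first).
    """
    return max(range(len(site_coverage)), key=lambda i: site_coverage[i])


def personalised_reference(prg: Sequence[Segment],
                           coverages: Sequence[Sequence[int]]) -> Tuple[str, List[int]]:
    """Walk the graph, taking the called allele at every site.

    coverages[k] holds per-allele coverage for the k-th site, in allele order.
    Returns the personalised sequence and the list of called allele indices.
    """
    n_sites = sum(1 for seg in prg if isinstance(seg, list))
    if n_sites != len(coverages):
        raise ValueError("need one coverage list per site")
    parts: List[str] = []
    calls: List[int] = []
    site_cov = iter(coverages)
    for seg in prg:
        if isinstance(seg, list):
            cov = next(site_cov)
            if len(cov) != len(seg):
                raise ValueError("need one coverage value per allele")
            call = call_haploid(cov)
            calls.append(call)
            parts.append(seg[call])
        else:
            parts.append(seg)
    return "".join(parts), calls


prg = ["A", ["C", "T"], "GTA", ["CG", "C", "CGG"], "T"]
seq, calls = personalised_reference(prg, [[2, 9], [0, 7, 1]])
print(calls)  # [1, 1]
print(seq)    # ATGTACT
```

Because the personalised sequence is closer to the sample than the original linear reference, new variants called against it (the discover step) are easier to detect.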


Install

Container

The easiest way to run gramtools is via a container (hosted on quay.io).

To run with Docker:

tag="latest" # or, a specific released version
docker run "quay.io/iqballab/gramtools:${tag}"

To run with Singularity:

tag="latest" # or, a specific released version
URI="docker://quay.io/iqballab/gramtools:${tag}"
singularity exec "$URI" gramtools

Local

Latest release

VERSION="1.7.0"
wget -O - "https://github.com/iqbal-lab-org/gramtools/releases/download/v${VERSION}/gramtools-${VERSION}.tar.gz" | tar xfz -
pip install "./gramtools-${VERSION}"

The latest release includes a precompiled binary for Linux. It is used if it runs on your machine; otherwise, the binary is compiled during installation.

We recommend installing inside a virtual environment:

python -m venv gram_ve && source gram_ve/bin/activate
pip install pip==20.0.2
pip install "./gramtools-${VERSION}"

Latest source

pip install git+https://github.com/iqbal-lab-org/gramtools

This will always compile the binary.

Requirements

  • Python >= 3.6
  • pip >= 20.0.2

If the binary needs to be compiled:

  • C++17-compatible compiler: g++ >= 8 (tested), clang >= 7 (untested)

For gramtools discover to function, you additionally need at runtime:

  • R
  • Perl
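The requirements above can be checked before installing with a short script. This is a minimal sketch and not part of gramtools: `parse_version`, `at_least` and `check_requirements` are hypothetical helpers, and it assumes R and Perl are found on `PATH` under the executable names `R` and `perl` (an assumption; your installation may differ). It avoids f-strings so that it still parses on Python versions older than 3.6 long enough to report the problem.

```python
import shutil
import sys
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """'20.0.2' -> (20, 0, 2). Non-numeric suffixes such as 'b1' are dropped."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first component with no leading digits
        parts.append(int(digits))
    return tuple(parts)


def at_least(found: str, minimum: str) -> bool:
    """Numeric version comparison, padding with zeros so '21' == '21.0.0'."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_requirements(need_discover: bool = False) -> List[str]:
    """Return a list of unmet requirements (empty if everything looks fine)."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 required")
    try:
        import pip
        if not at_least(pip.__version__, "20.0.2"):
            problems.append("pip >= 20.0.2 required (found {})".format(pip.__version__))
    except ImportError:
        problems.append("pip not found")
    if need_discover:
        # Assumed executable names; only needed at runtime by gramtools discover.
        for tool in ("R", "perl"):
            if shutil.which(tool) is None:
                problems.append("{} not found on PATH".format(tool))
    return problems


if __name__ == "__main__":
    for problem in check_requirements(need_discover=True):
        print("missing:", problem)
```

Comparing versions as integer tuples matters here: a plain string comparison would wrongly rank "3.10" below "3.6".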

Usage

Gramtools

Usage: 
    gramtools [-h] [--debug] [--force] subcommand
    
    Subcommands:
        gramtools build -o GRAM_DIR --ref REFERENCE
                       (--vcf VCF [VCF ...] | --prg PRG)
                       [--kmer_size KMER_SIZE]

        gramtools genotype -i GRAM_DIR -o GENO_DIR
                          --reads READS [READS ...] --sample_id SAMPLE_ID
                          [--ploidy {haploid,diploid}]
                          [--max_threads MAX_THREADS] [--seed SEED]

        gramtools discover -i GENO_DIR -o DISCO_DIR
                          [--reads READS [READS ...]]

        gramtools simulate --prg PRG
                           [--max_num_paths MAX_NUM_PATHS]
                           [--sample_id SAMPLE_ID] [--output_dir OUTPUT_DIR]

Subcommands explained

  1. build - given a reference (--ref) plus either one or more VCFs (--vcf) or a prg file (--prg), construct the graph.

    • --kmer_size: k-mer size used to index the graph in preparation for genotype. A higher k makes genotype faster, but the build output consumes more disk space.
  2. genotype - map reads to a graph generated in build and genotype the graph. Produces genotype calls (VCF) and a personalised reference genome (fasta).

    • --reads: one or more read files in fasta, fastq, sam, bam or cram format
    • --sample_id: displayed in VCF & personalised reference outputs
  3. discover - discovers new variation against the personalised reference genome produced by genotype, using one or more variant callers (currently: cortex).

  4. simulate - samples paths through a prg, producing a fasta of the paths and a genotyped JSON of the variant bubbles each path went through.

    • --prg: a prg file as output by build
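What simulate does can be sketched on the toy segment representation: pick one allele per site to form a path, emit the path sequence as fasta, and record which allele was taken at each bubble. This is an illustration, not gramtools' implementation. The function names are hypothetical, the uniform allele choice, the seed parameter and the dropping of duplicate paths (so at most `max_num_paths` are returned) are assumptions, and the JSON layout is invented rather than gramtools' actual output format.

```python
import json
import random
from typing import List, Sequence, Tuple, Union

Segment = Union[str, List[str]]  # invariant sequence or list of alleles (toy model)


def sample_paths(prg: Sequence[Segment], max_num_paths: int,
                 seed: int = 0) -> List[Tuple[str, List[int]]]:
    """Draw up to max_num_paths random paths; duplicate paths are dropped."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    seen = set()
    paths = []
    for _ in range(max_num_paths):
        parts, genotype = [], []
        for seg in prg:
            if isinstance(seg, list):
                choice = rng.randrange(len(seg))  # uniform choice of allele
                genotype.append(choice)
                parts.append(seg[choice])
            else:
                parts.append(seg)
        if tuple(genotype) not in seen:
            seen.add(tuple(genotype))
            paths.append(("".join(parts), genotype))
    return paths


def to_fasta(paths: Sequence[Tuple[str, List[int]]], sample_id: str) -> str:
    lines = []
    for i, (seq, _) in enumerate(paths, start=1):
        lines.append(">{}_path{}".format(sample_id, i))
        lines.append(seq)
    return "\n".join(lines) + "\n"


def to_genotype_json(paths: Sequence[Tuple[str, List[int]]], sample_id: str) -> str:
    # Invented layout: one list of chosen allele indices per sampled path.
    return json.dumps({"sample_id": sample_id,
                       "paths": [genotype for _, genotype in paths]})


prg = ["A", ["C", "T"], "GTA", ["CG", "C", "CGG"], "T"]
paths = sample_paths(prg, max_num_paths=3, seed=42)
print(to_fasta(paths, "sim"), end="")
print(to_genotype_json(paths, "sim"))
```

Keeping the per-site allele indices alongside each sequence gives a known truth set, which is what makes simulated paths useful for testing genotyping accuracy.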

Documentation

Examples, documentation, and planned future enhancements can be found in the wiki.

For the C++ source code, Doxygen-formatted documentation can be generated by running doxygen doc/Doxyfile.in from inside the gramtools directory.

The documentation is generated in doc/html/index.html and provides a reference for all files, classes, functions and data structures in gramtools.

Contributing

Please refer to the developers wiki page.

License

MIT
