Illumina / gvcfgenotyper

Licence: Apache-2.0 License
A utility for merging and genotyping Illumina-style GVCFs.

gvcfgenotyper

A utility for merging and genotyping strelka2 GVCFs.

This source code is provided under the Apache License 2.0. Copyright (c) 2018, Illumina, Inc. All rights reserved.

This tool provides basic genome VCF (GVCF) merging and genotyping functionality, producing a multisample BCF/VCF suitable for cohort analysis. Variants are normalised and decomposed on the fly before merging. For samples that do not carry a particular variant, the homozygous reference confidence is estimated from the GVCF depth blocks using some simple heuristics.
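To make the depth-block idea concrete, here is a minimal Python sketch of how a (DP, GQ) pair could be looked up for a sample that lacks a variant. It is not the tool's actual heuristic: the `DepthBlock` type, the `homref_confidence` function and the rule of taking the minimum DP and GQ over every block spanning the reference allele are assumptions made purely for illustration. Blocks are assumed to be sorted and non-overlapping.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple


class DepthBlock(NamedTuple):
    """One homozygous-reference block from a GVCF (hypothetical container)."""
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive (the END value of the block)
    dp: int
    gq: int


def homref_confidence(
    blocks: List[DepthBlock], pos: int, ref_len: int = 1
) -> Optional[Tuple[int, int]]:
    """Return (min DP, min GQ) over [pos, pos + ref_len - 1].

    Returns None if any base of the span is not covered by a block.
    Assumed rule for illustration only, not gvcfgenotyper's own logic.
    """
    last = pos + ref_len - 1
    starts = [b.start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    dp = gq = None
    cursor = pos  # next base that still needs to be covered
    while i < len(blocks) and cursor <= last:
        block = blocks[i]
        if block.start > cursor or block.end < cursor:
            return None  # gap between blocks: no confidence available
        dp = block.dp if dp is None else min(dp, block.dp)
        gq = block.gq if gq is None else min(gq, block.gq)
        cursor = block.end + 1
        i += 1
    if cursor <= last:
        return None  # ran out of blocks before the span ended
    return dp, gq


blocks = [
    DepthBlock(1, 100, 30, 90),
    DepthBlock(101, 150, 12, 33),
    DepthBlock(200, 250, 25, 70),
]
print(homref_confidence(blocks, 50))     # SNP inside one block: (30, 90)
print(homref_confidence(blocks, 99, 4))  # deletion spanning two blocks: (12, 33)
print(homref_confidence(blocks, 160))    # uncovered position: None
```

Taking the minimum across blocks is a deliberately conservative choice for alleles longer than one base, since an indel can straddle blocks with very different depth.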

Caution:

This software is in early development; it is largely functional but may contain bugs.

There are various flavours of GVCF in the wild; this tool only works with the format produced by Illumina pipelines.

Installation

The only requirement is a C++11-compatible compiler.

git clone https://github.com/Illumina/gvcfgenotyper.git
cd gvcfgenotyper/
make
bin/gvcfgenotyper

Running

find directory/ -name '*genome.vcf.gz' > gvcfs.txt
time ./gvcfgenotyper -f genome.fa -l gvcfs.txt -Ob -o output.bcf

or with some trivial parallelism:

for i in {1..22} X;
do
    echo -r $i -f genome.fa -l gvcfs.txt -Ob -o output.chr${i}.bcf;
done | xargs -L 1 -P 23 ./gvcfgenotyper

If you are looking for a sequencing cohort to try this out, have a look at Polaris.

Known issues

Homozygous reference confidence (GQ and DP) works well for SNPs but is less reliable for indels. Our homozygous reference likelihoods are currently just dummy values, e.g. PL=0,255,255, and should not be used for any sophisticated analysis such as de novo mutation calling (Strelka has good joint-calling-from-BAM functionality for small pedigrees).
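For context, PL values are Phred-scaled genotype likelihoods, so a PL of p corresponds to a relative likelihood of 10^(-p/10). The Python sketch below shows why a placeholder such as PL=0,255,255 is unsuitable for likelihood-based methods: it asserts near-certainty in the homozygous reference genotype however much evidence the sample actually has. The function name `pl_to_relative_probs` is hypothetical, and a flat prior over genotypes is assumed.

```python
from typing import List, Sequence


def pl_to_relative_probs(pl: Sequence[int]) -> List[float]:
    """Convert Phred-scaled likelihoods to normalised probabilities.

    Assumes a flat prior over genotypes (illustration only).
    """
    likelihoods = [10 ** (-p / 10) for p in pl]
    total = sum(likelihoods)
    return [x / total for x in likelihoods]


# The dummy value gives essentially all of the weight to 0/0.
dummy = pl_to_relative_probs([0, 255, 255])
# A weakly supported call would leave visible weight on 0/1.
weak = pl_to_relative_probs([0, 10, 100])
print(dummy)
print(weak)
```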

Complex variants can occasionally contain primitive alleles called in other samples. We are investigating decomposition approaches for this problem.
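As an illustration of what a primitive allele is, the Python sketch below splits an equal-length complex allele (an MNP) into the SNPs it contains. This is an example under stated assumptions, not the decomposition approach used or planned by the tool: `decompose_mnp` is a hypothetical helper, and alleles of unequal length are deliberately rejected rather than aligned.

```python
from typing import List, Tuple


def decompose_mnp(pos: int, ref: str, alt: str) -> List[Tuple[int, str, str]]:
    """Split an equal-length REF/ALT pair into (pos, ref_base, alt_base) SNPs.

    Positions where REF and ALT agree produce no record.
    """
    if len(ref) != len(alt):
        raise ValueError("this sketch only handles equal-length alleles")
    return [
        (pos + offset, r, a)
        for offset, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]


# ACG -> TCA at position 100 contains two SNPs, at 100 and 102.
snps = decompose_mnp(100, "ACG", "TCA")
print(snps)  # [(100, 'A', 'T'), (102, 'G', 'A')]
```

If another sample carries only the SNP at position 100, the undecomposed MNP and that SNP would be reported as different alleles even though they share a primitive, which is the situation described above.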

We are working on multi-threading to improve performance.

Feedback

Please open an issue on GitHub to provide feedback or ask questions.

Acknowledgements

This tool depends on htslib, googletest and spdlog. We also borrowed some variant normalisation code from BCFtools.