sigven / Gvanno

Generic germline variant annotation pipeline

gvanno - workflow for functional and clinical annotation of germline nucleotide variants

Overview

The germline variant annotator (gvanno) is a simple software package for the analysis and interpretation of human DNA variants of germline origin. Variants and genes are annotated with disease-related and functional associations from a wide range of sources (see below). Technically, the workflow is built with Docker, and it can also be installed through the Singularity framework.

gvanno accepts query files encoded in the VCF format, and can analyze both SNVs and short InDels. The workflow relies heavily upon Ensembl’s Variant Effect Predictor (VEP) and vcfanno. It produces an annotated VCF file and a file of tab-separated values (.tsv), the latter listing all annotations per variant record. Note that if your input VCF contains data (genotypes) from multiple samples (i.e. a multisample VCF), the output TSV file will contain one line/record per sample variant.

News

  • December 7th 2020 - 1.4.1 release
    • Data updates (ClinVar, UniProt, GWAS Catalog, Open Targets Platform)
    • Software update (VEP 102)
    • Dropped DisGeNET annotations (Open Targets Platform serves a similar purpose)
  • September 29th 2020 - 1.4.0 release
    • Data updates (ClinVar, UniProt, GWAS Catalog, Open Targets Platform)
    • Software updates (VEP 101)
    • Configuration through a TOML file has been removed; all configuration options are now passed as optional arguments to the main Python script (gvanno.py)
  • June 30th 2020 - 1.3.2 release
    • Data updates (ClinVar, UniProt, GWAS Catalog, Open Targets Platform, Pfam, dbNSFP)
      • Using GENCODE v34 as the correct transcript assembly for grch38 (see issue)
      • Three new variant effect predictions from dbNSFP added: ClinPred, LIST-S2, and BayesDel
    • Added VEP plugin NearestExonJB
      • Annotates relative position (to the exon-intron junction) of variants in introns and exons (fields in output: INTRON_POSITION, EXON_POSITION)
  • May 8th 2020 - 1.3.0 release
    • Upgrade of VEP (v100) - GENCODE release 33 (grch38)
    • Data updates (ClinVar, UniProt, GWAS Catalog, Open Targets Platform)
  • November 22nd 2019 - 1.1.0 release
    • Ability to install and run the workflow using Singularity, an excellent contribution by @oskarvid (see step 1.1 in Getting started)
    • Data and software updates (ClinVar, UniProt, VEP)

Annotation resources

  • VEP - Variant Effect Predictor v102 (GENCODE v36/v19 as the gene reference dataset)
  • dbNSFP - Database of non-synonymous functional predictions (v4.1, June 2020)
  • gnomAD - Germline variant frequencies exome-wide (release 2.1, October 2018) - from VEP
  • dbSNP - Database of short genetic variants (build 153) - from VEP
  • 1000 Genomes Project - phase3 - Germline variant frequencies genome-wide (May 2013) - from VEP
  • ClinVar - Database of clinically related variants (December 2020)
  • Open Targets Platform - Target-disease and target-drug associations (2020_11, November 2020)
  • UniProt/SwissProt KnowledgeBase - Resource on protein sequence and functional information (2020_06, December 2020)
  • Pfam - Database of protein families and domains (v33.1, May 2020)
  • NHGRI-EBI GWAS Catalog - Catalog of published genome-wide association studies (December 2nd 2020)

Getting started

STEP 0: Python

An installation of Python (version 3.6) is required to run gvanno. Check that Python is installed by typing python --version in your terminal window.
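
The same check can be scripted. The sketch below is only an illustration: the helper name is hypothetical, and it assumes that Python 3.6 acts as a lower bound (the text above only names version 3.6).

```python
import sys

MIN_VERSION = (3, 6)  # assumption: 3.6 is treated as a minimum, not an exact requirement


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old"
    print("Python {}: {}".format(found, status))
```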

STEP 1: Installation of Docker

  1. Install the Docker engine on your preferred platform
    • installing Docker on Linux
    • installing Docker on Mac OS
    • NOTE: We have not yet been able to perform enough testing on the Windows platform, and we have received feedback that particular versions of Docker/Windows do not work with PCGR (an example being mounting of data volumes)
  2. Test that Docker is running, e.g. by typing docker ps or docker images in the terminal window
  3. Adjust the computing resources dedicated to Docker, i.e.:

STEP 1.1: Installation of Singularity (optional)

  1. Note: this has only been tested with Singularity version 2.4.2; your mileage may vary with other versions.

  2. Install Singularity

  3. Test that singularity works by running singularity --version

  4. If you are in the gvanno directory, build the singularity image like so:

    cd src

    sudo ./buildSingularity.sh
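
Both installation steps end with a quick check that the container runtime responds. A minimal Python sketch of such a check is shown here; the function name is hypothetical. Note that asking for the version only confirms that the client is installed, whereas docker ps (as suggested in STEP 1) also confirms that the Docker daemon is running.

```python
import shutil
import subprocess


def runtime_version(runtime):
    """Return the version string reported by `runtime`, or None if unavailable."""
    if shutil.which(runtime) is None:
        return None  # executable is not on the PATH
    try:
        result = subprocess.run(
            [runtime, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # found on the PATH, but the call failed
    return result.stdout.strip()


if __name__ == "__main__":
    for runtime in ("docker", "singularity"):
        version = runtime_version(runtime)
        print("{}: {}".format(runtime, version or "not found"))
```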

STEP 2: Download gvanno and data bundle

  1. Download and unpack the latest software release (1.4.1)

  2. Download and unpack the assembly-specific data bundle in the gvanno directory

    This should produce a data/ folder inside the gvanno-X.X software folder

  3. Pull the gvanno Docker image (1.4.1) from DockerHub (approx. 2.3 GB):

    • docker pull sigven/gvanno:1.4.1 (gvanno annotation engine)

STEP 3: Input preprocessing

The gvanno workflow accepts a single input file:

  • An unannotated, single-sample VCF file (>= v4.2) with germline variants (SNVs/InDels)

We strongly recommend that the input VCF is compressed and indexed using bgzip and tabix. NOTE: If the input VCF contains multi-allelic sites, these will be subject to decomposition.
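
To illustrate what decomposition of a multi-allelic site means, here is a deliberately simplified Python sketch. It is not the code used inside the workflow: the function name is hypothetical, and only the ALT column is split. A real decomposition tool would also rewrite per-allele INFO values and genotype columns.

```python
def decompose_multiallelic(vcf_line):
    """Split one VCF data line into one line per ALT allele.

    Simplification: only the ALT column changes; all other columns
    (including INFO and any genotype columns) are copied unchanged.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alt_alleles = fields[4].split(",")  # ALT is the fifth VCF column
    records = []
    for allele in alt_alleles:
        record = list(fields)
        record[4] = allele
        records.append("\t".join(record))
    return records


if __name__ == "__main__":
    # Made-up record with two alternate alleles at the same position
    line = "1\t12345\t.\tG\tA,T\t50\tPASS\t."
    for record in decompose_multiallelic(line):
        print(record)
```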

STEP 4: Run example

Run the workflow with gvanno.py, which takes the following arguments and options:

usage:
gvanno.py -h [options]
--query_vcf QUERY_VCF
--gvanno_dir GVANNO_DIR
--output_dir OUTPUT_DIR
--genome_assembly grch37|grch38
--sample_id SAMPLE_ID
--container docker|singularity

gvanno - workflow for functional and clinical annotation of germline nucleotide variants

Required arguments:
--query_vcf QUERY_VCF
		    VCF input file with germline query variants (SNVs/InDels).
--gvanno_dir GVANNO_DIR
		    Directory that contains the gvanno data bundle, e.g. ~/gvanno-1.4.1
--output_dir OUTPUT_DIR
		    Output directory
--genome_assembly {grch37,grch38}
		    Genome assembly build: grch37 or grch38
--container {docker,singularity}
		    Run gvanno with docker or singularity
--sample_id SAMPLE_ID
		    Sample identifier - prefix for output files

Optional arguments:
--force_overwrite     By default, the script will fail with an error if any output file already exists.
		    You can force the overwrite of existing result files by using this flag, default: False
--version             show program's version number and exit
--no_vcf_validate     Skip validation of input VCF with Ensembl's vcf-validator, default: False
--lof_prediction      Predict loss-of-function variants with Loftee plugin in Variant Effect Predictor (VEP), default: False
--vep_n_forks VEP_N_FORKS
		    Number of forks for Variant Effect Predictor (VEP) processing, default: 4
--vep_buffer_size VEP_BUFFER_SIZE
		    Variant buffer size (variants read into memory simultaneously) for Variant Effect Predictor (VEP) processing
		    - set lower to reduce memory usage, default: 5000
--vep_pick_order VEP_PICK_ORDER
		    Comma-separated string of ordered transcript properties for primary variant pick in
		    Variant Effect Predictor (VEP) processing, default: canonical,appris,biotype,ccds,rank,tsl,length,mane
--vep_skip_intergenic
		    Skip intergenic variants in Variant Effect Predictor (VEP) processing, default: False
--vcfanno_n_processes VCFANNO_N_PROCESSES
		    Number of processes for vcfanno processing (see https://github.com/brentp/vcfanno#-p), default: 4

The examples folder contains an example VCF file. Analysis of the example VCF can be performed by the following command:

python ~/gvanno-1.4.1/gvanno.py \
    --query_vcf ~/gvanno-1.4.1/examples/example.grch37.vcf.gz \
    --gvanno_dir ~/gvanno-1.4.1 \
    --output_dir ~/gvanno-1.4.1 \
    --sample_id example \
    --genome_assembly grch37 \
    --container docker \
    --force_overwrite
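
If you prefer to launch the workflow from another Python script (e.g. to loop over several VCF files), the same invocation can be assembled as an argument list. This is a sketch under assumptions: build_gvanno_command is a hypothetical helper, not part of gvanno, and it only uses options listed in the usage text above.

```python
import os
import subprocess
import sys


def build_gvanno_command(gvanno_dir, query_vcf, output_dir, sample_id,
                         genome_assembly="grch37", container="docker",
                         force_overwrite=False):
    """Return the gvanno.py invocation as an argument list."""
    command = [
        sys.executable, os.path.join(gvanno_dir, "gvanno.py"),
        "--query_vcf", query_vcf,
        "--gvanno_dir", gvanno_dir,
        "--output_dir", output_dir,
        "--sample_id", sample_id,
        "--genome_assembly", genome_assembly,
        "--container", container,
    ]
    if force_overwrite:
        command.append("--force_overwrite")
    return command


if __name__ == "__main__":
    gvanno_dir = os.path.expanduser("~/gvanno-1.4.1")
    command = build_gvanno_command(
        gvanno_dir=gvanno_dir,
        query_vcf=os.path.join(gvanno_dir, "examples", "example.grch37.vcf.gz"),
        output_dir=gvanno_dir,
        sample_id="example",
        force_overwrite=True,
    )
    print(" ".join(command))
    # Uncomment to run the workflow (needs the Docker image and the data bundle):
    # subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.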

This command will run the Docker-based gvanno workflow and produce the following output files in the directory given by --output_dir:

  1. example_gvanno_pass_grch37.vcf.gz (.tbi) - Bgzipped VCF file with rich set of functional/clinical annotations
  2. example_gvanno_pass_grch37.tsv.gz - Compressed TSV file with rich set of functional/clinical annotations

Similar files are produced for all variants, not only variants with a PASS designation in the VCF FILTER column.
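
The compressed TSV file can be read with the Python standard library alone. The sketch below is a minimal example with a hypothetical function name. It assumes that any lines starting with '#' are comments and that the first remaining line holds the column names; check this against your own output file.

```python
import csv
import gzip


def read_gvanno_tsv(path):
    """Yield one dict per record from a gzip-compressed TSV file."""
    with gzip.open(path, "rt", newline="") as handle:
        # Assumption: lines starting with '#' are comments, and the
        # first remaining line is the header with the column names.
        rows = (line for line in handle if not line.startswith("#"))
        reader = csv.DictReader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
        for record in reader:
            yield record


if __name__ == "__main__":
    path = "example_gvanno_pass_grch37.tsv.gz"  # file name from the list above
    for record in read_gvanno_tsv(path):
        print(record)  # show the first record only
        break
```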

Documentation

Documentation of the various variant and gene annotations can be found in the header of the annotated VCF file. The column names of the tab-separated values (TSV) file are identical to the INFO tags documented in the VCF header.
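
As a convenience, the INFO tag descriptions can be pulled out of the VCF header with a few lines of Python. This is a sketch with a hypothetical function name; it relies only on the standard ##INFO=<ID=...,Description="..."> layout of VCF meta-information lines.

```python
import gzip
import re

# Captures the ID and the quoted Description of a line such as
# ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">
INFO_PATTERN = re.compile(
    r'^##INFO=<ID=([^,>]+),.*?Description="((?:[^"\\]|\\.)*)"'
)


def info_descriptions(vcf_path):
    """Return {INFO tag: description} parsed from a gzipped VCF header."""
    descriptions = {}
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if not line.startswith("#"):
                break  # the header ends at the first data line
            match = INFO_PATTERN.match(line)
            if match:
                descriptions[match.group(1)] = match.group(2)
    return descriptions


if __name__ == "__main__":
    path = "example_gvanno_pass_grch37.vcf.gz"  # output file name from the run example
    for tag, description in sorted(info_descriptions(path).items()):
        print("{}\t{}".format(tag, description))
```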

Contact

sigven AT ifi.uio.no