medvedevgroup / vargeno

Licence: MIT license
Towards fast and accurate SNP genotyping from whole genome sequencing data for bedside diagnostics.

VarGeno

Fast SNP genotyping tool for whole genome sequencing data and large SNP databases.

Install from Bioconda

VarGeno can be installed from Bioconda with the following command:

conda install vargeno

See the Bioconda documentation for more information about Bioconda.

If you do not have Bioconda installed, you can install VarGeno from source code (see "Install from Source Code").

Quick Usage

VarGeno takes as input:

  1. A reference genome sequence in FASTA file format.
  2. A list of SNPs to be genotyped in VCF file format.
  3. Sequencing reads from the donor genome in FASTQ file format.

Before genotyping an individual, you must construct indices for the reference and SNP list using the following command:

vargeno index ref.fa snp.vcf index_prefix

To perform the genotyping:

vargeno geno index_prefix reads.fq snp.vcf output_filename

Here index_prefix must be the same string that was used when generating the index.
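The two steps above can be scripted so that the same index prefix is always passed to both commands. The following is a minimal sketch, not part of VarGeno itself: the function names build_vargeno_commands and run_vargeno are hypothetical, and the only command-line arguments used are the ones shown above. It assumes the vargeno executable is on your PATH.

```python
import subprocess


def build_vargeno_commands(ref_fasta, snp_vcf, reads_fastq, index_prefix, output_vcf):
    """Return the index and geno command lines as argument lists.

    The argument order mirrors the two commands shown in this README;
    the function name and signature are illustrative only.
    """
    index_cmd = ["vargeno", "index", ref_fasta, snp_vcf, index_prefix]
    geno_cmd = ["vargeno", "geno", index_prefix, reads_fastq, snp_vcf, output_vcf]
    return index_cmd, geno_cmd


def run_vargeno(ref_fasta, snp_vcf, reads_fastq, index_prefix, output_vcf):
    """Run indexing, then genotyping; raise CalledProcessError if a step fails."""
    for cmd in build_vargeno_commands(
        ref_fasta, snp_vcf, reads_fastq, index_prefix, output_vcf
    ):
        subprocess.run(cmd, check=True)
```

For example, run_vargeno("ref.fa", "snp.vcf", "reads.fq", "index_prefix", "output_filename") would issue the same two commands as above, in order. Building both commands from one index_prefix argument avoids the mismatch that the note above warns about.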

Output format: VCF

VarGeno's genotyping results are in the "FORMAT" column of the output VCF file:

  1. Genotype: in the "GT" field, one of 0/0, 0/1, or 1/1.
  2. Genotype quality: in the "GQ" field, encoded as a Phred-scaled quality (integer).

For details of the "GT" and "GQ" fields, please refer to the Variant Call Format (VCF) Version 4.2 Specification.
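As an illustration of reading these fields, here is a minimal sketch that extracts GT and GQ from one VCF data line and converts GQ to an error probability. It assumes the standard VCF 4.2 layout, in which the ninth column (FORMAT) lists colon-separated keys and the tenth column holds the matching values for the sample; check this against your own output file. The function names and the example record are made up for illustration.

```python
def parse_genotype(vcf_line):
    """Return (chrom, pos, GT, GQ) for one tab-separated VCF data line.

    Assumption: column 9 (FORMAT) holds keys such as "GT:GQ" and
    column 10 (the first sample) holds the values in the same order.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")
    values = fields[9].split(":")
    sample = dict(zip(keys, values))
    raw_gq = sample.get("GQ", ".")
    gq = int(raw_gq) if raw_gq != "." else None  # "." means missing in VCF
    return fields[0], int(fields[1]), sample.get("GT"), gq


def gq_to_error_probability(gq):
    """Convert a Phred-scaled quality to the probability that the call is wrong."""
    return 10 ** (-gq / 10)


# A made-up record, not taken from the VarGeno test data.
example_line = "22\t16050075\t.\tA\tG\t.\t.\t.\tGT:GQ\t0/1:30"
chrom, pos, gt, gq = parse_genotype(example_line)
print(chrom, pos, gt, gq, gq_to_error_probability(gq))
```

Because a Phred quality Q corresponds to an error probability of 10^(-Q/10), a GQ of 30 means the genotype call is wrong with probability about 0.001.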

Install from Source Code

Prerequisites

  • A modern, C++11-ready compiler, such as g++ version 4.9 or higher.
  • The cmake build system. It is only needed to install the SDSL library; if SDSL is already installed, cmake is not required.
  • A 64-bit operating system. Mac OS X and Linux are currently supported.

Install Command

git clone https://github.com/medvedevgroup/vargeno.git
cd vargeno
export PREFIX=$HOME
bash ./install.sh

You should then see the vargeno executable in the vargeno directory. To verify that your installation is correct, you can run the toy example below.

Example

The example dataset is at https://github.com/medvedevgroup/vargeno/tree/master/test.

In this example, we genotype 100 SNPs on human chromosome 22 with a small subset of 1000 Genomes Project Illumina sequencing reads. The whole process should finish in around a minute and requires 34 GB of RAM.

  1. Go to the test data directory.

  2. Pre-process the reference and SNP list to generate indices:

vargeno index chr22.fa snp.vcf test_prefix

  3. Genotype the variants:

vargeno geno test_prefix reads.fq snp.vcf genotyped.vcf

The output of VarGeno on the example dataset should match the expected output at https://github.com/medvedevgroup/vargeno/blob/master/test/expected_output.
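One way to check your result against the expected output is to compare genotypes site by site rather than comparing the files byte for byte. The sketch below is an assumption-laden illustration, not a tool shipped with VarGeno: it assumes both files are tab-separated VCF with the FORMAT keys in column 9 and the sample values in column 10, and the function names load_genotypes and diff_genotypes are hypothetical.

```python
def load_genotypes(lines):
    """Map (chrom, pos) -> GT string from an iterable of VCF lines.

    Header lines (starting with "#"), blank lines, and lines with fewer
    than 10 columns are skipped.
    """
    genotypes = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue
        keys = fields[8].split(":")
        values = fields[9].split(":")
        genotypes[(fields[0], int(fields[1]))] = dict(zip(keys, values)).get("GT")
    return genotypes


def diff_genotypes(observed, expected):
    """Return sites whose GT differs or that appear in only one input.

    Each value is an (observed_GT, expected_GT) pair; None marks a missing site.
    """
    return {
        site: (observed.get(site), expected.get(site))
        for site in sorted(set(observed) | set(expected))
        if observed.get(site) != expected.get(site)
    }
```

A possible use, assuming you have saved the expected output locally as expected.vcf (a hypothetical file name): open genotyped.vcf and expected.vcf, pass each file handle to load_genotypes, and pass the two resulting dictionaries to diff_genotypes. An empty result means every site has the same genotype in both files.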

Memory Lite Version

The memory-lite version of VarGeno (VarGeno-Lite) is maintained as an independent project at https://github.com/medvedevgroup/vargeno_lite.

Citation

If you use VarGeno in your research, please cite

  • Chen Sun and Paul Medvedev, Toward fast and accurate SNP genotyping from whole genome sequencing data for bedside diagnostics.

Both VarGeno's algorithm and its code are built on top of LAVA's, and VarGeno reuses much of LAVA's code. It also uses some code from the AllSome project.
