
hall-lab / Svtyper

Licence: MIT

Bayesian genotyper for structural variants

Programming Languages

python


Labels

bioinformatics genomics vcf

Projects that are alternatives of or similar to Svtyper

Genomics

A collection of scripts and notes related to genomics and bioinformatics

Stars: ✭ 101 (+27.85%)

Mutual labels: bioinformatics, genomics, vcf

Tiledb Vcf

Efficient variant-call data storage and retrieval library using the TileDB storage library.

Stars: ✭ 26 (-67.09%)

Mutual labels: bioinformatics, genomics, vcf

Hap.py

Haplotype VCF comparison tools

Stars: ✭ 249 (+215.19%)

Mutual labels: bioinformatics, genomics, vcf

Hail

Scalable genomic data analysis.

Stars: ✭ 706 (+793.67%)

Mutual labels: bioinformatics, genomics, vcf

Cyvcf2

cython + htslib == fast VCF and BCF processing

Stars: ✭ 243 (+207.59%)

Mutual labels: bioinformatics, genomics, vcf

Vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files

Stars: ✭ 259 (+227.85%)

Mutual labels: bioinformatics, genomics, vcf

Pygeno

Personalized Genomics and Proteomics. Main diet: Ensembl, side dishes: SNPs

Stars: ✭ 261 (+230.38%)

Mutual labels: bioinformatics, genomics, vcf

Helmsman

highly-efficient & lightweight mutation signature matrix aggregation

Stars: ✭ 19 (-75.95%)

Mutual labels: bioinformatics, vcf

Fermi2

Stars: ✭ 23 (-70.89%)

Mutual labels: bioinformatics, genomics

Bgt

Flexible genotype query among 30,000+ samples whole-genome

Stars: ✭ 72 (-8.86%)

Mutual labels: bioinformatics, genomics

16gt

Simultaneous detection of SNPs and Indels using a 16-genotype probabilistic model

Stars: ✭ 26 (-67.09%)

Mutual labels: bioinformatics, vcf

Awesome Sequencing Tech Papers

A collection of publications on comparison of high-throughput sequencing technologies.

Stars: ✭ 21 (-73.42%)

Mutual labels: bioinformatics, genomics

Minimap2

A versatile pairwise aligner for genomic and spliced nucleotide sequences

Stars: ✭ 912 (+1054.43%)

Mutual labels: bioinformatics, genomics

Gubbins

Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins

Stars: ✭ 67 (-15.19%)

Mutual labels: bioinformatics, genomics

Galaxy

Data intensive science for everyone.

Stars: ✭ 812 (+927.85%)

Mutual labels: bioinformatics, genomics

Fastq.bio

An interactive web tool for quality control of DNA sequencing data

Stars: ✭ 76 (-3.8%)

Mutual labels: bioinformatics, genomics

Nucleus

Python and C++ code for reading and writing genomics data.

Stars: ✭ 657 (+731.65%)

Mutual labels: bioinformatics, genomics

Sns

Analysis pipelines for sequencing data

Stars: ✭ 43 (-45.57%)

Mutual labels: bioinformatics, genomics

Gatk

Official code repository for GATK versions 4 and up

Stars: ✭ 1,002 (+1168.35%)

Mutual labels: bioinformatics, genomics

Dram

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes

Stars: ✭ 47 (-40.51%)

Mutual labels: bioinformatics, genomics


SVTyper

Bayesian genotyper for structural variants

Overview

SVTyper performs breakpoint genotyping of structural variants (SVs) using whole genome sequencing data. Users must supply a VCF file of sites to genotype (which may be generated by LUMPY) as well as a BAM/CRAM file of Illumina paired-end reads aligned with BWA-MEM. SVTyper assesses discordant and concordant reads from paired-end and split-read alignments to infer genotypes at each site. Algorithm details and benchmarking are described in Chiang et al., 2015.
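To make the idea of inferring a genotype from read evidence concrete, here is a minimal sketch of a Bayesian genotype call from counts of reference-supporting and variant-supporting reads at one breakpoint. It is not SVTyper's actual code: the binomial model, the flat prior, the function names, and the values in `ALT_FRACTIONS` are illustrative assumptions. The real algorithm is described in Chiang et al., 2015.

```python
import math

# Assumed probability of sampling a variant-supporting read under each
# genotype (illustrative values, not taken from the SVTyper source).
ALT_FRACTIONS = {"0/0": 0.1, "0/1": 0.4, "1/1": 0.8}


def log_binomial(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def genotype_posteriors(ref_count, alt_count):
    """Posterior probability of each genotype under a flat prior."""
    total = ref_count + alt_count
    log_liks = dict((gt, log_binomial(alt_count, total, p))
                    for gt, p in ALT_FRACTIONS.items())
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_liks.values())
    weights = dict((gt, math.exp(ll - top)) for gt, ll in log_liks.items())
    norm = sum(weights.values())
    return dict((gt, w / norm) for gt, w in weights.items())


def call_genotype(ref_count, alt_count):
    """Return the genotype with the highest posterior probability."""
    posteriors = genotype_posteriors(ref_count, alt_count)
    return max(posteriors, key=posteriors.get)
```

With these assumed fractions, a site with mostly reference-supporting reads is called `0/0`, a roughly even mix is called `0/1`, and mostly variant-supporting reads give `1/1`.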

Installation

Requirements:

Python 2.7.x

Install via `pip`

pip install git+https://github.com/hall-lab/svtyper.git

svtyper depends on pysam (version 0.15.0 or newer), numpy, and scipy; svtyper-sso additionally depends on cytoolz. If the dependencies aren't already available on your system, pip will attempt to download and install them.
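If you want to confirm which of these dependencies are already importable before installing, a small check along the following lines may help. This is a sketch rather than part of SVTyper: `missing_modules` is a hypothetical helper, and it only tests that each module can be imported. It does not verify the pysam version.

```python
import importlib

REQUIRED = ["pysam", "numpy", "scipy"]   # needed by svtyper
SSO_EXTRA = ["cytoolz"]                  # additionally needed by svtyper-sso


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing for svtyper: %s" % missing_modules(REQUIRED))
    print("missing for svtyper-sso: %s"
          % missing_modules(REQUIRED + SSO_EXTRA))
```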

`svtyper` vs `svtyper-sso`

svtyper is the original implementation of the genotyping algorithm and works with multiple samples. svtyper-sso is an alternative, parallelized implementation optimized for genotyping a single sample; it takes advantage of multiple CPU cores via the multiprocessing module and can offer a 2x or greater speedup, depending on how many CPU cores are used. NOTE: svtyper-sso is not yet stable. There are minor logging differences between the two, and svtyper-sso may exit prematurely with an error when processing CRAM files.
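The general pattern of splitting work into batches and spreading them across CPU cores with the multiprocessing module can be sketched as follows. This illustrates the technique only and is not how svtyper-sso is actually implemented: `make_batches` and `process_in_parallel` are hypothetical helpers, and the `cores` and `batch_size` parameters merely mirror the options svtyper-sso exposes.

```python
from multiprocessing import Pool


def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size]
            for i in range(0, len(items), batch_size)]


def process_in_parallel(items, worker, cores=2, batch_size=1000):
    """Apply worker to each batch across a pool of processes, keeping order.

    worker must be a picklable, module-level function that takes one batch
    (a list) and returns a list of results.
    """
    batches = make_batches(items, batch_size)
    pool = Pool(processes=cores)
    try:
        results = pool.map(worker, batches)
    finally:
        pool.close()
        pool.join()
    # Flatten the per-batch results back into a single list.
    return [result for batch in results for result in batch]
```

Because `Pool.map` returns results in the order of its inputs, the flattened output lines up with the original item order. When calling `process_in_parallel` from a script, do so under an `if __name__ == "__main__":` guard so that worker processes can start cleanly on every platform.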

Example Usage

`svtyper`

As a Command Line Python Script

svtyper \
    -i sv.vcf \
    -B sample.bam \
    -l sample.bam.json \
    > sv.gt.vcf

As a Python Library

import svtyper.classic as svt

input_vcf = "/path/to/input.vcf"
input_bam = "/path/to/input.bam"
library_info = "/path/to/library_info.json"
output_vcf = "/path/to/output.vcf"

with open(input_vcf, "r") as inf, open(output_vcf, "w") as outf:
    svt.sv_genotype(bam_string=input_bam,
                    vcf_in=inf,
                    vcf_out=outf,
                    min_aligned=20,
                    split_weight=1,
                    disc_weight=1,
                    num_samp=1000000,
                    lib_info_path=library_info,
                    debug=False,
                    alignment_outpath=None,
                    ref_fasta=None,
                    sum_quals=False,
                    max_reads=None)

# Results will be inside the /path/to/output.vcf file
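Once the output VCF has been written, a quick sanity check is to count how many sites received each genotype. The sketch below relies only on the standard VCF column layout (FORMAT in column 9, samples from column 10 onward); `tally_genotypes` is a hypothetical helper, not part of the svtyper package.

```python
from collections import Counter


def tally_genotypes(vcf_lines, sample_index=0):
    """Count GT values for one sample across the records of a VCF."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 + sample_index:
            continue  # record has no column for the requested sample
        format_keys = fields[8].split(":")
        sample_values = fields[9 + sample_index].split(":")
        if "GT" in format_keys:
            counts[sample_values[format_keys.index("GT")]] += 1
    return counts
```

Any iterable of lines works as input, so an open file handle on /path/to/output.vcf can be passed directly.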

`svtyper-sso`

As a Command Line Python Script

# --core: number of CPU cores to use
# --batch_size: number of SVs to process in a single batch (default: 1000)
# --max_reads: skip genotyping if an SV contains more valid reads than this threshold (default: 1000)
svtyper-sso \
    --core 2 \
    --batch_size 1000 \
    --max_reads 1000 \
    -i sv.vcf \
    -B sample.bam \
    -l sample.bam.json \
    > sv.gt.vcf
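If you prefer to drive the command-line tool from a Python pipeline rather than a shell, one possible approach is to build the argument list and redirect standard output to the result file. This is a sketch under the assumption that `svtyper-sso` is on your `PATH`; the helper names `build_sso_command` and `run_sso` are hypothetical, and only the flags shown in the command above are used.

```python
import subprocess


def build_sso_command(vcf, bam, lib_json, cores=2, batch_size=1000,
                      max_reads=1000):
    """Build the svtyper-sso argument list using the flags shown above."""
    return [
        "svtyper-sso",
        "--core", str(cores),
        "--batch_size", str(batch_size),
        "--max_reads", str(max_reads),
        "-i", vcf,
        "-B", bam,
        "-l", lib_json,
    ]


def run_sso(vcf, bam, lib_json, out_vcf, **options):
    """Run svtyper-sso and write its standard output to out_vcf."""
    command = build_sso_command(vcf, bam, lib_json, **options)
    with open(out_vcf, "w") as out:
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(command, stdout=out)
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.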

As a Python Library

import svtyper.singlesample as sso

input_vcf = "/path/to/input.vcf"
input_bam = "/path/to/input.bam"
library_info = "/path/to/library_info.json"
output_vcf = "/path/to/output.vcf"

with open(input_vcf, "r") as inf, open(output_vcf, "w") as outf:
    sso.sso_genotype(bam_string=input_bam,
                     vcf_in=inf,
                     vcf_out=outf,
                     min_aligned=20,
                     split_weight=1,
                     disc_weight=1,
                     num_samp=1000000,
                     lib_info_path=library_info,
                     debug=False,
                     alignment_outpath=None,
                     ref_fasta=None,
                     sum_quals=False,
                     max_reads=1000,
                     cores=2,
                     batch_size=1000)

# Results will be inside the /path/to/output.vcf file

Development

Requirements:

Python 2.7 or newer
GNU Make
virtualenv (or conda for anaconda or miniconda users)

Setting Up a Development Environment

Using `virtualenv`

git clone https://github.com/hall-lab/svtyper.git
cd svtyper
virtualenv myvenv
source myvenv/bin/activate
pip install -e .
# add, edit, or delete code
make test

# when you're finished with development
git push <remote-name> <branch>
deactivate
cd .. && rm -rf svtyper

Using `conda`

git clone https://github.com/hall-lab/svtyper.git
cd svtyper
conda create --channel bioconda --name mycenv pysam numpy scipy cytoolz # type 'y' when prompted with "proceed ([y]/n)?"
source activate mycenv
pip install -e .
# add, edit, or delete code
make test


# when you're finished with development
git push <remote-name> <branch>
source deactivate
cd .. && rm -rf svtyper
conda remove --name mycenv --all

Troubleshooting

Many common issues are related to abnormal insert size distributions in the BAM file. SVTyper provides methods to assess and visualize the characteristics of sequencing libraries.

Running SVTyper with the -l flag creates a JSON file with essential metrics on a BAM file. SVTyper will sample the first N reads from the file (1 million by default) to parse the libraries, read groups, and insert size histograms. This can be done in the absence of a VCF file.

svtyper \
    -B my.bam \
    -l my.bam.json
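To get a quick look at what the resulting JSON file contains, you can print an outline of its nesting. The exact schema is not documented here, so this sketch makes no assumptions about key names and simply walks whatever structure it finds; `outline` and `outline_file` are hypothetical helpers.

```python
import json


def outline(value, depth=2, indent=0):
    """Return lines describing the nesting of a parsed JSON value."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key in sorted(value):
            lines.append("%s%s: %s"
                         % (pad, key, type(value[key]).__name__))
            if depth > 1:
                lines.extend(outline(value[key], depth - 1, indent + 1))
    elif isinstance(value, list) and value:
        # Describe a list by its length and the type of its first item.
        lines.append("%s[%d items], first item: %s"
                     % (pad, len(value), type(value[0]).__name__))
        if depth > 1:
            lines.extend(outline(value[0], depth - 1, indent + 1))
    return lines


def outline_file(path, depth=2):
    """Load a JSON file and return its outline as a list of lines."""
    with open(path) as handle:
        return outline(json.load(handle), depth)
```

For example, `print("\n".join(outline_file("my.bam.json", depth=4)))` lists the keys and value types found in the first few levels of the file.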

The lib_stats.R script produces insert size histograms from the JSON file:

scripts/lib_stats.R my.bam.json my.bam.json.pdf
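If R is not available, summary statistics of an insert size distribution can also be computed directly in Python. The sketch below assumes you have already extracted one histogram as a mapping from insert size to read-pair count; where that mapping sits inside the JSON file is not specified here, so the input format and the `histogram_stats` helper are assumptions for illustration.

```python
import math


def histogram_stats(histogram):
    """Mean and standard deviation of an insert-size histogram.

    histogram maps insert size (an int, or a string as JSON object keys
    are) to the number of read pairs observed with that size.
    """
    total = float(sum(histogram.values()))
    if total == 0:
        raise ValueError("histogram is empty")
    mean = sum(int(size) * count
               for size, count in histogram.items()) / total
    variance = sum(count * (int(size) - mean) ** 2
                   for size, count in histogram.items()) / total
    return mean, math.sqrt(variance)
```

An unusually large standard deviation relative to the mean, or a mean far from the expected fragment length, is the kind of abnormal insert size distribution worth investigating.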

Citation

C Chiang, R M Layer, G G Faust, M R Lindberg, D B Rose, E P Garrison, G T Marth, A R Quinlan, and I M Hall. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nat Meth 12, 966–968 (2015). doi:10.1038/nmeth.3505.

http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3505.html


Stars: ✭ 79
