pwwang / vcfstats

Licence: other

vcfstats - powerful statistics for VCF files


Documentation | CHANGELOG

Motivation

Several tools, including bcftools and jvarkit, can plot statistics of VCF files. However, none of them can:

  1. plot specific metrics
  2. customize the plots
  3. focus on variants with certain filters

The R package vcfR can do some of the above, but it loads the entire VCF into memory, which is impractical for large VCF files.

Installation

pip install -U vcfstats

Or run it with Docker or Singularity:

docker run --rm justold/vcfstats:latest vcfstats
# or
singularity run docker://justold/vcfstats:latest vcfstats

Gallery

Number of variants on each chromosome

vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'COUNT(1) ~ CONTIG' \
	--title 'Number of variants on each chromosome' \
	--config examples/config.toml

[Figure: Number of variants on each chromosome]
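Conceptually, the formula 'COUNT(1) ~ CONTIG' adds 1 for every variant record and groups the totals by chromosome. The stdlib Python sketch below shows that aggregation over an uncompressed VCF read one line at a time, so memory grows with the number of contigs rather than the number of records. It is an illustration under those assumptions, not vcfstats' own implementation; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_variants_per_contig(lines: Iterable[str]) -> Counter:
    """Count variant records per CHROM, reading one line at a time."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information (##) and header (#CHROM) lines
        chrom = line.split("\t", 1)[0]  # CHROM is the first tab-separated column
        counts[chrom] += 1
    return counts


vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "1\t200\t.\tC\tT\t50\tPASS\t.\n"
    "2\t300\t.\tG\tA\t50\tq10\t.\n"
)
print(dict(count_variants_per_contig(vcf_text.splitlines())))  # {'1': 2, '2': 1}
```

Because a file object is also an iterable of lines, the same function can be applied to an open, uncompressed file such as examples/sample.vcf.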

Changing labels and ticks

vcfstats uses plotnine for plotting; see the plotnine documentation for the functions you can pass to --ggs to modify the plots.

vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'COUNT(1) ~ CONTIG' \
	--title 'Number of variants on each chromosome (modified)' \
	--config examples/config.toml \
	--ggs 'scale_x_discrete(name ="Chromosome", \
		limits=["1","2","3","4","5","6","7","8","9","10","X"]); \
		ylab("# Variants")'

[Figure: Number of variants on each chromosome (modified)]

Number of variants on the first 5 chromosomes

vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'COUNT(1) ~ CONTIG[1,2,3,4,5]' \
	--title 'Number of variants on each chromosome (first 5)' \
	--config examples/config.toml
# or
vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'COUNT(1) ~ CONTIG[1-5]' \
	--title 'Number of variants on each chromosome (first 5)' \
	--config examples/config.toml
# or
# requires the VCF file to be tabix-indexed
vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'COUNT(1) ~ CONTIG' \
	--title 'Number of variants on each chromosome (first 5)' \
	--config examples/config.toml -r 1 2 3 4 5

[Figure: Number of variants on each chromosome (first 5)]
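The selectors CONTIG[1,2,3,4,5] and CONTIG[1-5] above pick the same contigs. A hypothetical helper that expands either form into a set of contig names is sketched below; it assumes a range only joins integer contig names and that anything else (for example X or a hyphenated contig name) is taken literally. This is an illustration, not vcfstats' formula parser.

```python
from __future__ import annotations


def expand_selector(spec: str) -> set[str]:
    """Expand '1,2,3' or '1-5' style selectors into contig names (hypothetical)."""
    names: set[str] = set()
    for part in spec.split(","):
        part = part.strip()
        start, sep, end = part.partition("-")
        if sep and start.isdigit() and end.isdigit():
            # numeric range: include both ends
            names.update(str(i) for i in range(int(start), int(end) + 1))
        elif part:
            # anything else is a literal contig name
            names.add(part)
    return names


print(sorted(expand_selector("1-5"), key=int))  # ['1', '2', '3', '4', '5']
```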

Number of substitutions of SNPs

vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'COUNT(1, VARTYPE[snp]) ~ SUBST[A>T,A>G,A>C,T>A,T>G,T>C,G>A,G>T,G>C,C>A,C>T,C>G]' \
	--title 'Number of substitutions of SNPs' \
	--config examples/config.toml

[Figure: Number of substitutions of SNPs]
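The formula above counts SNP records by their REF>ALT substitution. A minimal stdlib sketch of that tally, working on (REF, ALT) pairs taken from VCF columns 4 and 5, is shown below. Treating a record as a SNP when REF and every ALT allele are single bases is a simplifying assumption about how VARTYPE[snp] is decided, and the function name is hypothetical.

```python
from collections import Counter

BASES = set("ACGT")


def snp_substitutions(records):
    """Count REF>ALT substitutions over SNP records.

    records: iterable of (REF, ALT) strings; ALT may list several alleles
    separated by commas. Non-SNP records (indels, MNPs, symbolic alleles)
    are skipped.
    """
    counts = Counter()
    for ref, alt in records:
        alts = alt.split(",")
        if ref in BASES and all(a in BASES for a in alts):
            for a in alts:
                counts[f"{ref}>{a}"] += 1  # one count per ALT allele
    return counts


example = [("A", "G"), ("C", "T"), ("A", "G"), ("AT", "A"), ("G", "A,T")]
print(dict(snp_substitutions(example)))
```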

Only SNPs that PASS all filters

vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'COUNT(1, VARTYPE[snp]) ~ SUBST[A>T,A>G,A>C,T>A,T>G,T>C,G>A,G>T,G>C,C>A,C>T,C>G]' \
	--title 'Number of substitutions of SNPs (passed)' \
	--config examples/config.toml \
	--passed

[Figure: Number of substitutions of SNPs (passed)]
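A sketch of the kind of record filter that --passed implies is below: it keeps data lines whose FILTER column (the 7th tab-separated field) is exactly PASS. Dropping records whose FILTER is '.' (filters not applied) is an assumption made here, not documented vcfstats behaviour, and the function name is hypothetical.

```python
def passed_records(lines):
    """Yield VCF data lines whose FILTER column is exactly 'PASS'.

    Assumption: records with FILTER '.' (filters not applied) are dropped too.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and header lines
        if line.rstrip("\n").split("\t")[6] == "PASS":
            yield line


data = [
    "1\t100\t.\tA\tG\t50\tPASS\t.",
    "1\t200\t.\tC\tT\t50\tq10\t.",
    "2\t300\t.\tG\tA\t50\t.\t.",
]
print(len(list(passed_records(data))))  # 1
```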

Alternative allele frequency on each chromosome

# using a dark theme
vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'AAF ~ CONTIG' \
	--title 'Allele frequency on each chromosome' \
	--config examples/config.toml --ggs 'theme_dark()'

[Figure: Allele frequency on each chromosome]
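AAF is the alternative allele frequency of a variant. Assuming it is derived from the sample genotypes, in the spirit of cyvcf2's Variant.aaf (whether vcfstats computes it exactly this way is an assumption), it is the fraction of called alleles that are not the reference allele. A stdlib sketch with a hypothetical function name:

```python
import re


def alt_allele_frequency(genotypes):
    """Fraction of called alleles that are non-reference.

    genotypes: GT strings such as '0/1', '1|1' or './.'.
    Missing alleles ('.') are ignored; returns NaN if nothing was called.
    """
    alt = called = 0
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):  # unphased '/' or phased '|'
            if allele == ".":
                continue
            called += 1
            if allele != "0":  # any allele index other than 0 is an ALT allele
                alt += 1
    return alt / called if called else float("nan")


print(alt_allele_frequency(["0/1", "1/1", "0/0", "./."]))  # 0.5
```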

Using a boxplot

vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'AAF ~ CONTIG' \
	--title 'Allele frequency on each chromosome (boxplot)' \
	--config examples/config.toml \
	--figtype boxplot

[Figure: Allele frequency on each chromosome (boxplot)]

Using a density plot or histogram to investigate the distribution

You can plot the distribution using a density plot or a histogram:

vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'AAF ~ CONTIG[1,2]' \
	--title 'Allele frequency on chromosome 1,2' \
	--config examples/config.toml \
	--figtype density

[Figure: Allele frequency on chromosome 1,2]
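A histogram of AAF is just a count of values per equal-width bin. The sketch below bins values over [0, 1]; the bin count and range are illustrative choices, not vcfstats or plotnine defaults, and the function name is hypothetical. The upper edge falls into the last bin so that an AAF of exactly 1.0 is counted.

```python
def histogram(values, bins=10, lo=0.0, hi=1.0):
    """Count values in `bins` equal-width bins over [lo, hi].

    Values outside [lo, hi] are ignored; `hi` itself goes into the last bin.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    return counts


print(histogram([0.0, 0.05, 0.5, 0.95, 1.0], bins=4))  # [2, 0, 1, 2]
```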

Overall distribution of allele frequency

vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'AAF ~ 1' \
	--title 'Overall allele frequency distribution' \
	--config examples/config.toml

[Figure: Overall allele frequency distribution]

Excluding some low/high frequency variants

vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'AAF[0.05, 0.95] ~ 1' \
	--title 'Overall allele frequency distribution (0.05-0.95)' \
	--config examples/config.toml

[Figure: Overall allele frequency distribution (0.05-0.95)]

Counting types of variants on each chromosome

vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'COUNT(1, group=VARTYPE) ~ CHROM' \
	--title 'Types of variants on each chromosome' \
	--config examples/config.toml
# or simply use: --formula 'VARTYPE ~ CHROM'

[Figure: Types of variants on each chromosome]
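'COUNT(1, group=VARTYPE) ~ CHROM' counts records per chromosome, split by variant type. The sketch below uses a simplified classifier with illustrative categories (snp, mnp, indel, sv); these rules are an assumption, not vcfstats' exact definition of VARTYPE, and both function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def var_type(ref: str, alts: list[str]) -> str:
    """Simplified variant-type classifier (illustrative rules only)."""
    if any(a.startswith("<") or "[" in a or "]" in a for a in alts):
        return "sv"  # symbolic or breakend ALT alleles
    if all(len(ref) == len(a) == 1 for a in alts):
        return "snp"
    if all(len(ref) == len(a) for a in alts):
        return "mnp"  # same length, more than one base
    return "indel"


def count_types_per_chrom(records):
    """records: iterable of (CHROM, REF, ALT); returns Counter keyed by (chrom, type)."""
    counts = Counter()
    for chrom, ref, alt in records:
        counts[(chrom, var_type(ref, alt.split(",")))] += 1
    return counts


records = [("1", "A", "G"), ("1", "AT", "A"), ("2", "AC", "GT"), ("2", "N", "<DEL>")]
print(dict(count_types_per_chrom(records)))
```

Grouping by type only, as in 'VARTYPE ~ 1', would amount to keying the counter on the type alone.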

Using a pie chart if there is only one chromosome

vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'COUNT(1, group=VARTYPE) ~ CHROM[1]' \
	--title 'Types of variants on chromosome 1' \
	--config examples/config.toml \
	--figtype pie
# or simply use: --formula 'VARTYPE ~ CHROM[1]'

[Figure: Types of variants on chromosome 1]

Counting variant types on the whole genome

vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'COUNT(1, group=VARTYPE) ~ 1' \
	--title 'Types of variants on whole genome' \
	--config examples/config.toml
# or simply use: --formula 'VARTYPE ~ 1'

[Figure: Types of variants on whole genome]

Counting types of mutant genotypes (HET, HOM_ALT) for sample 1 on each chromosome

vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'COUNT(1, group=GTTYPEs[HET,HOM_ALT]{0}) ~ CHROM' \
	--title 'Mutant genotypes on each chromosome (sample 1)' \
	--config examples/config.toml
# or simply use: --formula 'GTTYPEs[HET,HOM_ALT]{0} ~ CHROM'

[Figure: Mutant genotypes on each chromosome]
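GTTYPEs[HET,HOM_ALT]{0} keeps only heterozygous and homozygous-alternative genotypes; since the heading calls this sample 1, {0} appears to be a 0-based sample index. The sketch below classifies a GT string into HOM_REF, HET, HOM_ALT or UNKNOWN and tallies the mutant classes per chromosome. Treating any genotype with a missing allele as UNKNOWN, and 1/2 as HET, are assumptions; the function names are hypothetical.

```python
from collections import Counter


def gt_type(gt: str) -> str:
    """Classify a GT string as HOM_REF, HET, HOM_ALT or UNKNOWN (simplified)."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "UNKNOWN"  # assumption: any missing allele makes it unknown
    if all(a == "0" for a in alleles):
        return "HOM_REF"
    if len(set(alleles)) == 1:
        return "HOM_ALT"  # e.g. 1/1 or 2|2
    return "HET"  # e.g. 0/1 or 1/2


def count_mutant_gts(records):
    """records: iterable of (CHROM, GT); count HET and HOM_ALT per chromosome."""
    counts = Counter()
    for chrom, gt in records:
        kind = gt_type(gt)
        if kind in ("HET", "HOM_ALT"):
            counts[(chrom, kind)] += 1
    return counts


calls = [("1", "0/1"), ("1", "1/1"), ("1", "0/0"), ("2", "0|1"), ("2", "./.")]
print(dict(count_mutant_gts(calls)))
```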

Exploring mean genotype quality (GQ) and mean depth on each chromosome for sample 1

vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'MEAN(GQs{0}) ~ MEAN(DEPTHs{0}, group=CHROM)' \
	--title 'GQ vs depth (sample 1)' \
	--config examples/config.toml

[Figure: GQ vs depth (sample 1)]
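'MEAN(GQs{0}) ~ MEAN(DEPTHs{0}, group=CHROM)' yields one point per chromosome: the mean genotype quality against the mean depth of the first sample. The sketch below assumes GQs and DEPTHs read the per-sample FORMAT fields GQ and DP, skips missing ('.') values, and keeps only chromosomes that have both; these are assumptions, and the function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def sample_field(line: str, key: str, sample: int) -> str | None:
    """Return FORMAT field `key` for the 0-based `sample` of a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column
    values = fields[9 + sample].split(":")  # sample columns start at index 9
    if key not in keys:
        return None
    idx = keys.index(key)
    return values[idx] if idx < len(values) else None  # trailing fields may be dropped


def mean_gq_dp_by_chrom(lines, sample=0):
    """Return {chrom: (mean GQ, mean DP)} for one sample."""
    gq = defaultdict(list)
    dp = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom = line.split("\t", 1)[0]
        for key, store in (("GQ", gq), ("DP", dp)):
            value = sample_field(line, key, sample)
            if value not in (None, "."):
                store[chrom].append(float(value))
    # only chromosomes with at least one GQ and one DP value get a point
    return {c: (mean(gq[c]), mean(dp[c])) for c in gq.keys() & dp.keys()}


rows = [
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ:DP\t0/1:30:10",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT:GQ:DP\t1/1:50:20",
    "2\t300\t.\tG\tA\t50\tPASS\t.\tGT:GQ:DP\t0/1:.:8",
]
print(mean_gq_dp_by_chrom(rows))  # {'1': (40.0, 15.0)}
```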

Exploring the depths of samples 1 and 2

vcfstats --vcf examples/sample.vcf \
	--outdir examples/ \
	--formula 'DEPTHs{0} ~ DEPTHs{1}' \
	--title 'Depths between sample 1 and 2' \
	--config examples/config.toml

[Figure: Depths between sample 1 and 2]
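'DEPTHs{0} ~ DEPTHs{1}' plots one point per variant, pairing the depth of the first sample with that of the second. The sketch below collects those pairs from VCF data lines, assuming the depth comes from the FORMAT DP field and that records where either sample lacks a DP value are skipped; both are assumptions, and the function name is hypothetical.

```python
def depth_pairs(lines):
    """Collect (DP of sample 0, DP of sample 1) points for a scatter plot."""
    points = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        keys = fields[8].split(":")  # FORMAT column
        if "DP" not in keys:
            continue
        i = keys.index("DP")
        s0, s1 = fields[9].split(":"), fields[10].split(":")
        # skip records where either sample has no (or a missing) DP value
        if i < len(s0) and i < len(s1) and "." not in (s0[i], s1[i]):
            points.append((int(s0[i]), int(s1[i])))
    return points


rows = [
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:10\t0/0:12",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:7\t./.:.",
]
print(depth_pairs(rows))  # [(10, 12)]
```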

See more examples:

#15 (comment)
