mskcc / Vcf2maf

Convert a VCF into a MAF, where each variant is annotated to only one of all possible gene isoforms

vcf2maf

To convert a VCF into a MAF, each variant must be mapped to only one of all possible gene transcripts/isoforms that it might affect. But even within a single isoform, a Missense_Mutation close enough to a Splice_Site can be labeled as either in MAF format, but not as both. This selection of a single effect per variant is often subjective, and that is what this project attempts to standardize. The vcf2maf and maf2maf scripts leave most of that responsibility to Ensembl's VEP, but allow you to override its "canonical" isoforms, or use a custom ExAC VCF for annotation. The most useful feature, though, is the extensive support for parsing a wide range of crappy MAF-like or VCF-like formats we've seen out in the wild.

Quick start

Find the latest stable release, download it, and view the detailed usage manuals for vcf2maf and maf2maf:

export VCF2MAF_URL=`curl -sL https://api.github.com/repos/mskcc/vcf2maf/releases | grep -m1 tarball_url | cut -d\" -f4`
curl -L -o mskcc-vcf2maf.tar.gz $VCF2MAF_URL; tar -zxf mskcc-vcf2maf.tar.gz; cd mskcc-vcf2maf-*
perl vcf2maf.pl --man
perl maf2maf.pl --man
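
To see what the first command extracts without touching the network, here is a minimal sketch of its grep/cut stage run against a single made-up line of API output. The helper name extract_tarball_url and the example URL are illustrative assumptions, not part of vcf2maf.

```shell
# Sketch: isolate the grep/cut stage of the release lookup above.
# extract_tarball_url is a hypothetical helper name, not part of vcf2maf.
extract_tarball_url() {
    # Keep the first line mentioning tarball_url, then take the 4th
    # double-quote-delimited field, which is the URL value.
    grep -m1 tarball_url | cut -d'"' -f4
}

# Made-up sample line, shaped like one line of pretty-printed JSON.
printf '    "tarball_url": "https://example.org/vcf2maf.tar.gz",\n' | extract_tarball_url
```

Because grep -m1 stops at the first match, only the first release listed in the response is used.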

If you don't have VEP installed, then follow this gist. Of the many annotators out there, VEP is preferred for its large team of active coders, and its CLIA-compliant HGVS formats. After installing VEP, test out vcf2maf like this:

perl vcf2maf.pl --input-vcf tests/test.vcf --output-maf tests/test.vep.maf

To fill columns 16 and 17 of the output MAF with tumor/normal sample IDs, and to parse out genotypes and allele counts from matched genotype columns in the VCF, use options --tumor-id and --normal-id. Skip option --normal-id if you don't have a matched normal:

perl vcf2maf.pl --input-vcf tests/test.vcf --output-maf tests/test.vep.maf --tumor-id WD1309 --normal-id NB1308

VCFs from variant callers like VarScan use hardcoded sample IDs TUMOR/NORMAL to name genotype columns. To have vcf2maf correctly locate the columns to parse genotypes, while still printing proper sample IDs in the output MAF, also pass --vcf-tumor-id and --vcf-normal-id:

perl vcf2maf.pl --input-vcf tests/test_varscan.vcf --output-maf tests/test_varscan.vep.maf --tumor-id WD1309 --normal-id NB1308 --vcf-tumor-id TUMOR --vcf-normal-id NORMAL
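
Before choosing values for --vcf-tumor-id and --vcf-normal-id, it can help to check what the genotype columns in a VCF are actually called. The following is a minimal sketch, assuming a tab-delimited VCF with a standard "#CHROM" header line. The helper name vcf_sample_names is hypothetical and not part of vcf2maf.

```shell
# Sketch: print the genotype column names of a VCF, one per line.
# vcf_sample_names is a hypothetical helper, not part of vcf2maf.
vcf_sample_names() {
    # In the "#CHROM" header line, fields 1-9 are fixed (CHROM..FORMAT);
    # anything from field 10 onward is a genotype column name.
    awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}
```

For example, run vcf_sample_names tests/test_varscan.vcf and pass whichever names it prints for the tumor and normal columns to --vcf-tumor-id and --vcf-normal-id.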

If VEP is installed under /opt/vep and the VEP cache is under /srv/vep, use options --vep-path and --vep-data to tell vcf2maf where to find them:

perl vcf2maf.pl --input-vcf tests/test.vcf --output-maf tests/test.vep.maf --vep-path /opt/vep --vep-data /srv/vep

If you want to skip running VEP and need a minimalist MAF-like file listing data from the input VCF only, then use the --inhibit-vep option. If your input VCF contains VEP annotation, then vcf2maf will try to extract it. But be warned that the accuracy of your resulting MAF depends on how VEP was operated upstream. In standard operation, vcf2maf runs VEP with very specific parameters to make sure everyone produces comparable MAFs. So, it is strongly recommended to avoid --inhibit-vep unless you know what you're doing.

maf2maf

If you have a MAF or a MAF-like file that you want to reannotate, then use maf2maf, which simply runs maf2vcf followed by vcf2maf:

perl maf2maf.pl --input-maf tests/test.maf --output-maf tests/test.vep.maf

After tests on variant lists from many sources, maf2vcf and maf2maf are quite good at dealing with formatting errors or "MAF-like" files. They even support VCF-style alleles, as long as Start_Position == POS. It's OK if the input format is imperfect: any variants with a reference allele mismatch are kept aside in a separate file for debugging. The bare minimum columns that maf2maf expects as input are:

Chromosome	Start_Position	Reference_Allele	Tumor_Seq_Allele2	Tumor_Sample_Barcode
1	3599659	C	T	TCGA-A1-A0SF-01
1	6676836	A	AGC	TCGA-A1-A0SF-01
1	7886690	G	A	TCGA-A1-A0SI-01
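
A quick pre-flight check for those five columns can be sketched as follows. It assumes a tab-delimited file whose first non-comment line is the header, and treats lines starting with "#" as comments. The helper name maf_missing_columns is hypothetical and not part of vcf2maf.

```shell
# Sketch: list which of the five minimum columns are absent from a MAF-like TSV.
# maf_missing_columns is a hypothetical helper, not part of vcf2maf.
maf_missing_columns() {
    awk -F'\t' '
        /^#/ { next }    # skip comment lines (assumption)
        {
            # Record every column name found in the header line
            for (i = 1; i <= NF; i++) seen[$i] = 1
            n = split("Chromosome Start_Position Reference_Allele Tumor_Seq_Allele2 Tumor_Sample_Barcode", need, " ")
            # Print each required name that the header lacks
            for (j = 1; j <= n; j++) if (!(need[j] in seen)) print need[j]
            exit
        }
    ' "$1"
}
```

Empty output means all five columns are present. Any names printed are the ones to add before running maf2maf.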

See data/minimalist_test_maf.tsv for a sampler. If Tumor_Seq_Allele1 is added, it will be used to determine zygosity. Otherwise, maf2maf will try to determine zygosity from variant allele fractions, assuming that arguments --tum-vad-col and --tum-depth-col are set correctly to the names of columns containing those read counts. Specifying the Matched_Norm_Sample_Barcode, along with its respective read-count columns, is also strongly recommended. Columns containing normal allele read counts can be specified using arguments --nrm-vad-col and --nrm-depth-col.
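
To sanity-check that the columns you name for --tum-vad-col and --tum-depth-col really hold read counts, the variant allele fraction (variant reads divided by depth) can be computed per row. This is a minimal sketch under stated assumptions: a tab-delimited file, "#" lines treated as comments, and column names supplied by you. The helper name maf_vaf is hypothetical and not part of vcf2maf. The cutoff used to call zygosity is not stated here, so the sketch stops at the fraction itself.

```shell
# Sketch: print the variant allele fraction (VAF) for each data row of a MAF-like TSV.
# maf_vaf is a hypothetical helper, not part of vcf2maf.
# Usage: maf_vaf FILE VARIANT_READS_COLUMN DEPTH_COLUMN
maf_vaf() {
    awk -F'\t' -v vad="$2" -v dp="$3" '
        /^#/ { next }    # skip comment lines (assumption)
        !hdr {           # first remaining line is the header
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!(vad in col) || !(dp in col)) {
                print "column not found" > "/dev/stderr"
                exit 1
            }
            hdr = 1
            next
        }
        {
            d = $(col[dp]) + 0
            # VAF = variant-supporting reads / total depth; NA when depth is 0 or missing
            if (d > 0) print $(col[vad]) / d
            else print "NA"
        }
    ' "$1"
}
```

Values that fall outside the range 0 to 1, or a column of NA, suggest that the wrong columns were named.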

License

Apache-2.0 | Apache License, Version 2.0 | https://www.apache.org/licenses/LICENSE-2.0

Citation

Cyriac Kandoth. mskcc/vcf2maf: vcf2maf v1.6.19. (2020). doi:10.5281/zenodo.593251