
edgardomortiz / Vcf2phylip

Licence: gpl-3.0
Convert SNPs in VCF format to PHYLIP, NEXUS, binary NEXUS, or FASTA alignments for phylogenetic analysis



vcf2phylip


Brief description

This script takes a VCF file as input and uses the SNP genotypes to build a matrix for phylogenetic analysis in PHYLIP (relaxed version), FASTA, NEXUS, or binary NEXUS format. For heterozygous SNPs, a consensus is made and the corresponding IUPAC nucleotide ambiguity code is written to the output matrix (or matrices). Any ploidy level is allowed and is detected automatically. The code is optimized for large VCF files (hundreds of samples and millions of genotypes); for example, in our tests it processed a 20 GB VCF (~3 million SNPs x 650 individuals) in ~27 minutes. The initial version of the script produced only a PHYLIP matrix, but other popular formats have since been added, including binary NEXUS for SNP analyses with the SNAPP plugin in BEAST (diploid genotypes only).
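The consensus step can be illustrated with a minimal Python sketch. It assumes single-nucleotide REF/ALT alleles and GT fields separated by "/" or "|" (so any ploidy works). The helper name `genotype_to_iupac` and the choice to return N whenever any allele is missing are assumptions for illustration, not taken from vcf2phylip.py.

```python
import re

# IUPAC nucleotide codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def genotype_to_iupac(gt, ref, alts):
    """Return the IUPAC consensus for one sample's GT string (hypothetical helper).

    gt: e.g. "0/1", "1|1" or "0/1/2"; ref: REF base; alts: list of ALT bases.
    Any missing allele (".") yields "N" in this sketch; multi-base alleles
    (indels) are not in the table and also fall back to "N".
    """
    alleles = [ref] + alts
    indices = re.split(r"[/|]", gt)
    if any(i == "." for i in indices):
        return "N"
    bases = frozenset(alleles[int(i)] for i in indices)
    return IUPAC.get(bases, "N")
```

For example, a heterozygous 0/1 call at a site with REF A and ALT G becomes R, and a triploid 0/1/2 call with ALT C,T becomes H.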

Additionally, you can set a minimum number of samples per SNP to control the amount of missing data in the final matrix. Since phylogenetic programs (e.g. RAxML, IQ-TREE, and MrBayes) usually root trees at the first sequence in the alignment, the script also lets you specify an OUTGROUP sequence to be written first in the alignment.
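The two options above can be pictured as a per-site filter plus a reordering of sample names. In this hypothetical sketch a genotype counts as present only if its GT string contains no ".", the default threshold of 4 mirrors the `-m` default, and an outgroup that is not found leaves the order unchanged; vcf2phylip.py may handle these cases differently.

```python
def passes_min_samples(genotypes, min_samples=4):
    """Keep a SNP only if enough samples have a non-missing genotype.

    genotypes: list of GT strings for one SNP, e.g. ["0/1", "./.", "1/1"].
    """
    present = sum(1 for gt in genotypes if "." not in gt)
    return present >= min_samples


def outgroup_first(samples, outgroup):
    """Return sample names with the outgroup (if present) moved to the front."""
    if outgroup not in samples:
        return list(samples)
    return [outgroup] + [s for s in samples if s != outgroup]
```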

Compressed VCF files can be analyzed directly, but the file extension must be .vcf.gz.
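Choosing the decompressor from the file extension, as described above, might look like the following sketch. The function name `open_vcf` is hypothetical; it relies only on Python's standard `gzip` module.

```python
import gzip


def open_vcf(path):
    """Open a VCF as text, transparently decompressing files ending in .vcf.gz."""
    if path.endswith(".vcf.gz"):
        return gzip.open(path, "rt")  # text mode yields str lines, like open()
    return open(path, "r")
```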

The script has been tested with VCF files produced by pyrad v.3.0.66, ipyrad v.0.7.x, Stacks v.1.47, dDocent, GATK, and freebayes.

Please don't hesitate to open an Issue if you find a problem or have a suggestion for a new feature.

Usage

Type python vcf2phylip.py -h to show the program's help:

usage: vcf2phylip.py [-h] -i FILENAME [--output-folder FOLDER]
                     [--output-prefix PREFIX] [-m MIN_SAMPLES_LOCUS]
                     [-o OUTGROUP] [-p] [-f] [-n] [-b] [-r] [-v]

The script converts a collection of SNPs in VCF format into a PHYLIP, FASTA,
NEXUS, or binary NEXUS file for phylogenetic analysis. The code is optimized
to process VCF files with sizes >1GB. For small VCF files the algorithm slows
down as the number of taxa increases (but is still fast).

Any ploidy is allowed, but binary NEXUS is produced only for diploid VCFs.

optional arguments:
  -h, --help            show this help message and exit
  -i FILENAME, --input FILENAME
                        Name of the input VCF file, can be gzipped
  --output-folder FOLDER
                        Output folder name, it will be created if it does not
                        exist (same folder as input by default)
  --output-prefix PREFIX
                        Prefix for output filenames (same as the input VCF
                        filename without the extension by default)
  -m MIN_SAMPLES_LOCUS, --min-samples-locus MIN_SAMPLES_LOCUS
                        Minimum of samples required to be present at a locus
                        (default=4)
  -o OUTGROUP, --outgroup OUTGROUP
                        Name of the outgroup in the matrix. Sequence will be
                        written as first taxon in the alignment.
  -p, --phylip-disable  A PHYLIP matrix is written by default unless you
                        enable this flag
  -f, --fasta           Write a FASTA matrix (disabled by default)
  -n, --nexus           Write a NEXUS matrix (disabled by default)
  -b, --nexus-binary    Write a binary NEXUS matrix for analysis of biallelic
                        SNPs in SNAPP, only diploid genotypes will be
                        processed (disabled by default)
  -r, --resolve-IUPAC   Randomly resolve heterozygous genotypes to avoid IUPAC
                        ambiguities in the matrices (disabled by default)
  -v, --version         show program's version number and exit
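For the binary NEXUS output (`-b`), each diploid biallelic genotype can be coded as the number of ALT alleles it carries, the 0/1/2 coding commonly used as SNAPP input. The sketch below assumes that coding and uses "?" for missing data; the helper name and the decision to return None for non-diploid or non-biallelic genotypes (so the caller can skip them) are illustrative assumptions.

```python
def diploid_to_binary(gt):
    """Code a biallelic diploid genotype as its count of ALT alleles.

    '0/0' -> '0', '0/1' or '1|0' -> '1', '1/1' -> '2', missing -> '?'.
    Returns None for genotypes that are not diploid or not biallelic.
    """
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2:
        return None
    if "." in alleles:
        return "?"
    if any(a not in ("0", "1") for a in alleles):
        return None
    return str(alleles.count("1"))
```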

Examples

In the following examples you can omit python if you make vcf2phylip.py executable.

Example 1: Use default parameters to create a PHYLIP matrix with a minimum of 4 samples per SNP:

python vcf2phylip.py --input myfile.vcf
# Which is equivalent to:
python vcf2phylip.py -i myfile.vcf
# This command will create a PHYLIP matrix called myfile_min4.phy
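The relaxed PHYLIP format written by default starts with a header line giving the number of taxa and sites, followed by one line per sample with its name and sequence; unlike strict PHYLIP, names are not limited to 10 characters. A minimal writer, assuming the matrix is already held in memory as a dict of equal-length sequences (the function name is hypothetical):

```python
def to_relaxed_phylip(matrix):
    """Return a relaxed PHYLIP string from a dict of name -> aligned sequence."""
    lengths = {len(seq) for seq in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    nchar = lengths.pop()
    width = max(len(name) for name in matrix) + 3  # pad names so sequences line up
    lines = [f"{len(matrix)} {nchar}"]
    for name, seq in matrix.items():
        lines.append(name.ljust(width) + seq)
    return "\n".join(lines) + "\n"
```

Because Python dicts keep insertion order, inserting the outgroup first places it first in the file.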

Example 2: Create a PHYLIP and a FASTA matrix using a minimum of 60 samples per SNP:

python vcf2phylip.py --input myfile.vcf --fasta --min-samples-locus 60
# Which is equivalent to:
python vcf2phylip.py -i myfile.vcf -f -m 60
# This command will create a PHYLIP matrix called myfile_min60.phy and a FASTA matrix called myfile_min60.fasta

Example 3: Create all output formats, and select "sample1" as outgroup:

python vcf2phylip.py --input myfile.vcf --outgroup sample1 --fasta --nexus --nexus-binary
# Which is equivalent to:
python vcf2phylip.py -i myfile.vcf -o sample1 -f -n -b
# This command will create a PHYLIP matrix called myfile_min4.phy, a FASTA matrix called myfile_min4.fasta, a NEXUS matrix called myfile_min4.nexus, and a binary NEXUS matrix called myfile_min4.bin.nexus
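A NEXUS matrix wraps the same data in a DATA block that declares the dimensions and the symbols used for missing data and gaps. The sketch below shows one possible layout, assuming sample names without spaces or punctuation (NEXUS would otherwise require quoting them) and N as the missing-data symbol; the function name is hypothetical and the exact block written by vcf2phylip.py may differ.

```python
def to_nexus(matrix, missing="N", gap="-"):
    """Return a DNA NEXUS DATA block from a dict of name -> aligned sequence."""
    lengths = {len(seq) for seq in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    nchar = lengths.pop()
    width = max(len(name) for name in matrix) + 3  # align the sequence column
    out = [
        "#NEXUS",
        "",
        "BEGIN DATA;",
        f"\tDIMENSIONS NTAX={len(matrix)} NCHAR={nchar};",
        f"\tFORMAT DATATYPE=DNA MISSING={missing} GAP={gap};",
        "\tMATRIX",
    ]
    for name, seq in matrix.items():
        out.append("\t" + name.ljust(width) + seq)
    out += ["\t;", "END;"]
    return "\n".join(out) + "\n"
```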

Example 4: Disable the creation of the PHYLIP matrix and create only a NEXUS matrix:

python vcf2phylip.py --input myfile.vcf --phylip-disable --nexus
# Which is equivalent to:
python vcf2phylip.py -i myfile.vcf -p -n
# This command will create only a NEXUS matrix called myfile_min4.nexus

Example 5: Randomly resolve heterozygous genotypes instead of representing them with IUPAC ambiguity codes:

python vcf2phylip.py --input myfile.vcf --resolve-IUPAC
# Which is equivalent to:
python vcf2phylip.py -i myfile.vcf -r
# This command will create only a PHYLIP matrix called myfile_min4.phy where IUPAC ambiguities have been randomly resolved
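Random resolution can be sketched as drawing one allele from the GT string. This hypothetical helper draws uniformly among the listed allele copies, so a triploid 0/0/1 call yields REF about two thirds of the time; whether vcf2phylip.py weights alleles this way is an assumption. Passing a seeded `random.Random` makes the output reproducible.

```python
import random
import re


def resolve_genotype(gt, ref, alts, rng=random):
    """Pick one allele at random from a GT string, e.g. '0/1' -> 'A' or 'G'.

    Missing genotypes (any '.') become 'N'. Hypothetical helper for illustration.
    """
    alleles = [ref] + alts
    indices = re.split(r"[/|]", gt)
    if "." in indices:
        return "N"
    # Each listed allele copy is equally likely, so dosage affects the draw.
    return alleles[int(rng.choice(indices))]
```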

Example 6: Specify output folder and output prefix:

python vcf2phylip.py -i myfile.vcf.gz --output-folder /data/results --output-prefix mymatrix
# This command will create the file `mymatrix.min4.phy` in the folder `/data/results`

Credits

Citation

Ortiz, E.M. 2019. vcf2phylip v2.0: convert a VCF matrix into several matrix formats for phylogenetic analysis. DOI:10.5281/zenodo.2540861
