sigven / vcf2tsv

Licence: MIT license
Genomic VCF to tab-separated values

Python script for conversion of VCF data to tab-separated values (TSV)

A small script that converts genomic variant data encoded in VCF format into a tab-separated values (TSV) file. The script uses brentp/cyvcf2 to parse the VCF file. By default, it prints the fixed VCF columns, all INFO tag values (as defined in the VCF header; INFO tags not present in a given record are reported as '.'), and all genotype data (FORMAT columns) for heterozygotes and homozygotes. If genotype data is present, it prints one line per sample, and a column named VCF_SAMPLE_ID identifies the sample that each line belongs to. The script has optional arguments to:

  • skip sample genotype data (i.e. FORMAT columns)
  • keep rejected genotypes (i.e. FILTER != 'PASS' / GT == './.')
  • skip INFO data
  • compress the output TSV
  • print the data types of the VCF columns as a header line
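The default output layout described above can be illustrated with a minimal sketch. This is not the script's implementation (vcf2tsv parses records with cyvcf2); it is a standard-library approximation for a tiny, uncompressed VCF held in a string. The function name vcf_text_to_rows and the example data are hypothetical, and only the GT == './.' part of the "rejected" rule is modelled here.

```python
# Illustrative sketch only: vcf2tsv itself relies on cyvcf2 for parsing.
FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER"]


def vcf_text_to_rows(vcf_text, keep_rejected=False):
    """Return (header, rows) for a small uncompressed VCF given as a string."""
    info_tags, format_tags, samples, rows = [], [], [], []
    for line in vcf_text.splitlines():
        if line.startswith("##INFO=<ID="):
            # INFO tag names are taken from the header, in header order.
            info_tags.append(line[len("##INFO=<ID="):].split(",", 1)[0])
        elif line.startswith("##FORMAT=<ID="):
            format_tags.append(line[len("##FORMAT=<ID="):].split(",", 1)[0])
        elif line.startswith("#CHROM"):
            samples = line.split("\t")[9:]
        elif line and not line.startswith("#"):
            fields = line.split("\t")
            fixed = fields[:7]
            info = {}
            for item in fields[7].split(";"):
                if item == ".":
                    continue
                key, _, value = item.partition("=")
                info[key] = value if value else "True"  # flag tags carry no value
            # Tags declared in the header but absent from this record become '.'.
            info_values = [info.get(tag, ".") for tag in info_tags]
            if not samples:
                rows.append(fixed + info_values)
                continue
            format_keys = fields[8].split(":")
            for name, sample_field in zip(samples, fields[9:]):
                call = dict(zip(format_keys, sample_field.split(":")))
                genotype = call.get("GT", ".").replace("|", "/")
                if genotype in {".", "./."} and not keep_rejected:
                    continue  # assumption: missing calls are skipped by default
                # One output row per sample, labelled by VCF_SAMPLE_ID.
                rows.append(fixed + info_values + [name]
                            + [call.get(tag, ".") for tag in format_tags])
    header = FIXED_COLUMNS + info_tags
    if samples:
        header += ["VCF_SAMPLE_ID"] + format_tags
    return header, rows


example_vcf = "\n".join([
    "##fileformat=VCFv4.2",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth">',
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\t.\tA\tG\t50\tPASS\tDP=10\tGT\t0/1\t./.",
])

header, rows = vcf_text_to_rows(example_vcf)
print("\t".join(header))
for row in rows:
    print("\t".join(row))
```

In this example the AF tag is declared in the header but missing from the record, so it is written as '.', and sample S2 is dropped because its genotype is './.'.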

IMPORTANT: If you run vcf2tsv on a large multi-sample VCF file, the output TSV quickly grows large, since by default there is one line per sample genotype. Turn on --skip_genotype_data if you are primarily interested in the variant INFO elements; the output file will then be considerably smaller.
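To get a feel for the growth, here is a rough back-of-the-envelope sketch. The helper max_output_rows is hypothetical and not part of vcf2tsv. It assumes one line per variant and sample when genotype data is printed, and one line per variant when it is skipped. Rejected genotypes are dropped by default, so the real count can be lower.

```python
def max_output_rows(n_variants, n_samples, skip_genotype_data=False):
    """Upper bound on the number of TSV data lines (header excluded)."""
    if skip_genotype_data or n_samples == 0:
        # Assumption: without genotype data there is one line per variant.
        return n_variants
    # By default: one line per sample genotype.
    return n_variants * n_samples


# One million variants across 500 samples.
print(max_output_rows(1_000_000, 500))
print(max_output_rows(1_000_000, 500, skip_genotype_data=True))
```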

Installation

Running vcf2tsv requires Python 3, along with the cyvcf2 and numpy packages. Both can be installed with pip.
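A quick way to confirm that both dependencies are importable before running the script is sketched below. The helper missing_modules is hypothetical and uses only the standard library. It checks that a module can be found, not that its version is suitable.

```python
import importlib.util


def missing_modules(names=("cyvcf2", "numpy")):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules: " + ", ".join(missing))
    print("Try: pip install " + " ".join(missing))
else:
    print("cyvcf2 and numpy are available")
```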

Usage

usage: vcf2tsv.py [-h] [--skip_info_data] [--skip_genotype_data]
                  [--keep_rejected_calls] [--print_data_type_header]
                  [--compress]
                  query_vcf out_tsv

Convert a VCF file with genomic variants to a file with tab-separated values
(TSV). One entry (TSV line) per sample genotype

positional arguments:
  query_vcf             Bgzipped input VCF file with query variants
                        (SNVs/InDels)
  out_tsv               Output TSV file with one line pr non-rejected sample
                        genotype (Variant, genotype and annotation data as
                        tab-separated values)

optional arguments:
  -h, --help            show this help message and exit
  --skip_info_data      Skip printing of data in INFO column (default: False)
  --skip_genotype_data  Skip printing of genotype_data (FORMAT columns)
                        (default: False)
  --keep_rejected_calls
                        Print data for rejected calls (default: False)
  --print_data_type_header
                        Print a header line with data types of VCF annotations
                        (default: False)
  --compress            Compress TSV file with gzip (default: False)
  --version             show program's version number and exit
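
When calling the script from another Python program, the command line can be assembled from the options listed above. The sketch below is one possible wrapper, not part of vcf2tsv: the function build_vcf2tsv_command, the file names, and the assumption that vcf2tsv.py sits in the working directory are all hypothetical. Only the flags shown in the help text are used.

```python
def build_vcf2tsv_command(query_vcf, out_tsv, *, skip_info_data=False,
                          skip_genotype_data=False, keep_rejected_calls=False,
                          print_data_type_header=False, compress=False,
                          script="vcf2tsv.py", python="python3"):
    """Build the argument list for a vcf2tsv.py invocation."""
    flags = {
        "--skip_info_data": skip_info_data,
        "--skip_genotype_data": skip_genotype_data,
        "--keep_rejected_calls": keep_rejected_calls,
        "--print_data_type_header": print_data_type_header,
        "--compress": compress,
    }
    command = [python, script]
    # Add only the flags that were switched on, in the order of the help text.
    command += [flag for flag, enabled in flags.items() if enabled]
    # Positional arguments come last: bgzipped input VCF, then output TSV.
    command += [query_vcf, out_tsv]
    return command


command = build_vcf2tsv_command("calls.vcf.gz", "calls.tsv",
                                skip_genotype_data=True, compress=True)
print(" ".join(command))
# To actually run it (requires vcf2tsv.py and its dependencies):
# import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.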