
charite / jannovar

Licence: other
Annotation of VCF variants with functional impact and from databases (executable+library)

Programming Languages

java
68154 projects - #9 most used programming language
HTML
75241 projects
ANTLR
299 projects
python
139335 projects - #7 most used programming language
Makefile
30231 projects
Batchfile
5799 projects

Projects that are alternatives of or similar to jannovar

vcf2tsv
Genomic VCF to tab-separated values
Stars: ✭ 27 (-35.71%)
Mutual labels:  vcf, variant-filtration
cpsr
Cancer Predisposition Sequencing Reporter (CPSR)
Stars: ✭ 44 (+4.76%)
Mutual labels:  vcf
csv2vcf
🔧 Simple script in python to convert CSV files to VCF
Stars: ✭ 66 (+57.14%)
Mutual labels:  vcf
2vcf
convert 23andme or Ancestry.com raw genotype calls into VCF format, with dbSNP annotations
Stars: ✭ 42 (+0%)
Mutual labels:  vcf
vembrane
vembrane filters VCF records using python expressions
Stars: ✭ 46 (+9.52%)
Mutual labels:  vcf
spark-vcf
Spark VCF data source implementation for Dataframes
Stars: ✭ 15 (-64.29%)
Mutual labels:  vcf
cljam
A DNA Sequence Alignment/Map (SAM) library for Clojure
Stars: ✭ 85 (+102.38%)
Mutual labels:  vcf
learning vcf file
Learning the Variant Call Format
Stars: ✭ 104 (+147.62%)
Mutual labels:  vcf
calcardbackup
calcardbackup: moved to https://codeberg.org/BernieO/calcardbackup
Stars: ✭ 67 (+59.52%)
Mutual labels:  vcf
indelope
find large indels (in the blind spot between GATK/freebayes and SV callers)
Stars: ✭ 38 (-9.52%)
Mutual labels:  vcf
laravel-vcard
A fluent builder class for vCard files.
Stars: ✭ 29 (-30.95%)
Mutual labels:  vcf
vcf stuff
📊Evaluating, filtering, comparing, and visualising VCF
Stars: ✭ 19 (-54.76%)
Mutual labels:  vcf
vcfstats
Powerful statistics for VCF files
Stars: ✭ 32 (-23.81%)
Mutual labels:  vcf
Variants2Neoantigen
A neoantigen calling pipeline begins from variants record file (MAF) (Not maintain now)
Stars: ✭ 27 (-35.71%)
Mutual labels:  vcf
ilus
A handy variant calling pipeline generator for whole genome re-sequencing (WGS) and whole exome sequencing (WES) data analysis. A simple and comprehensive WGS/WES analysis pipeline generator.
Stars: ✭ 64 (+52.38%)
Mutual labels:  vcf
bioSyntax-archive
Syntax highlighting for computational biology
Stars: ✭ 16 (-61.9%)
Mutual labels:  vcf
SNPGenie
Program for estimating πN/πS, dN/dS, and other diversity measures from next-generation sequencing data
Stars: ✭ 81 (+92.86%)
Mutual labels:  vcf
cutevariant
A standalone and free application to explore genetics variations from VCF file
Stars: ✭ 61 (+45.24%)
Mutual labels:  vcf
fuc
Frequently used commands in bioinformatics
Stars: ✭ 23 (-45.24%)
Mutual labels:  vcf
snps
tools for reading, writing, merging, and remapping SNPs
Stars: ✭ 57 (+35.71%)
Mutual labels:  vcf

Jannovar

Functional variant file annotation in Java. Jannovar provides a program for the annotation of VCF files and also exposes its functionality through a library API.

Also see the Quickstart section in the Jannovar manual.

In Brief

  • Language/Platform: Java >=8
  • License: BSD 3-Clause
  • Version: see the GitHub sidebar for the current release
  • Availability:
    • Java command line tool jannovar-cli
    • Java libraries exposing most of jannovar-cli's functionality.

Databases

As of Jannovar version v0.36, we provide pre-built databases via Zenodo.

You can obtain the pre-built databases from Zenodo as listed in the following table. If the database that you need is missing, please start a GitHub discussion.

| Organism | Database | DB release | Reference | File | MD5 Sum |
|---|---|---|---|---|---|
| H. sapiens | ENSEMBL | 87 | hg19 | ensembl_87_hg19.ser | ecaffeaa26531a002e75953c6b309c53 |
| H. sapiens | ENSEMBL | 91 | hg38 | ensembl_91_hg38.ser | 6218669555a52057ee88132edfed0ae2 |
| H. sapiens | RefSeq | 105 | hg19 | refseq_105_hg19.ser | b2087f8f3d41d20ad52fb9660853642e |
| H. sapiens | RefSeq* | 105 | hg19 | refseq_curated_105_hg19.ser | a92fea7b8e37d46c75936783ae326d71 |
| R. norvegicus | RefSeq* | 105 | rn6 | refseq_curated_105_rn6.ser | b028ae0e6768c0505b7a4d2fe89cd462 |
| H. sapiens | RefSeq | 109 | hg38 | refseq_109_hg38.ser | 6b1205bb534adb5ff9e0e569e6fabc5d |
| H. sapiens | RefSeq* | 109 | hg38 | refseq_curated_109_hg38.ser | c2747c4c1b42a75930603d6deda105cf |
| M. musculus | RefSeq | 106 | mm9 | refseq_106_mm9.ser | 1f7e2bf9860d06fab85225987fef3550 |
| M. musculus | RefSeq* | 106 | mm9 | refseq_curated_106_mm9.ser | 059bd7103dbf4014bebd2f900af7b36b |
| M. musculus | RefSeq | 108 | mm10 | refseq_108_mm10.ser | a28e90913f74a9aba0c45650367f941c |
| M. musculus | RefSeq* | 108 | mm10 | refseq_curated_108_mm10.ser | 1980725f909284c6ab8f8212dbe02dd9 |
| M. musculus | RefSeq | 139 | mm39 | refseq_139_mm39.ser | 6b1205bb534adb5ff9e0e569e6fabc5d |
| M. musculus | RefSeq* | 139 | mm39 | refseq_curated_139_mm39.ser | c2747c4c1b42a75930603d6deda105cf |
| R. norvegicus | RefSeq | 105 | rn6 | refseq_105_rn6.ser | 4a9c3416ee9159c0c71f613a3d168869 |

Note: RefSeq* = RefSeq restricted to curated (NM_) transcripts only, excluding XM_ transcripts, which are based on gene predictions.

Note that files are compatible with both the NCBI and the UCSC genomes. E.g., the files for hg19 are compatible with the UCSC hg19 FASTA file and the GRCh37 files (e.g., hs37/hs37d5).
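After downloading a .ser file, it can be worth comparing its MD5 digest against the value listed for it. The following is a minimal sketch using only the Java standard library; the class name SerChecksum and its methods are hypothetical helpers and are not part of Jannovar.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of Jannovar): checks a downloaded file against an MD5 sum.
final class SerChecksum {

    /** Returns the lowercase hex MD5 digest of everything readable from the stream. */
    static String md5Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[8192];
            int n;
            // Stream in chunks so that large database files need not fit in memory.
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available in this JVM", e);
        }
    }

    /** Returns true if the stream's MD5 digest equals the expected hex string (case-insensitive). */
    static boolean matches(InputStream in, String expectedMd5) {
        return md5Hex(in).equalsIgnoreCase(expectedMd5.trim());
    }

    // Usage: java SerChecksum <file.ser> <expected-md5>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java SerChecksum <file.ser> <expected-md5>");
            System.exit(2);
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            boolean ok = matches(in, args[1]);
            System.out.println(ok ? "OK" : "MISMATCH");
            System.exit(ok ? 0 : 1);
        }
    }
}
```

For example, `java SerChecksum refseq_105_hg19.ser b2087f8f3d41d20ad52fb9660853642e` would compare the downloaded file against the sum listed for it.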

Database Compatibility

A Jannovar database .ser file can be read by every Jannovar version within a given version range. The following table lists the compatible ranges.

| First Version | Last Version | Notes |
|---|---|---|
| 0.33 | 0.38 | first version with compatibility description |
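The range check implied by the table can be sketched as follows. This is an illustration only: the class DbCompatibility is hypothetical, it assumes versions of the form major.minor with an optional leading "v", and it is not how Jannovar itself enforces compatibility.

```java
// Hypothetical helper (not part of Jannovar): tests whether a version lies in a closed range.
final class DbCompatibility {

    /** Parses "0.36" or "v0.36" (any patch component is ignored) into {major, minor}. */
    static int[] parse(String version) {
        String v = version.startsWith("v") ? version.substring(1) : version;
        String[] parts = v.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected major.minor, got: " + version);
        }
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    /** Compares two versions numerically, so that 0.9 sorts before 0.33. */
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        if (x[0] != y[0]) {
            return Integer.compare(x[0], y[0]);
        }
        return Integer.compare(x[1], y[1]);
    }

    /** Returns true if first <= version <= last. */
    static boolean isCompatible(String version, String first, String last) {
        return compare(version, first) >= 0 && compare(version, last) <= 0;
    }
}
```

With the single range listed above, `DbCompatibility.isCompatible("0.36", "0.33", "0.38")` evaluates to true. The comparison is numeric rather than lexicographic, because comparing the strings directly would misorder minor versions of different lengths.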

Developer Guidelines

Style

  • Java code should follow the IntelliJ default formatting as applied by the Ctrl+Alt+L formatter. Eclipse users, please use the Eclipse Code Formatter. Enable the "wrap at right margin" option for JavaDoc.
  • For all other text, use .editorconfig.

Building Transcript Databases

For building Jannovar transcript database files (with the .ser extension), you will need files from various sources. These include the actual transcript databases from RefSeq, ENSEMBL, UCSC, etc. You will also need helper files for mapping between gene names and symbols from HGNC, as well as information about contig sequence identifiers from NCBI. The upstream locations turned out to be unstable, so we resolved to upload the files to Zenodo, which offers stable identifiers. At the same time, this creates challenges in versioning because, e.g., UCSC regularly publishes updates without assigning versions.

Downloading Raw Data Files

The script ./utils/download-raw.sh downloads raw data files from the original "upstream" locations. The files will go into ./data (ignored via .gitignore). The top-level directory is ./data/raw/bwa.3430-N1-DNA1-WGS1.bam.7z/, which contains the raw data files for building the database named ${database}, in variant ${_variant} (e.g., refseq_curated), for a given release and genome build. Everything below this directory follows the specific requirements of the given database. The download-raw.sh script may also download data directly from Zenodo where applicable.

The script ./utils/gen-zenodo-raw.sh prepares the previously downloaded raw data for upload to Zenodo. The files will go to ./data/zenodo-raw. Zenodo does not support folders, so we fall back to using -- as the path separator in flat file names. Note well that uploading the same files to Zenodo twice just takes up space on their storage systems, and we have no mechanism in place to remove duplicates.
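The idea of encoding a directory hierarchy in flat file names can be sketched as below. The exact naming scheme used by gen-zenodo-raw.sh is not reproduced here; the class ZenodoFlatNames, its methods, and the example path are assumptions made for illustration.

```java
// Hypothetical helper (not part of Jannovar): maps relative paths to flat names and back.
final class ZenodoFlatNames {

    static final String SEP = "--";

    /** Turns a relative path with '/' separators into a flat name joined by "--". */
    static String flatten(String relativePath) {
        String[] parts = relativePath.split("/");
        for (String part : parts) {
            // A component containing "--" would make the mapping ambiguous on the way back.
            if (part.contains(SEP)) {
                throw new IllegalArgumentException("path component contains '--': " + part);
            }
        }
        return String.join(SEP, parts);
    }

    /** Restores the relative path from a flat name produced by flatten. */
    static String unflatten(String flatName) {
        return String.join("/", flatName.split(SEP));
    }
}
```

For instance, a made-up path such as `refseq/105/hg19/genes.gz` becomes `refseq--105--hg19--genes.gz`, and unflatten reverses the mapping. The round trip is only unambiguous as long as no path component itself contains `--`, which is why flatten rejects such input.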

Building Databases

The script ./utils/build-dbs.sh generates the databases for the current Jannovar version. The files will go into ./data/jannovar-data-${jannovar_version}. These files can also go to Zenodo. For now, we curate links to the files in the README.md file for each version. Note that not every Jannovar version requires rebuilding the databases. The database version currently required is given in JannovarDataSerializer.minVersion. Uploading to Zenodo and curating the databases is currently manual work.
