
GenomicsDB / GenomicsDB

Highly performant data storage in C++ for importing, querying and transforming variant data with C/C++/Java/Spark bindings. Used in gatk4.

Programming Languages

C++, Java, Python, CMake, Shell, Scala


GenomicsDB, originally from Intel Health and Lifesciences, is a highly performant, scalable data store written in C++ for importing, querying and transforming genomic variant data. It is built on top of a fork of htslib and a tile-based array storage system. Variant data is sparse by nature (variants occupy only a small fraction of positions relative to the whole genome), so sparse array data stores are a natural fit for it.
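
To illustrate why sparse storage suits variant data, here is a minimal sketch. It is not GenomicsDB's storage layer or API: `SparseVariantStore`, `VariantCell`, `put` and `query` are hypothetical names, and the layout (rows as samples, columns as genomic positions) is an assumption made for illustration. Only cells that actually hold a variant are stored, so memory grows with the number of variants rather than with genome length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration only; not the GenomicsDB API.
// A cell is addressed by (sample row, genomic position column).
struct VariantCell {
  std::string ref;
  std::string alt;
};

class SparseVariantStore {
 public:
  // Store a variant; empty cells are never materialized.
  void put(std::uint64_t sample, std::uint64_t position, VariantCell cell) {
    cells_[std::make_pair(sample, position)] = std::move(cell);
  }

  // Return the positions holding a variant for one sample in [begin, end].
  std::vector<std::uint64_t> query(std::uint64_t sample, std::uint64_t begin,
                                   std::uint64_t end) const {
    assert(begin <= end);
    std::vector<std::uint64_t> hits;
    // Keys are ordered by (sample, position), so a range scan suffices.
    auto it = cells_.lower_bound(std::make_pair(sample, begin));
    while (it != cells_.end() && it->first.first == sample &&
           it->first.second <= end) {
      hits.push_back(it->first.second);
      ++it;
    }
    return hits;
  }

  // Number of stored (non-empty) cells.
  std::size_t size() const { return cells_.size(); }

 private:
  std::map<std::pair<std::uint64_t, std::uint64_t>, VariantCell> cells_;
};
```

In this sketch, storing three variants across two samples occupies three cells regardless of how wide the position range is; a dense array over the same range would allocate a cell per position.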

  • Supported platforms: Linux and macOS.
  • Supported filesystems: POSIX, HDFS, EMRFS (S3), GCS and Azure Blob.

Included are

  • JVM/Spark wrappers that allow for streaming VariantContext buffers to/from the C++ layer among other functions. GenomicsDB jars with native libraries and only zlib dependencies are regularly published on Maven Central.
  • Native tools for fast, incremental ingestion of variants in VCF/BCF/CSV form into GenomicsDB.
  • MPI and Spark support for parallel querying of GenomicsDB.

GenomicsDB is packaged into gatk4, and its quality benefits from that large user base.

The GenomicsDB documentation for users is hosted as a GitHub wiki: https://github.com/GenomicsDB/GenomicsDB/wiki

External Contributions

GenomicsDB is open source and all participation is welcome. GenomicsDB is released under the MIT License and all external contributors are expected to grant an MIT License for their contributions.

Checklist before creating Pull Request

Please ensure that Java/Scala code is well documented in Javadoc style. For C/C++ code, roughly adhere to the Google C++ Style Guide for consistency and readability.

  • Use spaces instead of tabs.
  • Use 2 spaces for indenting.
  • Add braces even for one-line blocks, e.g.
        if (x>0)
          do_foo();
    should ideally be
        if (x>0) {
          do_foo();
        }
  • Pad the header of control statements with a space after the keyword, e.g.
        if(x>0) should be if (x>0)
        while(x>0) should be while (x>0)
  • Use one half indent (1 space) for class access modifiers such as public: and private:.
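
The checklist above can be seen together in one short sketch. The `PositiveCounter` class and its members are hypothetical and exist only to show the formatting rules; they are not part of GenomicsDB.

```cpp
#include <cassert>

// Hypothetical class used only to demonstrate the formatting rules.
class PositiveCounter {
 public:  // access modifiers get a half indent (1 space)
  // Count x only when it is positive; braces even for a one-line block.
  void add(int x) {
    if (x > 0) {  // space between the keyword and the parenthesis
      ++count_;
    }
  }

  // Feed every value in [first, last) through add().
  void add_range(const int* first, const int* last) {
    assert(first <= last);
    while (first != last) {  // padded while header, 2-space body indent
      add(*first);
      ++first;
    }
  }

  int count() const { return count_; }

 private:
  int count_ = 0;
};
```

All indentation here uses spaces, two per level, with one space before `public:` and `private:`.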