
aldro61 / kover

License: GPL-3.0
Learn interpretable computational phenotyping models from k-merized genomic data

Programming Languages

Python
C++
CMake

Projects that are alternatives to or similar to kover

phenol
phenol: Phenotype ontology library
Stars: ✭ 15 (-68.09%)
Mutual labels:  genomics, phenotypes
kmer-db
Kmer-db is a fast and memory-efficient tool for large-scale k-mer analyses (indexing, querying, estimating evolutionary relationships, etc.).
Stars: ✭ 68 (+44.68%)
Mutual labels:  genomics, k-mer
BALSAMIC
Bioinformatic Analysis pipeLine for SomAtic Mutations In Cancer
Stars: ✭ 29 (-38.3%)
Mutual labels:  genomics
gosling.js
Grammar of Scalable Linked Interactive Nucleotide Graphics
Stars: ✭ 89 (+89.36%)
Mutual labels:  genomics
macrel
Predict AMPs in (meta)genomes and peptides
Stars: ✭ 34 (-27.66%)
Mutual labels:  genomics
dee2
Digital Expression Explorer 2 (DEE2): a repository of uniformly processed RNA-seq data
Stars: ✭ 32 (-31.91%)
Mutual labels:  genomics
adapt
A package for designing activity-informed nucleic acid diagnostics for viruses.
Stars: ✭ 16 (-65.96%)
Mutual labels:  genomics
graphsim
R package: Simulate Expression data from igraph network using mvtnorm (CRAN; JOSS)
Stars: ✭ 16 (-65.96%)
Mutual labels:  genomics
Assemblytics
Assemblytics is a bioinformatics tool to detect and analyze structural variants from a genome assembly by comparing it to a reference genome.
Stars: ✭ 105 (+123.4%)
Mutual labels:  genomics
HumanIdiogramLibrary
Resource of human chromosome schematics & images
Stars: ✭ 76 (+61.7%)
Mutual labels:  genomics
HPO-translations
Internationalisation of the HPO content
Stars: ✭ 19 (-59.57%)
Mutual labels:  phenotypes
indelope
find large indels (in the blind spot between GATK/freebayes and SV callers)
Stars: ✭ 38 (-19.15%)
Mutual labels:  genomics
tidygenomics
Tidy Verbs for Dealing with Genomic Data Frames https://const-ae.github.io/tidygenomics/
Stars: ✭ 97 (+106.38%)
Mutual labels:  genomics
bxtools
Tools for analyzing 10X Genomics data
Stars: ✭ 39 (-17.02%)
Mutual labels:  genomics
TADLib
A Library to Explore Chromatin Interaction Patterns for Topologically Associating Domains
Stars: ✭ 23 (-51.06%)
Mutual labels:  genomics
iMOKA
interactive Multi Objective K-mer Analysis
Stars: ✭ 19 (-59.57%)
Mutual labels:  k-mer
fq
Command line utility for manipulating Illumina-generated FastQ files.
Stars: ✭ 31 (-34.04%)
Mutual labels:  genomics
gnomix
A fast, scalable, and accurate local ancestry method.
Stars: ✭ 36 (-23.4%)
Mutual labels:  genomics
MindTheGap
MindTheGap is a SV caller for short read sequencing data dedicated to insertion variants (all sizes and types). It can also be used as a local assembly tool.
Stars: ✭ 30 (-36.17%)
Mutual labels:  genomics
mgatk
mgatk: mitochondrial genome analysis toolkit
Stars: ✭ 65 (+38.3%)
Mutual labels:  genomics

Kover 2.0


Kover is an out-of-core implementation of rule-based machine learning algorithms that has been tailored for genomic biomarker discovery. It produces highly interpretable models, based on k-mers, that explicitly highlight genotype-to-phenotype associations.
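To make "based on k-mers" concrete, here is a minimal sketch of how genomes can be turned into the binary k-mer presence/absence features that rule-based models test. It is an illustration, not Kover's code or API: the function names `kmers` and `presence_absence_matrix` are hypothetical, sequences are plain in-memory strings, and reverse complements are ignored. Kover itself is out-of-core, so a real matrix would be stored on disk rather than in a Python dict.

```python
def kmers(sequence: str, k: int) -> set:
    """Return the set of distinct k-mers (length-k substrings) of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def presence_absence_matrix(genomes: dict, k: int):
    """Build a binary genome-by-k-mer table (hypothetical helper).

    Columns are every k-mer seen in at least one genome, sorted for a stable
    order; an entry is 1 if the k-mer occurs in that genome, else 0.
    """
    per_genome = {name: kmers(seq, k) for name, seq in genomes.items()}
    vocabulary = sorted(set().union(*per_genome.values()))
    matrix = {
        name: [1 if kmer in present else 0 for kmer in vocabulary]
        for name, present in per_genome.items()
    }
    return vocabulary, matrix


if __name__ == "__main__":
    # Two toy "genomes"; real inputs would be assemblies or sequencing reads.
    toy_genomes = {"g1": "ACGTAC", "g2": "ACGGTA"}
    vocab, X = presence_absence_matrix(toy_genomes, k=3)
    print(vocab)    # ['ACG', 'CGG', 'CGT', 'GGT', 'GTA', 'TAC']
    print(X["g1"])  # [1, 0, 1, 0, 1, 1]
    print(X["g2"])  # [1, 1, 0, 1, 1, 0]
```

Each column is a candidate feature, so a rule such as "k-mer ACG is present" is simply a test on one column. With realistic genomes and k-mer lengths the number of columns becomes very large, which is why a disk-based representation matters.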

Introduction

Understanding the relationship between the genome of a cell and its phenotype is a central problem in precision medicine. However, genotype-to-phenotype prediction poses major challenges for machine learning algorithms, which limits their use in this setting. The high dimensionality of the data tends to hinder generalization and challenges the scalability of most learning algorithms. Additionally, most algorithms produce models that are complex and difficult to interpret. We alleviate these limitations by proposing strong performance guarantees, based on sample compression theory, for rule-based learning algorithms that produce highly interpretable models. We show that these guarantees can be leveraged to accelerate learning and improve model interpretability. Our approach is validated through an application to the genomic prediction of antimicrobial resistance, an important public health concern. Highly accurate models were obtained for 12 species and 56 antibiotics, and their interpretation revealed known resistance mechanisms, as well as some potential new ones. An open-source, disk-based implementation that is both memory- and computationally efficient is included with this work. The implementation is turnkey, requires no prior knowledge of machine learning, and is complemented by comprehensive tutorials.
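The rule-based models mentioned above are small Boolean combinations of k-mer presence/absence rules. The sketch below illustrates the greedy idea behind a Set Covering Machine (SCM) conjunction on a tiny in-memory matrix. It is not Kover's implementation: the names `fit_scm_conjunction`, `rule_holds` and `predict`, and the parameters `p` and `max_rules`, are hypothetical. The utility used here (negatives newly excluded by a rule minus `p` times positives it would misclassify) follows the classic SCM greedy heuristic. The sketch covers conjunctions only, stops early when no rule has positive utility (a simplification), and omits the sample-compression guarantees and the disk-based design described in the paragraph.

```python
from typing import List, Sequence, Tuple

# A rule tests one k-mer column: ("presence", j) or ("absence", j).
Rule = Tuple[str, int]


def rule_holds(rule: Rule, row: Sequence[int]) -> bool:
    kind, j = rule
    return row[j] == 1 if kind == "presence" else row[j] == 0


def fit_scm_conjunction(X: Sequence[Sequence[int]], y: Sequence[int],
                        p: float = 1.0, max_rules: int = 5) -> List[Rule]:
    """Greedily build a conjunction: predict 1 only if every rule holds."""
    candidates = [(kind, j) for j in range(len(X[0]))
                  for kind in ("presence", "absence")]
    # Negatives the conjunction still (wrongly) predicts as 1.
    negatives = {i for i, label in enumerate(y) if label == 0}
    # Positives the conjunction still (correctly) predicts as 1.
    positives = {i for i, label in enumerate(y) if label == 1}
    model: List[Rule] = []
    while negatives and len(model) < max_rules:
        best, best_utility = None, 0.0
        for rule in candidates:
            # A rule that is false on an example makes the conjunction output 0.
            covered = sum(not rule_holds(rule, X[i]) for i in negatives)
            errors = sum(not rule_holds(rule, X[i]) for i in positives)
            utility = covered - p * errors
            if utility > best_utility:
                best, best_utility = rule, utility
        if best is None:  # no remaining rule has positive utility
            break
        model.append(best)
        negatives = {i for i in negatives if rule_holds(best, X[i])}
        positives = {i for i in positives if rule_holds(best, X[i])}
    return model


def predict(model: List[Rule], row: Sequence[int]) -> int:
    return int(all(rule_holds(rule, row) for rule in model))


if __name__ == "__main__":
    # Rows are genomes, columns are k-mers (1 = present); y: 1 = phenotype present.
    X = [[1, 1], [1, 0], [0, 1]]
    y = [1, 0, 0]
    model = fit_scm_conjunction(X, y)
    print(model)                           # [('presence', 0), ('presence', 1)]
    print([predict(model, r) for r in X])  # [1, 0, 0]
```

Each selected rule reads directly as a statement about a single k-mer, which is what makes such models easy to interpret. A larger `p` penalizes rules that misclassify positive examples more heavily, trading coverage of negatives for fewer errors on positives.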

Drouin, A., Letarte, G., Raymond, F., Marchand, M., Corbeil, J., & Laviolette, F. (2019). Interpretable genotype-to-phenotype classifiers with performance guarantees. Scientific Reports, 9(1), 4071.

Drouin, A., Giguère, S., Déraspe, M., Marchand, M., Tyers, M., Loo, V. G., Bourgault, A. M., Laviolette, F., & Corbeil, J. (2016). Predictive computational phenotyping and biomarker discovery using reference-free genome comparisons. BMC Genomics, 17(1), 754.

Video lecture

The Set Covering Machine implementation in Kover was featured in the following video lecture:

Interpretable Models of Antibiotic Resistance with the Set Covering Machine Algorithm, Google, Cambridge, Massachusetts (February 2017)


Installation

You can use either of the following options:

Tutorials

For tutorials on how to use Kover with your data, see: http://aldro61.github.io/kover/doc_tutorials.html

Documentation

The documentation can be found at: http://aldro61.github.io/kover/

Contact

If you need help using Kover, please use Biostars. To report a bug, please create an issue on GitHub.
