BIMSBbioinfo / Janggu

Licence: gpl-3.0
Deep learning infrastructure for bioinformatics

=====================================
Janggu - Deep learning for Genomics
=====================================

.. start-badges

.. image:: https://readthedocs.org/projects/janggu/badge/?style=flat
    :target: https://janggu.readthedocs.io/en/latest
    :alt: Documentation Status

.. image:: https://travis-ci.org/BIMSBbioinfo/janggu.svg?branch=master
    :alt: Travis-CI Build Status
    :target: https://travis-ci.org/BIMSBbioinfo/janggu

.. image:: https://codecov.io/github/BIMSBbioinfo/janggu/coverage.svg?branch=master
    :alt: Coverage Status
    :target: https://codecov.io/github/BIMSBbioinfo/janggu

.. image:: https://badge.fury.io/py/janggu.svg
    :alt: PyPI Package latest release
    :target: https://pypi.org/project/janggu

.. image:: https://img.shields.io/pypi/l/janggu.svg?color=green
    :alt: License
    :target: https://pypi.org/project/janggu

.. image:: https://img.shields.io/pypi/pyversions/janggu.svg
    :alt: Supported Python Versions
    :target: https://pypi.org/project/janggu/

.. image:: https://pepy.tech/badge/janggu
    :alt: Downloads
    :target: https://pepy.tech/project/janggu

.. end-badges

.. image:: jangguhex.png
   :width: 40%
   :alt: Janggu logo
   :align: center

Janggu is a Python package that facilitates deep learning in the context of genomics. The package is freely available under a GPL-3.0 license.

.. image:: Janggu-visAbstract.png
   :width: 50%
   :alt: Janggu visual abstract
   :align: center

In particular, the package provides easy access to common genomics data formats and out-of-the-box evaluation (specifically for keras models), so that you can concentrate on designing the neural network architecture and quickly test biological hypotheses. Comprehensive documentation is available `here <https://janggu.readthedocs.io/en/latest>`_.

Hallmarks of Janggu:

  1. Janggu provides special genomics datasets that allow you to access raw data in FASTA, BAM, BIGWIG, BED and GFF file formats.
  2. Various normalization procedures are supported for the genomics datasets, including 'TPM', 'zscore' or custom normalizers.
  3. Biological features can be represented in terms of higher-order sequence features, e.g. di-nucleotide based features.
  4. The dataset objects are directly consumable by neural networks, for example implemented using `keras <https://keras.io>`_, or by `scikit-learn <https://scikit-learn.org/stable/index.html>`_ (see src/examples in this repository).
  5. Numpy format output of a keras model can be converted to genomic coverage tracks, which allows exporting the predictions as BIGWIG files and visualizing them as genome browser-like plots.
  6. Genomic datasets can be stored in various ways, including as numpy arrays, as sparse datasets or in hdf5 format.
  7. Caching of genomic datasets avoids time-consuming preprocessing steps and facilitates fast reloading.
  8. Janggu provides a wrapper for `keras <https://keras.io>`_ models with built-in logging functionality and automated result evaluation.
  9. Janggu supports input feature importance attribution using the integrated gradients method and variant effect prediction assessment.
  10. Janggu provides utilities such as a keras layer for scanning both DNA strands for motif occurrences.
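
To make the idea of higher-order sequence features (item 3) more concrete, the following is a minimal numpy sketch of one-hot encoding with overlapping k-mers. It is an illustration only and not Janggu's implementation: the function name ``one_hot_encode`` and its ``order`` parameter are hypothetical, and the handling of ambiguous bases (all-zero rows) is an assumption.

```python
import itertools

import numpy as np

ALPHABET = "ACGT"


def one_hot_encode(seq, order=1):
    """One-hot encode a DNA string using overlapping k-mers of length `order`.

    Hypothetical helper for illustration. Returns an array of shape
    (len(seq) - order + 1, 4 ** order). Windows containing characters
    outside ACGT (e.g. N) are left as all-zero rows.
    """
    # Enumerate all k-mers in lexicographic order: AA, AC, AG, AT, CA, ...
    kmers = ["".join(p) for p in itertools.product(ALPHABET, repeat=order)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    n_windows = max(len(seq) - order + 1, 0)
    encoded = np.zeros((n_windows, len(kmers)), dtype=np.int8)
    for pos in range(n_windows):
        col = index.get(seq[pos:pos + order].upper())
        if col is not None:
            encoded[pos, col] = 1
    return encoded


mono = one_hot_encode("ACGTN", order=1)  # mono-nucleotide encoding
di = one_hot_encode("ACGTN", order=2)    # di-nucleotide encoding
print(mono.shape, di.shape)  # (5, 4) (4, 16)
```

With ``order=2`` each position is described by one of 16 di-nucleotides instead of one of 4 nucleotides, so the first layer of a network can pick up dependencies between neighbouring bases directly.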

Getting started
---------------

Janggu makes it easy to access data from genomic file formats and utilize it for machine learning purposes.

.. code-block:: python

   dna = Bioseq.create_from_genome('dna', refgenome=<refgenome.fa>, roi=<roi.bed>)
   labels = Cover.create_from_bed('labels', bedfiles=<labels.bed>, roi=<roi.bed>)

   kerasmodel.fit(dna, labels)

A range of examples can be found in './src/examples' of this repository, which includes jupyter notebooks that illustrate Janggu's functionality and how it can be used with popular deep learning frameworks, including keras, sklearn or pytorch.

Why the name Janggu?
--------------------

`Janggu <https://en.wikipedia.org/wiki/Janggu>`_ is a Korean percussion instrument that looks like an hourglass.

Like the two ends of the instrument, the philosophy of the Janggu package is to help with the two ends of a deep learning application in genomics, namely data acquisition and evaluation.

Installation
------------

A list of Python dependencies is defined in ``setup.py``. Additionally, `bedtools <https://bedtools.readthedocs.io/>`_ is required by pybedtools, which janggu depends on.

Janggu depends on tensorflow and keras. To install janggu with tensorflow version 1 or 2, use

::

   # to install with tensorflow==1.14 and keras==2.2
   pip install janggu[tf] # or janggu[tf_gpu]

   # to install with tensorflow==2.2 and keras==2.4.3
   pip install janggu[tf2] # or janggu[tf2_gpu]

Depending on the pip version (e.g. 20.2.2), some package dependencies may not be resolved correctly, so that incompatible package versions are installed. If this is the case, you could try using ``pip install ... --use-feature=2020-resolver`` or install the required package versions manually.

Alternatively, you can install tensorflow and keras via the conda environment using

::

   # tensorflow v1
   conda install tensorflow==1.14 keras==2.2 # or tensorflow-gpu

   # tensorflow v2
   conda install tensorflow==2.2 keras==2.4.3 # or tensorflow-gpu

Further information regarding the installation of tensorflow can be found on the official `tensorflow webpage <https://www.tensorflow.org>`_.
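
Since mismatched tensorflow and keras versions are a likely source of trouble, it can help to check which versions actually ended up in the environment. The snippet below is a small sketch using only the Python standard library (``importlib.metadata``, Python 3.8 or later); the helper name ``installed_versions`` is hypothetical and not part of Janggu. Note that the GPU build of tensorflow may be registered under a different distribution name (e.g. ``tensorflow-gpu``), in which case it would be reported as missing here.

```python
from importlib import metadata


def installed_versions(packages=("janggu", "tensorflow", "keras")):
    """Return a dict mapping each distribution name to its installed version.

    Hypothetical helper for illustration. Packages that are not installed
    are mapped to None instead of raising an error.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

The reported versions can then be compared against the combinations listed above (tensorflow 1.14 with keras 2.2, or tensorflow 2.2 with keras 2.4.3).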

To verify that the installation works, run the example contained in the janggu package as follows

::

   git clone https://github.com/BIMSBbioinfo/janggu
   cd janggu
   python ./src/examples/classify_fasta.py single

A model is then trained to predict the class labels of two sets of toy sequences by scanning the forward strand for sequence patterns and using an ordinary mono-nucleotide one-hot sequence encoding. The entire training process takes a few minutes on a CPU backend. At the end, some example prediction scores are shown for Oct4 and Mafk sequences. The accuracy should be around 85% and individual example prediction scores should tend to be higher for Oct4 than for Mafk.

You may also rerun the training while evaluating sequence features on both strands and using a higher-order sequence encoding, e.g. with the command-line arguments ``dnaconv -order 2``. Accuracies and prediction scores for the individual example sequences should improve compared to the previous example.
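
To illustrate why evaluating both strands can help, here is a deliberately simplified, pure-Python sketch that looks for exact occurrences of a motif on the forward strand and on the reverse-complement strand. It is not how Janggu or the example script scans sequences (those rely on learned convolution filters in a keras layer); the helper names and the toy sequence are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_matches(seq, motif):
    """Count (possibly overlapping) exact occurrences of `motif` in `seq`."""
    seq, motif = seq.upper(), motif.upper()
    return sum(seq.startswith(motif, i)
               for i in range(len(seq) - len(motif) + 1))


def scan_both_strands(seq, motif):
    """Return the motif counts on the forward and the reverse strand."""
    forward = count_matches(seq, motif)
    reverse = count_matches(reverse_complement(seq), motif)
    return forward, reverse


# Toy sequence that carries the motif only on the reverse strand.
toy_seq = "GGATTTGCATCC"
print(scan_both_strands(toy_seq, "ATGCAAAT"))  # (0, 1)
```

A scan restricted to the forward strand would miss the occurrence in this toy sequence, whereas combining both strands recovers it.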

Citation
--------

| Kopp, W., Monti, R., Tamburrini, A., Ohler, U., Akalin, A. Deep learning for genomics using Janggu. Nat Commun 11, 3488 (2020). https://doi.org/10.1038/s41467-020-17155-y
