
johnlees / mandrake

Licences: Apache-2.0 (LICENSE), BSD-3-Clause (LICENSE_SCE), MIT (LICENSE_kseq)
Mandrake 🌿/👨‍🔬🦆 – Fast visualisation of the population structure of pathogens using Stochastic Cluster Embedding

Programming languages: Python, C++, CUDA, C, CMake, Makefile

Projects similar to mandrake

  • simuG: a general-purpose genome simulator (✭ 68)
  • bigly: a pileup library that embraces the huge (✭ 38)
  • gubbins: Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins (✭ 103)
  • biopython-coronavirus: Biopython Jupyter Notebook tutorial to characterize a small genome (✭ 80)
  • bac-genomics-scripts: Collection of scripts for bacterial genomics (✭ 39)
  • mity: A highly sensitive mitochondrial variant analysis pipeline for whole genome sequencing data (✭ 27)
  • genipe: Genome-wide imputation pipeline (✭ 28)
  • genoiser: use the noise (✭ 15)
  • snp-sites: Finds SNP sites from a multi-FASTA alignment file (✭ 182)
  • psmc: Implementation of the Pairwise Sequentially Markovian Coalescent (PSMC) model (✭ 121)
  • gawn: Genome Annotation Without Nightmares (✭ 35)
  • sample: Performs memory-efficient reservoir sampling on very large input files delimited by newlines (✭ 61)
  • LRSDAY: Long-read Sequencing Data Analysis for Yeasts (✭ 26)
  • PHIST: Phage-Host Interaction Search Tool (✭ 19)
  • DriverPower (✭ 22)
  • bfc: High-performance error correction for Illumina resequencing data (✭ 66)
  • event-embedding-multitask: *SEM 2018: Learning Distributed Event Representations with a Multi-Task Approach (✭ 22)
  • instaGRAAL: Large genome reassembly based on Hi-C data, continuation of GRAAL (✭ 32)
  • genepattern-server: The GenePattern Server web application (✭ 26)
  • DRAM: Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes (✭ 159)

mandrake


Fast visualisation of the population structure of pathogens using Stochastic Cluster Embedding.

Paper:

Lees JA, Tonkin-Hill G, Yang Z, Corander J. Mandrake: visualizing microbial population structure by embedding millions of genomes into a low-dimensional representation. Philosophical Transactions of the Royal Society B. 2022;377: 20210237.

https://doi.org/10.1098/rstb.2021.0237

Documentation available at: https://mandrake.readthedocs.io/en/latest/

Installation (briefly)

See https://mandrake.readthedocs.io/en/latest/installation.html for more details.

  1. Install miniconda.
  2. Run conda create -n mandrake_env mandrake to install into a clean environment.
  3. Run conda activate mandrake_env to use the environment.

Refer to the conda-forge documentation if you want to install a CUDA (GPU) enabled version.

Semi-manual

You will need some dependencies, which you can install through conda:

conda create -n mandrake_env python
conda env update -n mandrake_env --file environment.yml
conda activate mandrake_env

You can then clone this repository and, from inside it, run:

python setup.py install

GPU acceleration

You will need the CUDA toolkit installed.

If a CUDA compiler (e.g. nvcc) is available, you should see the message:

CUDA found, compiling both GPU and CPU code

Otherwise, only the CPU version will be compiled:

CUDA not found, compiling CPU code only
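To check in advance which of these two messages to expect, you can look for nvcc on your PATH. The sketch below assumes that "nvcc is on PATH" is what decides whether GPU code can be compiled; mandrake's build may detect CUDA differently, so treat this only as a quick pre-check. The function name cuda_build_message is hypothetical.

```python
import shutil


def cuda_build_message() -> str:
    """Return the build message we would expect, based only on nvcc being on PATH.

    Assumption: mandrake's build may use a different detection method.
    """
    if shutil.which("nvcc") is not None:
        return "CUDA found, compiling both GPU and CPU code"
    return "CUDA not found, compiling CPU code only"


if __name__ == "__main__":
    print(cuda_build_message())
```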

Usage

After installing, an example command would look like this:

mandrake --sketches sketchlib.h5 --kNN 500 --cpus 4 --maxIter 1000000

This would use the file sketchlib.h5, created by pp-sketchlib, to calculate accessory distances using 500 nearest neighbours, running on 4 CPUs for at most 1000000 iterations.
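The --kNN option keeps only each sample's nearest neighbours rather than every pairwise distance. The sketch below illustrates that idea on a toy gene presence/absence matrix, assuming Jaccard distance as the accessory metric. The functions jaccard_distance and knn_sparse are hypothetical, and this is not mandrake's implementation, which reads its distances from pp-sketchlib sketches (or the other inputs listed below) and may use a different metric.

```python
from typing import Dict, List, Sequence, Tuple


def jaccard_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard distance between two 0/1 presence/absence profiles."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    present = sum(1 for x, y in zip(a, b) if x or y)
    # Convention chosen here: two empty profiles are at distance 0.
    return 0.0 if present == 0 else 1.0 - shared / present


def knn_sparse(
    profiles: Dict[str, Sequence[int]], k: int
) -> Dict[str, List[Tuple[str, float]]]:
    """For each sample, keep only its k nearest other samples and their distances."""
    neighbours: Dict[str, List[Tuple[str, float]]] = {}
    for name, profile in profiles.items():
        dists = [
            (other, jaccard_distance(profile, other_profile))
            for other, other_profile in profiles.items()
            if other != name
        ]
        dists.sort(key=lambda pair: (pair[1], pair[0]))  # ties broken by name
        neighbours[name] = dists[:k]
    return neighbours


if __name__ == "__main__":
    # Toy presence/absence matrix: keys are samples, columns are genes.
    toy = {"s1": [1, 1, 0, 0], "s2": [1, 1, 1, 0], "s3": [0, 0, 1, 1]}
    for sample, nn in knn_sparse(toy, k=1).items():
        print(sample, nn)
```

This brute-force version compares every pair of samples, so it only suits small inputs; the paper targets millions of genomes.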

Output is written to several files matching mandrake.embedding*.

Other useful arguments include:

  • --alignment use a fasta alignment to calculate distances
  • --accessory use a presence/absence file (Rtab or similar) to calculate distances
  • --distances use a .npz file from a previous run and skip straight to the embedding step
  • --labels give labels to colour the output by
  • --perplexity change the perplexity of the preprocessing (similar to t-SNE)
  • --animate produce a video of the optimisation
  • --use-gpu use a GPU for the run. Make sure to increase --n-workers.
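The --perplexity option is described as similar to t-SNE. In t-SNE, perplexity sets the effective number of neighbours per point: for each sample, a precision beta is found by binary search so that the conditional distribution p_j|i, proportional to exp(-beta * d_ij), has perplexity exp(H) equal to the requested value. The minimal sketch below shows that standard t-SNE calibration for one sample's distances to its neighbours. It is not taken from mandrake's source, whose preprocessing may differ in detail (t-SNE conventionally uses squared Euclidean distances here), and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def conditional_probs(dists: Sequence[float], beta: float) -> List[float]:
    """Return p_j|i proportional to exp(-beta * d_ij) for one sample i."""
    d_min = min(dists)  # shift by the smallest distance to avoid underflow
    weights = [math.exp(-beta * (d - d_min)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs: Sequence[float]) -> float:
    """exp(H), where H is the Shannon entropy in nats."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return math.exp(entropy)


def calibrate(
    dists: Sequence[float], target: float, tol: float = 1e-5, max_iter: int = 200
) -> Tuple[float, List[float]]:
    """Binary-search beta so that the perplexity of p_.|i matches target."""
    if not 1.0 <= target <= len(dists):
        raise ValueError("target perplexity must lie between 1 and the number of neighbours")
    lo, hi = 0.0, math.inf
    beta = 1.0
    for _ in range(max_iter):
        perp = perplexity(conditional_probs(dists, beta))
        if abs(perp - target) < tol:
            break
        if perp > target:
            # Distribution too flat: sharpen it by increasing beta.
            lo = beta
            beta = beta * 2.0 if math.isinf(hi) else (beta + hi) / 2.0
        else:
            # Distribution too peaked: flatten it by decreasing beta.
            hi = beta
            beta = (beta + lo) / 2.0
    return beta, conditional_probs(dists, beta)


if __name__ == "__main__":
    beta, probs = calibrate([1.0, 2.0, 3.0, 4.0, 5.0], target=3.0)
    print(beta, [round(p, 3) for p in probs], perplexity(probs))
```

A larger target perplexity gives a flatter distribution that spreads probability over more neighbours.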

See the documentation for more details.
