
gcorso / NeuroSEED

Licence: MIT license
Implementation of Neural Distance Embeddings for Biological Sequences (NeuroSEED) in PyTorch (NeurIPS 2021)

Programming Languages: Python, Jupyter Notebook, Cython

Projects that are alternatives to or similar to NeuroSEED

react-msa-viewer
React rerelease of MSAViewer
Stars: ✭ 15 (-62.5%)
Mutual labels:  bioinformatics, multiple-sequence-alignment
admixr
An R package for reproducible and automated ADMIXTOOLS analyses
Stars: ✭ 20 (-50%)
Mutual labels:  bioinformatics
unimap
An experimental fork of minimap2 optimized for assembly-to-reference alignment
Stars: ✭ 76 (+90%)
Mutual labels:  bioinformatics
GRAFIMO
GRAph-based Finding of Individual Motif Occurrences
Stars: ✭ 22 (-45%)
Mutual labels:  bioinformatics
gnparser
GNparser normalises scientific names and extracts their semantic elements.
Stars: ✭ 26 (-35%)
Mutual labels:  bioinformatics
sirius
SIRIUS is a software for discovering a landscape of de-novo identification of metabolites using tandem mass spectrometry. This repository contains the code of the SIRIUS Software (GUI and CLI)
Stars: ✭ 32 (-20%)
Mutual labels:  bioinformatics
wgs2ncbi
Toolkit for preparing genomes for submission to NCBI
Stars: ✭ 25 (-37.5%)
Mutual labels:  bioinformatics
slamdunk
Streamlining SLAM-seq analysis with ultra-high sensitivity
Stars: ✭ 24 (-40%)
Mutual labels:  bioinformatics
codon-usage-tables
📊 Codon usage tables in code-friendly format + Python bindings
Stars: ✭ 21 (-47.5%)
Mutual labels:  bioinformatics
awesome-phages
A curated list of phage related software and computational resources for phage scientists, bioinformaticians and enthusiasts.
Stars: ✭ 14 (-65%)
Mutual labels:  bioinformatics
dysgu
dysgu-SV is a collection of tools for calling structural variants using short or long reads
Stars: ✭ 47 (+17.5%)
Mutual labels:  bioinformatics
sample-sheet
A permissively licensed library designed to replace Illumina's Experiment Manager
Stars: ✭ 42 (+5%)
Mutual labels:  bioinformatics
rkmh
Classify sequencing reads using MinHash.
Stars: ✭ 42 (+5%)
Mutual labels:  bioinformatics
TeamTeri
Genomics using open source tools, running on GCP or AWS
Stars: ✭ 30 (-25%)
Mutual labels:  bioinformatics
Binning refiner
Improving genome bins through the combination of different binning programs
Stars: ✭ 26 (-35%)
Mutual labels:  bioinformatics
PrimerMiner
R-based batch sequence downloader, with primer development and in silico evaluation capabilities
Stars: ✭ 27 (-32.5%)
Mutual labels:  bioinformatics
protwis
Protwis is the backbone of the GPCRdb. The GPCRdb contains reference data, interactive visualisation and experiment design tools for G protein-coupled receptors (GPCRs).
Stars: ✭ 20 (-50%)
Mutual labels:  bioinformatics
CeleScope
Single Cell Analysis Pipelines
Stars: ✭ 36 (-10%)
Mutual labels:  bioinformatics
hotsub
Command line tool to run batch jobs concurrently with ETL framework on AWS or other cloud computing resources
Stars: ✭ 29 (-27.5%)
Mutual labels:  bioinformatics
adjclust
Adjacency-constrained hierarchical clustering of a similarity matrix
Stars: ✭ 15 (-62.5%)
Mutual labels:  hierarchical-clustering

Neural Distance Embeddings for Biological Sequences

Official PyTorch implementation of Neural Distance Embeddings for Biological Sequences (NeuroSEED), published at NeurIPS 2021 (preprint). NeuroSEED is a framework for embedding biological sequences in geometric vector spaces so that the distance between two embeddings approximates the edit distance between the corresponding sequences.
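To make the idea concrete, here is a minimal pure-Python sketch of the kind of objective such an embedding is measured against. The target is the edit (Levenshtein) distance between two sequences. An embedding is scored by how closely the distance between its vectors matches that target. The `kmer_embedding` encoder, the squared-error loss and the `scale` parameter are hypothetical stand-ins chosen for illustration. The repository's actual models are neural networks written in PyTorch and may use different encoders, geometries and losses.

```python
import math
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def kmer_embedding(seq, k=2, alphabet="ACGT"):
    """Hypothetical stand-in encoder: raw k-mer counts, not a trained network.

    Assumes every character of `seq` belongs to `alphabet`.
    """
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vec = [0.0] * len(index)
    for i in range(len(seq) - k + 1):
        vec[index[seq[i:i + k]]] += 1.0
    return vec


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def distance_loss(seqs, encoder, scale=1.0):
    """Mean squared gap between scaled embedding distances and edit distances.

    Averaged over all unordered pairs; needs at least two sequences.
    """
    embs = [encoder(s) for s in seqs]
    errors = [
        (scale * euclidean(embs[i], embs[j]) - edit_distance(seqs[i], seqs[j])) ** 2
        for i in range(len(seqs))
        for j in range(i + 1, len(seqs))
    ]
    return sum(errors) / len(errors)
```

For example, `edit_distance("ACGT", "AGT")` is 1 (a single deletion). With a trained encoder, `scale` and the geometry of the space would be learned or tuned rather than fixed as they are here.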


Overview

The repository is organised into four main folders, one for each of the tasks analysed, plus two supporting folders (util and tests). Each task folder contains the scripts and models used for that task, as well as instructions on how to run them and the tuned hyperparameters that were found.

  • edit_distance for the edit distance approximation task
  • closest_string for the closest string retrieval task
  • hierarchical_clustering for the hierarchical clustering task, further divided into relaxed and unsupervised for the two approaches explored
  • multiple_alignment for the multiple sequence alignment task, further divided into guide_tree and steiner_string
  • util contains a series of utility routines shared between all the tasks
  • tests contains a wide range of tests for the various components of the repository
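Embeddings make the closest string retrieval task cheap in principle. Each reference sequence is encoded once, and a query is then answered with vector comparisons instead of an edit-distance computation against every reference. The sketch below illustrates that pattern only. It assumes a hypothetical k-mer-count encoder in place of a trained model and uses brute-force Euclidean nearest-neighbour search. `build_index` and `closest_string` are illustrative names, not functions from the repository.

```python
import math
from itertools import product


def kmer_embedding(seq, k=2, alphabet="ACGT"):
    """Hypothetical stand-in for a trained encoder: raw k-mer counts."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vec = [0.0] * len(index)
    for i in range(len(seq) - k + 1):
        vec[index[seq[i:i + k]]] += 1.0
    return vec


def build_index(references, encoder):
    """Embed every reference sequence once, up front."""
    return [encoder(r) for r in references]


def closest_string(query, index, encoder):
    """Return the position of the reference whose embedding is nearest the query's."""
    q = encoder(query)
    # math.dist computes the Euclidean distance between two equal-length points.
    return min(range(len(index)), key=lambda i: math.dist(q, index[i]))
```

For example, with references `["AAAAAAAA", "CCCCCCCC", "ACGTACGT"]`, the query `"ACGTACGA"` is matched to position 2.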

Installation

Create a virtual (or conda) environment and install the dependencies:

python3 -m venv neuroseed
source neuroseed/bin/activate
pip install -r requirements.txt

Then build the mst and unionfind Cython extensions used for hierarchical clustering:

cd hierarchical_clustering/relaxed/mst; python setup.py build_ext --inplace; cd ../../..
cd hierarchical_clustering/relaxed/unionfind; python setup.py build_ext --inplace; cd ../../..
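The README does not describe the interfaces of these compiled modules. The pure-Python sketch below is a generic illustration of the two data structures their names suggest: a union-find (disjoint-set) structure and Kruskal's minimum spanning tree. Together they give a single-linkage merge order from a distance matrix. It is an assumption about their role, not the API of the repository's `mst` or `unionfind` extensions. `UnionFind` and `single_linkage_merges` are hypothetical names.

```python
class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets containing a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def single_linkage_merges(dist):
    """Kruskal's MST on a full distance matrix.

    The accepted edges, in increasing order of weight, are the
    single-linkage merges as (i, j, distance) tuples.
    """
    n = len(dist)
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    uf = UnionFind(n)
    merges = []
    for d, i, j in edges:
        if uf.union(i, j):
            merges.append((i, j, d))
            if len(merges) == n - 1:  # spanning tree complete
                break
    return merges
```

For points at positions 0, 1, 5 and 6 on a line, the merges are (0, 1, 1), (2, 3, 1) and finally (1, 2, 4), which joins the two clusters.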

Reference

@article{corso2021neuroseed,
  title={Neural Distance Embeddings for Biological Sequences},
  author={Corso, Gabriele and Ying, Rex and P{\'a}ndy, Michal and Veli{\v{c}}kovi{\'c}, Petar and Leskovec, Jure and Li{\`o}, Pietro},
  journal={Advances in Neural Information Processing Systems},
  year={2021}
}

License

MIT
