lkytal / PredFull

Licence: other
This work was published in Analytical Chemistry: Full-Spectrum Prediction of Peptides Tandem Mass Spectra using Deep Neural Network

Programming Languages

python

Projects that are alternatives to or similar to PredFull

pmartR
The pmartR R package provides functionality for quality control, normalization, exploratory data analysis, and statistical analysis of mass spectrometry (MS) omics data, in particular proteomic (either at the peptide or the protein level), lipidomic, and metabolomic data. This includes data transformation, specification of groups that are to be …
Stars: ✭ 19 (-5%)
Mutual labels:  mass-spectrometry, peptides
matchering-cli
🎚️ Simple Matchering 2.0 Command Line Application
Stars: ✭ 28 (+40%)
Mutual labels:  spectrum
gis-snippets
Some code snippets for GIS tasks
Stars: ✭ 45 (+125%)
Mutual labels:  spectrum
lighthouse-of-doom
A simple text-based adventure game
Stars: ✭ 52 (+160%)
Mutual labels:  spectrum
Spectrum
A Discord bot with tons of features, written in Python, made for fun.
Stars: ✭ 27 (+35%)
Mutual labels:  spectrum
NR1-UI
Userinterface for Volumio (RaspberryPi) with ssd1322 and ssd1306 oled display, spectrum bargraph, progress bar, LED functions, Standby-functions, 4 Buttons and Rotary Encoder.
Stars: ✭ 29 (+45%)
Mutual labels:  spectrum
matchering-web
🎚️ Self-Hosted LANDR / eMastered Alternative
Stars: ✭ 25 (+25%)
Mutual labels:  spectrum
mzQC
Reporting and exchange format for mass spectrometry quality control data
Stars: ✭ 21 (+5%)
Mutual labels:  mass-spectrometry
aubio-go
Go wrapper for audio and music analysis library Aubio. WORK IN PROGRESS
Stars: ✭ 21 (+5%)
Mutual labels:  spectrum
ibm-spectrum-scale-csi
The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to manage the life-cycle of persistent storage.
Stars: ✭ 41 (+105%)
Mutual labels:  spectrum
ZXDB
Open database with historical information about Sinclair machines
Stars: ✭ 48 (+140%)
Mutual labels:  spectrum
qt-spek
A Qt-based spectrum analyzer, modified from spek
Stars: ✭ 34 (+70%)
Mutual labels:  spectrum
SciDataTool
SciDataTool is an open-source Python package for scientific data handling. The objective is to provide a user-friendly, unified, flexible module to postprocess any kind of signal. It is meant to be used by researchers, R&D engineers and teachers in any scientific area. This package allows to efficiently store data fields in the time/space or in …
Stars: ✭ 21 (+5%)
Mutual labels:  spectrum
spec
[OLD!] RGB Protocol specifications for Bitcoin-based digital assets
Stars: ✭ 149 (+645%)
Mutual labels:  spectrum
PothosSoapy
Pothos framework support for software defined radio hardware.
Stars: ✭ 26 (+30%)
Mutual labels:  spectrum
unity-music-visualizer
Basic music visualization project for Unity.
Stars: ✭ 39 (+95%)
Mutual labels:  spectrum
spectrum
📉 Spectrum visualizer
Stars: ✭ 86 (+330%)
Mutual labels:  spectrum
kaleidoscope
🍀 A small collection of creative nodes to generate color palette and store values for Blender
Stars: ✭ 99 (+395%)
Mutual labels:  spectrum
MALDIquant
Quantitative Analysis of Mass Spectrometry Data
Stars: ✭ 48 (+140%)
Mutual labels:  mass-spectrometry
audio-spectrum
Draw spectrum of audio data
Stars: ✭ 16 (-20%)
Mutual labels:  spectrum

PredFull

Visit http://predfull.com/ to try the online prediction service.

Kaiyuan Liu, Sujun Li, Lei Wang, Yuzhen Ye, Haixu Tang

PredFull is the first model for predicting complete tandem mass spectra from peptide sequences, using a deep convolutional neural network (CNN) trained on over 2 million experimental spectra.

Free for academic use.

Update History

  • 2022.05.19: Support input peptides of any length.
  • 2021.05.18: Support predicting peptides with oxidized methionine.
  • 2021.01.01: Update example results.
  • 2020.08.22: Fixed performance issues.
  • 2020.05.25: Support predicting non-tryptic peptides.
  • 2019.09.01: First version.

Method

The model is based on a residual convolutional network architecture. Current precision (bin size): 0.1 Th.
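To make the 0.1 Th bin size concrete, here is a minimal sketch of how a peak list could be turned into a fixed-length intensity vector and back. It assumes the vector covers m/z 0 to 2000 (the output limit listed under Important Notes), which gives 20,000 bins, and that peaks sharing a bin are summed. The helpers `peaks_to_vector` and `vector_to_peaks` are hypothetical and not taken from predfull.py, which may encode spectra differently.

```python
import numpy as np

BIN_SIZE = 0.1   # Th; the precision stated above
MAX_MZ = 2000.0  # assumption: vector spans m/z [0, 2000), matching the output limit


def peaks_to_vector(mz, intensity, bin_size=BIN_SIZE, max_mz=MAX_MZ):
    """Hypothetical helper: sum peak intensities into fixed-width m/z bins.

    Bin i covers [i * bin_size, (i + 1) * bin_size); peaks outside
    [0, max_mz) are dropped.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_bins = int(round(max_mz / bin_size))  # 20000 bins with the defaults
    keep = (mz >= 0) & (mz < max_mz)
    idx = np.floor(mz[keep] / bin_size).astype(int)
    idx = np.minimum(idx, n_bins - 1)  # guard against float rounding at the upper edge
    vector = np.zeros(n_bins)
    np.add.at(vector, idx, intensity[keep])  # peaks in the same bin are summed
    return vector


def vector_to_peaks(vector, bin_size=BIN_SIZE):
    """Hypothetical helper: return (lower-edge m/z, intensity) of non-empty bins."""
    vector = np.asarray(vector, dtype=float)
    nonzero = np.flatnonzero(vector)
    return nonzero * bin_size, vector[nonzero]
```

Summing is one reasonable way to merge peaks that land in the same 0.1 Th bin; keeping only the maximum is another.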

[Figure: model architecture]

How to use

Besides cloning this project, you need to download the pretrained model file pm.h5 from Google Drive and place it in the project folder.

Important Notes

  • The only supported modification (PTM) is oxidation of methionine; otherwise, only UNMODIFIED peptides are allowed. To indicate an oxidized methionine, write "M(O)".
  • The model assumes a FIXED carbamidomethylation on cysteine (C).
  • Input peptide length is NOT limited; however, expect poor performance for peptides longer than 30 residues.
  • The prediction will NOT output peaks with m/z > 2000.
  • Predicted peaks weaker than STRONGEST_PEAK / 1000 are regarded as noise and omitted from the final output.
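The notes above can be expressed as a few checks. The sketch below tokenizes a peptide so that "M(O)" counts as one residue, rejects any other modification or non-standard letter, warns about peptides longer than 30 residues, and applies the two output rules (no peaks above m/z 2000, drop peaks weaker than STRONGEST_PEAK / 1000). It only illustrates the stated rules and is not code from predfull.py; the helper names are hypothetical, and computing the strongest peak only over peaks within the m/z limit is an assumption.

```python
import re
import warnings
from typing import List, Sequence, Tuple

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # C is treated as carbamidomethylated by the model
TOKEN = re.compile(r"M\(O\)|[A-Z]")        # "M(O)" = oxidized methionine, the only modification


def tokenize_peptide(peptide: str) -> List[str]:
    """Hypothetical helper: split a peptide into residues, keeping "M(O)" as one token."""
    tokens = TOKEN.findall(peptide)
    if "".join(tokens) != peptide:  # something the regex could not match, e.g. "S(ph)"
        raise ValueError(f"unsupported characters or modification in {peptide!r}")
    unknown = [t for t in tokens if t != "M(O)" and t not in STANDARD_AA]
    if unknown:
        raise ValueError(f"non-standard residues {unknown} in {peptide!r}")
    if len(tokens) > 30:
        warnings.warn(f"{peptide!r} has {len(tokens)} residues; expect poor performance")
    return tokens


def filter_peaks(peaks: Sequence[Tuple[float, float]], max_mz: float = 2000.0,
                 noise_divisor: float = 1000.0) -> List[Tuple[float, float]]:
    """Hypothetical helper: apply the m/z cap, then the STRONGEST_PEAK / 1000 cutoff."""
    in_range = [(mz, i) for mz, i in peaks if mz <= max_mz]
    if not in_range:
        return []
    threshold = max(i for _, i in in_range) / noise_divisor
    return [(mz, i) for mz, i in in_range if i >= threshold]
```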

Required Packages

We recommend installing the dependencies via Anaconda.

  • Python >= 3.7
  • TensorFlow >= 2.3.0
  • Pandas >= 0.20
  • pyteomics
  • lxml

TensorFlow must be version 2.3.0 or newer! Because of a compatibility bug in TensorFlow, versions before 2.3.0 cannot load the model correctly. We'll release a new model once the TensorFlow team fixes this.
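A quick way to confirm that your environment meets this requirement before loading pm.h5 is to compare `tf.__version__` against 2.3.0. This is a minimal sketch; `version_tuple`, `tensorflow_is_supported`, and `check_installed_tensorflow` are hypothetical helpers, and pre-release suffixes such as "rc1" are simply ignored.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Hypothetical helper: '2.10.0rc1' -> (2, 10, 0); suffixes are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def tensorflow_is_supported(version: str, minimum: str = "2.3.0") -> bool:
    """Hypothetical helper: True if `version` is at least `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)


def check_installed_tensorflow(minimum: str = "2.3.0") -> bool:
    """Hypothetical helper: import TensorFlow lazily and report whether it is new enough."""
    try:
        import tensorflow as tf
    except ImportError:
        return False
    return tensorflow_is_supported(tf.__version__, minimum)
```

Comparing integer tuples rather than raw strings matters here: as strings, "2.10.0" would sort before "2.3.0".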

Input format

The required input format is TSV, with the following columns:

Peptide Charge Type NCE
AAAAAAAAAVSR 2 HCD 25
AAGAAESEEDFLR 2 HCD 25
AAPAPTASSTININTSTSK 2 HCD 25
AAPAPM(O)NTSTSK 2 HCD 25

The 'Peptide' and 'Charge' columns are self-explanatory. 'Type' must be HCD or ETD (uppercase). 'NCE' is the normalized collision energy; the default is 25. Note that the last peptide in the example above contains an oxidized methionine, which is currently the only supported modification. See example.tsv for more examples.
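If you generate inputs programmatically, a tab-separated file with this header can be written with Python's standard `csv` module. The sketch below uses a hypothetical helper, `write_predfull_input`, that also enforces the uppercase HCD/ETD rule and fills in 25 when a row omits the NCE.

```python
import csv
from typing import Iterable, Sequence, TextIO

COLUMNS = ["Peptide", "Charge", "Type", "NCE"]
DEFAULT_NCE = 25  # default NCE stated above


def write_predfull_input(rows: Iterable[Sequence], handle: TextIO) -> None:
    """Hypothetical helper: write (peptide, charge, type[, nce]) rows as TSV."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        peptide, charge, frag_type = row[0], int(row[1]), row[2]
        nce = row[3] if len(row) > 3 else DEFAULT_NCE
        if frag_type not in ("HCD", "ETD"):  # must be uppercase
            raise ValueError(f"Type must be 'HCD' or 'ETD', got {frag_type!r}")
        writer.writerow([peptide, charge, frag_type, nce])
```

For example, `with open("my_peptides.tsv", "w", newline="") as f: write_predfull_input([("AAPAPM(O)NTSTSK", 2, "HCD", 25)], f)` writes a file (the name my_peptides.tsv is only a placeholder) that can then be passed to predfull.py with --input.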

Usage

Simply run:

python predfull.py --input example.tsv --model pm.h5 --output example_prediction.mgf

The output file is in Mascot Generic Format (MGF).

  • --input: the input file
  • --output: the output path
  • --model: the pretrained model
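To inspect the predictions in Python, pyteomics (already a dependency) provides `pyteomics.mgf.read`. For a dependency-free look at the file, a minimal reader is sketched below. It assumes the standard MGF layout of BEGIN IONS/END IONS blocks with KEY=VALUE header lines and "m/z intensity" peak lines, and makes no assumption about which header keys predfull.py writes; `read_mgf` is a hypothetical helper.

```python
from typing import Dict, Iterator, List, TextIO, Tuple

Spectrum = Tuple[Dict[str, str], List[Tuple[float, float]]]


def read_mgf(handle: TextIO) -> Iterator[Spectrum]:
    """Hypothetical helper: yield (header fields, [(m/z, intensity), ...]) per spectrum."""
    params: Dict[str, str] = {}
    peaks: List[Tuple[float, float]] = []
    inside = False
    for raw in handle:
        line = raw.strip()
        if not line or line[0] in "#;!/":
            continue  # skip blank lines and MGF comment lines
        if line == "BEGIN IONS":
            inside, params, peaks = True, {}, []
        elif line == "END IONS":
            if inside:
                yield params, peaks
            inside = False
        elif inside and "=" in line:
            key, value = line.split("=", 1)  # header line such as TITLE=...
            params[key.strip().upper()] = value.strip()
        elif inside:
            fields = line.split()  # peak line: m/z, then (optionally) intensity
            intensity = float(fields[1]) if len(fields) > 1 else 0.0
            peaks.append((float(fields[0]), intensity))
```

Usage: `with open("example_prediction.mgf") as f: spectra = list(read_mgf(f))`.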

Prediction Examples

Note that intensities in the example figures are shown as square-rooted values.

[Figure: prediction example 1]

[Figure: prediction example 2]

Performance Evaluation

We provide sample data on Google Drive and code for evaluating prediction performance. The hcd_testingset.mgf file on Google Drive contains ground-truth spectra (randomly sampled from the NIST Human Synthetic Peptide Spectral Library) that correspond to the items in example.tsv, while example_prediction.mgf contains pre-computed predictions.

To evaluate the similarity, first download the ground-truth reference file hcd_testingset.mgf from Google Drive, then run:

python compare_performance.py --real hcd_testingset.mgf --pred example_prediction.mgf

  • --real: the ground truth file
  • --pred: the prediction file

You should get an average similarity of about 0.789 using these two provided MGF files.

Make sure that items in example.tsv and hcd_testingset.mgf are in the same order! Don't permute, add, or delete items unless you align them yourself.
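The exact similarity measure is defined in compare_performance.py. As an illustration only, assuming cosine similarity between binned intensity vectors (a common choice for comparing spectra), the average over aligned pairs could be computed as below. Spectra are paired purely by position, which is why the ordering requirement above matters: the length check catches added or deleted items, but a reordering would silently pair the wrong spectra. The function names are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Assumed metric: cosine of the angle between two binned intensity vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0  # empty spectrum -> 0


def mean_similarity(real_vectors, pred_vectors) -> float:
    """Hypothetical helper: average pairwise similarity; item i must match item i."""
    if len(real_vectors) != len(pred_vectors):
        raise ValueError(
            f"{len(real_vectors)} reference vs {len(pred_vectors)} predicted spectra; "
            "the files are not aligned"
        )
    scores = [cosine_similarity(r, p) for r, p in zip(real_vectors, pred_vectors)]
    return float(np.mean(scores))
```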

How to build & train the model

For those interested in reproducing this model, train_model.py provides example code to build and train it.
