nanxstats / Protr

Licence: other
Comprehensive toolkit for generating various numerical features of protein sequences

Programming language: R

protr

Comprehensive toolkit for generating various numerical features of protein sequences described in Xiao et al. (2015) <DOI:10.1093/bioinformatics/btv042>.

Paper Citation

Formatted citation:

Nan Xiao, Dong-Sheng Cao, Min-Feng Zhu, and Qing-Song Xu (2015). protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences. Bioinformatics 31 (11), 1857-1859.

BibTeX entry:

@article{Xiao2015,
  author = {Xiao, Nan and Cao, Dong-Sheng and Zhu, Min-Feng and Xu, Qing-Song},
  title = {{protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences}},
  journal = {Bioinformatics},
  year = {2015},
  volume = {31},
  number = {11},
  pages = {1857--1859},
  doi = {10.1093/bioinformatics/btv042},
  issn = {1367-4803},
  url = {http://bioinformatics.oxfordjournals.org/content/31/11/1857}
}

Installation

To install protr from CRAN:

install.packages("protr")

Or try the latest version on GitHub:

# install.packages("devtools")
devtools::install_github("nanxstats/protr")

Browse the package vignette for a quick start.

Shiny App

ProtrWeb, the Shiny web application built on protr, can be accessed from http://protr.org.

ProtrWeb is a user-friendly web application for computing the protein sequence descriptors (features) presented in the protr package.

List of Supported Descriptors

Commonly used descriptors

  • Amino acid composition descriptors

    • Amino acid composition
    • Dipeptide composition
    • Tripeptide composition
  • Autocorrelation descriptors

    • Normalized Moreau-Broto autocorrelation
    • Moran autocorrelation
    • Geary autocorrelation
  • CTD descriptors

    • Composition
    • Transition
    • Distribution
  • Conjoint Triad descriptors

  • Quasi-sequence-order descriptors

    • Sequence-order-coupling number
    • Quasi-sequence-order descriptors
  • Pseudo amino acid composition (PseAAC)

    • Pseudo amino acid composition
    • Amphiphilic pseudo amino acid composition
  • Profile-based descriptors

    • Profile-based descriptors derived by PSSM (Position-Specific Scoring Matrix)
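To make the first few descriptor families concrete, here is a minimal base-R sketch of amino acid composition, dipeptide composition and a normalized Moreau-Broto autocorrelation. It is not protr's implementation: `aac()`, `dc()` and `moreau_broto()` are hypothetical names, the Kyte-Doolittle hydropathy scale is swapped in as an example property (protr uses its own property sets), and protr's standardization details may differ. In practice, use the package's extractors such as `extractAAC()`, `extractDC()` and `extractMoreauBroto()`.

```r
# Minimal base-R sketch (not protr's code) of three sequence descriptors.
aa20 <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
          "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

# Amino acid composition: fraction of each of the 20 standard residues.
aac <- function(seq) {
  res <- strsplit(seq, "")[[1]]
  counts <- table(factor(res, levels = aa20))
  setNames(as.vector(counts) / length(res), aa20)
}

# Dipeptide composition: fraction of each of the 400 adjacent residue pairs.
dc <- function(seq) {
  res <- strsplit(seq, "")[[1]]
  pairs <- paste0(head(res, -1), tail(res, -1))
  pair_levels <- as.vector(t(outer(aa20, aa20, paste0)))  # "AA", "AR", "AN", ...
  counts <- table(factor(pairs, levels = pair_levels))
  setNames(as.vector(counts) / length(pairs), pair_levels)
}

# Kyte-Doolittle hydropathy, used here only as an example property scale.
kd <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5, Q = -3.5, E = -3.5,
        G = -0.4, H = -3.2, I = 4.5, L = 3.8, K = -3.9, M = 1.9, F = 2.8,
        P = -1.6, S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

# Normalized Moreau-Broto autocorrelation for lags 1..nlag:
# ATS(d) = sum_{i=1}^{N-d} P_i * P_{i+d} / (N - d),
# where P is the property standardized over the 20 amino acids.
moreau_broto <- function(seq, prop = kd, nlag = 5) {
  p <- (prop - mean(prop)) / sd(prop)
  x <- unname(p[strsplit(seq, "")[[1]]])  # assumes only the 20 standard residues
  n <- length(x)
  stopifnot(n > nlag)
  vapply(seq_len(nlag),
         function(d) sum(x[1:(n - d)] * x[(1 + d):n]) / (n - d),
         numeric(1))
}

x <- "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
round(aac(x)[c("K", "Q", "S")], 3)
length(dc(x))  # 400
moreau_broto(x, nlag = 3)
```

Composition descriptors always have a fixed length (20 or 400) regardless of sequence length, which is what makes them usable as machine learning features; the autocorrelation adds information about how a property is arranged along the chain.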

Proteochemometric (PCM) modeling descriptors

  • Scales-based descriptors derived by principal component analysis
    • Scales-based descriptors derived by amino acid properties (AAindex)
    • Scales-based descriptors derived by 20+ classes of 2D and 3D molecular descriptors (Topological, WHIM, VHSE, etc.)
  • Scales-based descriptors derived by factor analysis
  • Scales-based descriptors derived by multidimensional scaling
  • BLOSUM and PAM matrix-derived descriptors
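As a rough illustration of the principal component route, the sketch below reduces a 20-amino-acid property matrix to a few principal component "scales", maps a sequence onto them, and applies auto cross covariance (ACC) so that sequences of different lengths yield vectors of the same length. The property matrix is random placeholder data, `seq_to_scales()` and `acc_sketch()` are hypothetical names, and the ACC form used here is one common formulation; protr ships real descriptor sets and its own functions, whose details may differ.

```r
# Minimal sketch (not protr's code): PCA-derived scales + auto cross covariance.
aa20 <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
          "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

# Placeholder property matrix (20 amino acids x 6 properties) filled with
# random numbers purely for illustration; real work would use AAindex or
# another descriptor set.
set.seed(42)
props <- matrix(rnorm(20 * 6), nrow = 20,
                dimnames = list(aa20, paste0("prop", 1:6)))

# Keep the first k principal components as per-residue scales.
k <- 3
pca <- prcomp(props, center = TRUE, scale. = TRUE)
scales <- pca$x[, 1:k, drop = FALSE]  # 20 x k, rows named by amino acid

# Map a sequence to a k x n matrix of scale values (one column per residue).
seq_to_scales <- function(seq, scales) {
  t(scales[strsplit(seq, "")[[1]], , drop = FALSE])
}

# ACC for lags 1..lag: entry (j, l) at lag g is
# sum_i z[j, i] * z[l, i + g] / (n - g); output length is k * k * lag.
acc_sketch <- function(z, lag) {
  n <- ncol(z)
  stopifnot(n > lag)
  unlist(lapply(seq_len(lag), function(g) {
    m <- z[, 1:(n - g), drop = FALSE] %*% t(z[, (1 + g):n, drop = FALSE])
    as.vector(m / (n - g))
  }))
}

short <- acc_sketch(seq_to_scales("MKTAYIAKQR", scales), lag = 2)
long  <- acc_sketch(seq_to_scales("MKTAYIAKQRQISFVKSHFSRQ", scales), lag = 2)
c(length(short), length(long))  # both 3 * 3 * 2 = 18
```

The same pipeline applies to factor analysis or multidimensional scaling: only the step that turns the property matrix into per-residue scales changes.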

Similarity computation

Local and global pairwise sequence alignment for protein sequences:

  • Between two protein sequences
  • Parallelized pairwise similarity calculation with a list of protein sequences
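The idea behind alignment-based similarity can be sketched in base R with a textbook Smith-Waterman local alignment. This is a simplified stand-in, not protr's method: it uses hypothetical match/mismatch scores and a linear gap penalty instead of a substitution matrix such as BLOSUM62 with affine gaps, and `sw_score()` and `seq_sim()` are illustrative names. The raw score is normalized as s(a, b) / sqrt(s(a, a) * s(b, b)), a common normalization under which identical sequences score 1.

```r
# Textbook Smith-Waterman local alignment score with a linear gap penalty.
sw_score <- function(a, b, match = 2, mismatch = -1, gap = -2) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  H <- matrix(0, nrow = length(x) + 1, ncol = length(y) + 1)
  best <- 0
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      s <- if (x[i] == y[j]) match else mismatch
      H[i + 1, j + 1] <- max(0,
                             H[i, j] + s,        # diagonal: align x[i] with y[j]
                             H[i, j + 1] + gap,  # x[i] against a gap
                             H[i + 1, j] + gap)  # y[j] against a gap
      best <- max(best, H[i + 1, j + 1])
    }
  }
  best
}

# Normalized similarity; identical sequences score 1.
seq_sim <- function(a, b, ...) {
  sw_score(a, b, ...) / sqrt(sw_score(a, a, ...) * sw_score(b, b, ...))
}

# Pairwise similarity matrix for a list of sequences (serial, not parallel).
seqs <- c("MKTAYIAKQR", "MKTAYIAKQQ", "GSHMLEDPVR")
n <- length(seqs)
sim <- outer(seq_len(n), seq_len(n),
             Vectorize(function(i, j) seq_sim(seqs[i], seqs[j])))
round(sim, 3)
```

The pairwise matrix is symmetric with a unit diagonal, so only the upper triangle actually needs computing; that is also the part worth distributing across cores when the list of sequences is long.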

GO semantic similarity measures:

  • Between two groups of GO terms / two Entrez Gene IDs
  • Parallelized pairwise similarity calculation with a list of GO terms / Entrez Gene IDs

Miscellaneous tools and datasets

  • Retrieve protein sequences from UniProt
  • Read protein sequences in FASTA format
  • Read protein sequences in PDB format
  • Sanity check of the amino acid types appearing in the protein sequences
  • Protein sequence segmentation
  • Auto cross covariance (ACC) for generating scales-based descriptors of the same length from sequences of different lengths
  • 20+ pre-computed 2D and 3D descriptor sets for the 20 amino acids to use with the scales-based descriptors
  • BLOSUM and PAM matrices for the 20 amino acids
  • Meta information of the 20 amino acids
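Two of these utilities are simple enough to sketch in base R. The hypothetical `check_aa()` flags sequences containing anything other than the 20 standard amino acid letters, and the hypothetical `segment_around()` extracts fixed-width windows centered on a chosen residue. Both are illustrative stand-ins, not protr's own functions, whose arguments and edge-case handling may differ.

```r
aa20 <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
          "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

# TRUE if every character is one of the 20 standard amino acids.
check_aa <- function(seq) {
  all(strsplit(seq, "")[[1]] %in% aa20)
}

# Windows of length 2 * half_width + 1 centered on each occurrence of `center`;
# occurrences too close to either end of the sequence are skipped.
# The result is named by the 1-based position of the center residue.
segment_around <- function(seq, center, half_width) {
  res <- strsplit(seq, "")[[1]]
  n <- length(res)
  pos <- which(res == center)
  pos <- pos[pos > half_width & pos <= n - half_width]
  segs <- vapply(pos, function(p) {
    paste(res[(p - half_width):(p + half_width)], collapse = "")
  }, character(1))
  setNames(segs, pos)
}

check_aa("MKTAYIAKQR")                # TRUE
check_aa("MKTAYIAKQX")                # FALSE: "X" is not a standard residue
segment_around("MKTAYIAKQR", "K", 1)  # "2" = "MKT", "8" = "AKQ"
```

Running a check like this before computing descriptors avoids silent problems later, since most descriptors are defined only over the 20 standard amino acids.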

Contribute

To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].