jowoojun / biovec

Licence: other

ProtVec can be used in protein interaction prediction, structure prediction, and protein data visualization.

Programming Languages

python


Projects that are alternatives to or similar to biovec

Word2VecAndTsne

Scripts demo-ing how to train a Word2Vec model and reduce its vector space

Stars: ✭ 45 (+95.65%)

Mutual labels: word2vec, gensim, tsne

Ml Projects

ML based projects such as Spam Classification, Time Series Analysis, Text Classification using Random Forest, Deep Learning, Bayesian, Xgboost in Python

Stars: ✭ 127 (+452.17%)

Mutual labels: svm, word2vec, gensim

Nlp Journey

Documents, papers and code related to Natural Language Processing, including Topic Models, Word Embedding, Named Entity Recognition, Text Classification, Text Generation, Text Similarity, Machine Translation, etc. All code is implemented in TensorFlow 2.0.

Stars: ✭ 1,290 (+5508.7%)

Mutual labels: svm, word2vec, gensim

Shallowlearn

An experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and nice API. Written in Python and fully compatible with Scikit-learn.

Stars: ✭ 196 (+752.17%)

Mutual labels: word2vec, gensim

Log Anomaly Detector

Log Anomaly Detection - Machine learning to detect abnormal events in logs

Stars: ✭ 169 (+634.78%)

Mutual labels: word2vec, gensim

Splitter

A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Stars: ✭ 177 (+669.57%)

Mutual labels: word2vec, gensim

Turkish Word2vec

Pre-trained Word2Vec Model for Turkish

Stars: ✭ 136 (+491.3%)

Mutual labels: word2vec, gensim

SentimentAnalysis

(BOW, TF-IDF, Word2Vec, BERT) Word Embeddings + (SVM, Naive Bayes, Decision Tree, Random Forest) Base Classifiers + Pre-trained BERT on Tensorflow Hub + 1-D CNN and Bi-Directional LSTM on IMDB Movie Reviews Dataset

Stars: ✭ 40 (+73.91%)

Mutual labels: svm, word2vec

Gemsec

The TensorFlow reference implementation of 'GEMSEC: Graph Embedding with Self Clustering' (ASONAM 2019).

Stars: ✭ 210 (+813.04%)

Mutual labels: word2vec, gensim

Amazon-Fine-Food-Review

Machine learning algorithms such as KNN, Naive Bayes, Logistic Regression, SVM, Decision Trees, Random Forest, k-means and Truncated SVD on Amazon fine food reviews

Stars: ✭ 28 (+21.74%)

Mutual labels: svm, tsne

Textclf

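TextClf: a text classification framework based on PyTorch/Sklearn, including logistic regression, SVM, TextCNN, TextRNN, TextRCNN, DRNN, DPCNN, BERT and other models. Data processing, model training, testing and other steps can be completed through simple configuration.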

Stars: ✭ 105 (+356.52%)

Mutual labels: svm, word2vec

word-embeddings-from-scratch

Creating word embeddings from scratch and visualize them on TensorBoard. Using trained embeddings in Keras.

Stars: ✭ 22 (-4.35%)

Mutual labels: word2vec, gensim

Gensim

Topic Modelling for Humans

Stars: ✭ 12,763 (+55391.3%)

Mutual labels: word2vec, gensim

Webvectors

Web-ify your word2vec: framework to serve distributional semantic models online

Stars: ✭ 154 (+569.57%)

Mutual labels: word2vec, gensim

Germanwordembeddings

Toolkit to obtain and preprocess german corpora, train models using word2vec (gensim) and evaluate them with generated testsets

Stars: ✭ 189 (+721.74%)

Mutual labels: word2vec, gensim

Wordembeddings Elmo Fasttext Word2vec

Using pre-trained word embeddings (Fasttext, Word2Vec)

Stars: ✭ 146 (+534.78%)

Mutual labels: word2vec, gensim

Aravec

AraVec is a pre-trained distributed word representation (word embedding) open source project which aims to provide the Arabic NLP research community with free to use and powerful word embedding models.

Stars: ✭ 239 (+939.13%)

Mutual labels: word2vec, gensim

Magnitude

A fast, efficient universal vector embedding utility package.

Stars: ✭ 1,394 (+5960.87%)

Mutual labels: word2vec, gensim

Role2vec

A scalable Gensim implementation of "Learning Role-based Graph Embeddings" (IJCAI 2018).

Stars: ✭ 134 (+482.61%)

Mutual labels: word2vec, gensim

doc2vec-api

document embedding and machine learning script for beginners

Stars: ✭ 92 (+300%)

Mutual labels: word2vec, gensim


2017Bio2Vec

Protein classification using the sum of protein n-gram vector representations

Ordinarily, biological information is represented as a string of characters. It has been suggested, however, that expressing it as a vector makes the information easier to analyze. Specific applications include:

family classification
protein visualization
structure prediction
disordered protein identification
protein-protein interaction prediction
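The following is a minimal sketch of how a protein sequence can be turned into word2vec-style "sentences". It follows the 3-gram splitting described in the ProtVec paper: the sequence is cut into non-overlapping 3-grams three times, starting at offsets 0, 1 and 2. The function name split_into_3gram_sentences is hypothetical, and this is not necessarily how make_data_uniprot.py does it.

```python
from typing import List


def split_into_3gram_sentences(sequence: str, n: int = 3) -> List[List[str]]:
    """Split a protein sequence into n lists of non-overlapping n-grams,
    one list per reading frame (offsets 0, 1, ..., n - 1)."""
    sentences = []
    for offset in range(n):
        # Consecutive, non-overlapping n-grams starting at this offset;
        # a trailing fragment shorter than n is dropped.
        grams = [sequence[i:i + n] for i in range(offset, len(sequence) - n + 1, n)]
        sentences.append(grams)
    return sentences


if __name__ == "__main__":
    for frame in split_into_3gram_sentences("MKTAYIAKQR"):
        print(frame)
```

Each of the three lists can then be treated as one sentence when training a word2vec model on the 3-gram corpus.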

Classification and prediction are easy-to-understand uses, but personally I felt that protein visualization would be the most useful. Unless a sequence is short or its structure is already known, there does not seem to be a widely used method for grasping a whole protein at a glance, so I think this kind of representation has real value. Although the idea may seem strange at first, representing words as vectors is already reasonably well accepted in natural language processing.

See other implementations at https://github.com/kyu999/biovec and https://github.com/peter-volkov/biovec

Paper: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0141287

If you don't have the databases, you can download them from the links below.

UniProt (Swiss-Prot)

http://www.uniprot.org/downloads

DisProt

http://www.disprot.org/browse

If it doesn't work on macOS, try this:

tensorflow/tensorflow#5089

How to install and use

Install the Python packages.

pip install -r requirements.txt

Note: if you use macOS and run into problems installing matplotlib, see https://stackoverflow.com/questions/21784641/installation-issue-with-matplotlib-python

Download the data file.

To run the original program, you need to download the original database from this link: https://drive.google.com/file/d/1qTbNRV2oDi4mBQ6RavuJXSSMtxr57ztT/view?usp=sharing

Move the downloaded file to the project directory.
Then unpack the downloaded file.

If you downloaded the small DB:

tar -xzvf small_DB.tar.gz

If you downloaded the original DB:

tar -xzvf original_DB.tar.gz

Run make_data_uniprot.py.
This generates an n-gram corpus, n-gram vectors, protein vectors, and protein families from uniprot_sprot.fasta.
To see how we classify proteins into families, run bio_svm/train_svm_biovec.py.
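As the title says, a protein is represented by the sum of its n-gram vectors. The sketch below shows that summation under one assumption: the trained vectors are available as a plain dict that maps each 3-gram to a list of floats. That dict is a hypothetical stand-in for the word2vec model, and the real script may differ. Summing every overlapping 3-gram gives the same result as summing the 3-grams from all three reading-frame splits, because each overlapping 3-gram belongs to exactly one frame.

```python
from typing import Dict, List


def protein_vector(sequence: str, ngram_vectors: Dict[str, List[float]], n: int = 3) -> List[float]:
    """Return the element-wise sum of the vectors of all overlapping n-grams.

    ngram_vectors is a hypothetical, non-empty dict {ngram: vector}; n-grams
    missing from it (e.g. ones containing rare residues) are skipped.
    """
    dim = len(next(iter(ngram_vectors.values())))
    total = [0.0] * dim
    for i in range(len(sequence) - n + 1):
        vec = ngram_vectors.get(sequence[i:i + n])
        if vec is None:
            continue  # unknown n-gram: contributes nothing to the sum
        total = [t + v for t, v in zip(total, vec)]
    return total


if __name__ == "__main__":
    toy_vectors = {"MKT": [1.0, 0.0], "KTA": [0.0, 1.0], "TAY": [1.0, 1.0]}
    print(protein_vector("MKTAY", toy_vectors))
```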

Then, to see how we build the SVM with an RBF kernel, run the following command:

tensorboard --logdir=./logs
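For reference, the RBF (Gaussian) kernel used by the SVM measures how similar two protein vectors are: K(x, y) = exp(-gamma * ||x - y||^2). Below is a minimal sketch in plain Python. The gamma value is an assumption, because the README does not say which value train_svm_biovec.py uses.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(vectors: Sequence[Sequence[float]], gamma: float = 1.0) -> List[List[float]]:
    """Pairwise kernel values (the Gram matrix a kernel SVM works with)."""
    return [[rbf_kernel(u, v, gamma) for v in vectors] for u in vectors]


if __name__ == "__main__":
    for row in gram_matrix([[0.0, 0.0], [1.0, 1.0]], gamma=0.5):
        print(row)
```

A larger gamma makes the kernel more local, so only vectors that are very close to each other get values near 1.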

Description

word2vec : Generates word2vec models from protein databases (gensim).
document : Protein databases (UniProt, Pfam, DisProt, PDB...).
bio_tsne : t-SNE (100-D to 2-D) of 3-gram vectors and protein vectors.
trained_models : Trained data produced by make_data_uniprot.py
bio_svm : Classifying proteins (random PDB and FG-nups).
processed_data : Data processing (JSON to FASTA conversion, data selection, data merging)
biovisual : Visualization of protein vectors
ngrams_properties : Labeling amino-acid 3-grams
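The README does not say which properties ngrams_properties uses to label 3-grams. As an illustration only, the sketch below labels a 3-gram by the mean Kyte-Doolittle hydropathy of its residues, which is one common way to color points in a 3-gram space plot. Both the choice of scale and the function name ngram_hydropathy are assumptions.

```python
from typing import Optional

# Kyte-Doolittle hydropathy index for the 20 standard amino acids
# (illustrative choice of property; not taken from this repository).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def ngram_hydropathy(ngram: str) -> Optional[float]:
    """Mean hydropathy of an n-gram, or None if it contains a
    non-standard residue such as X, B, Z or U."""
    values = [KYTE_DOOLITTLE.get(aa) for aa in ngram]
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


if __name__ == "__main__":
    for gram in ("IVL", "RKD", "AXG"):
        print(gram, ngram_hydropathy(gram))
```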

How to see the graphs

1. 3-gram protein space

Install the Python packages.

pip install -r requirements.txt

Download the document databases.
Run make_data_uniprot.py:

python make_data_uniprot.py

Run visualize.py:

python visualize.py

Choose PS (protein space):

just type PS

Finally, you can see the 3-gram protein space.

2. Binary SVM with FG-nups and random PDBs

Install the Python packages.

pip install -r requirements.txt

Download the document databases.

Unzip document, then move dis-disprot.json, disprot.json, dis-fg-nups.fasta, fg-nups.fasta, pdb_seqres.fasta, and disordered-pdb.fasta to processed_data.

Run processed_sequence.py in processed_data.

processed_sequence.py generates the binary_svm directory.

Gzip the dataset.fasta file and move binary_svm to document.

dataset.fasta is located in 2017Bio2Vec/processed_data/binary_svm.
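The step above asks for dataset.fasta to be gzipped. The sketch below does this from Python, much like gzip -k dataset.fasta (the original file is kept), and then reads a gzipped FASTA file back record by record. The helper names gzip_file and read_fasta_gz are hypothetical, and this is not necessarily how make_data_uniprot.py reads its input.

```python
import gzip
import shutil
from typing import Iterator, List, Optional, Tuple


def gzip_file(src: str, dst: Optional[str] = None) -> str:
    """Compress src to dst (default: src + '.gz'), keeping the original file."""
    dst = dst or src + ".gz"
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file.

    Multi-line sequences are joined and blank lines are ignored.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

For example, gzip_file("dataset.fasta") writes dataset.fasta.gz next to the original. You can then pass that path to read_fasta_gz.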

Run make_data_uniprot.py.
Run visualize.py.
Choose BSVM (binary SVM).
Run binary_svm.py.
Finally, you can see the binary SVM graph.

3. Density map

Install the Python packages.

pip install -r requirements.txt

Download the document databases.

Unzip document, then move dis-disprot.json, disprot.json, dis-fg-nups.fasta, fg-nups.fasta, pdb_seqres.fasta, and disordered-pdb.fasta to processed_data.

Run processed_sequence.py in processed_data.

processed_sequence.py generates the binary_svm directory.

Gzip all the data.

The data is located in 2017Bio2Vec/processed_data/binary_svm.

Run make_data_uniprot.py.
Run visualize.py.
Choose DM (density map).
Finally, you can see the density map.
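The README does not describe how the DM option computes its density map. As a generic illustration only, a density map of 2-D points (for example, the 2-D t-SNE coordinates produced by bio_tsne) can be built by counting how many points fall into each cell of a regular grid. The function density_grid below is a hypothetical sketch of that idea, not the code in visualize.py.

```python
from typing import List, Sequence, Tuple


def density_grid(points: Sequence[Tuple[float, float]], bins: int = 10) -> List[List[int]]:
    """Count non-empty 2-D points into a bins x bins grid spanning their
    bounding box. Returns grid[row][col], where row follows y and col follows x."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Avoid division by zero when all points share a coordinate.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    grid = [[0] * bins for _ in range(bins)]
    for x, y in points:
        # Points on the upper edge of the box go into the last bin.
        col = min(int((x - x_min) / x_span * bins), bins - 1)
        row = min(int((y - y_min) / y_span * bins), bins - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    for row in density_grid([(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (0.5, 0.5)], bins=2):
        print(row)
```

The resulting counts can be drawn as a heat map, for example with matplotlib's imshow.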
