sismetanin / word2vec-tsne

License: other
Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.

Programming Languages

Jupyter Notebook

Projects that are alternatives of or similar to word2vec-tsne

sentiment-analysis-of-tweets-in-russian
Sentiment analysis of tweets in Russian using Convolutional Neural Networks (CNN) with Word2Vec embeddings.
Stars: ✭ 51 (-13.56%)
Mutual labels:  word2vec, word-embeddings, embeddings, machinelearning, computational-linguistics, nlp-machine-learning
datastories-semeval2017-task6
Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".
Stars: ✭ 20 (-66.1%)
Mutual labels:  word-embeddings, embeddings, computational-linguistics, nlp-machine-learning
SentimentAnalysis
Sentiment Analysis: Deep Bi-LSTM+attention model
Stars: ✭ 32 (-45.76%)
Mutual labels:  word-embeddings, embeddings, computational-linguistics, nlp-machine-learning
NTUA-slp-nlp
💻Speech and Natural Language Processing (SLP & NLP) Lab Assignments for ECE NTUA
Stars: ✭ 19 (-67.8%)
Mutual labels:  word2vec, word-embeddings, nlp-machine-learning
Datastories Semeval2017 Task4
Deep-learning model presented in "DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis".
Stars: ✭ 184 (+211.86%)
Mutual labels:  word-embeddings, embeddings, nlp-machine-learning
lda2vec
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec from this paper https://arxiv.org/abs/1605.02019
Stars: ✭ 27 (-54.24%)
Mutual labels:  word2vec, word-embeddings, embeddings
Dict2vec
Dict2vec is a framework to learn word embeddings using lexical dictionaries.
Stars: ✭ 91 (+54.24%)
Mutual labels:  word2vec, word-embeddings, embeddings
Magnitude
A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+2262.71%)
Mutual labels:  word2vec, word-embeddings, embeddings
Dna2vec
dna2vec: Consistent vector representations of variable-length k-mers
Stars: ✭ 117 (+98.31%)
Mutual labels:  word2vec, word-embeddings, embeddings
Fasttext.js
FastText for Node.js
Stars: ✭ 127 (+115.25%)
Mutual labels:  word2vec, word-embeddings, machinelearning
biovec
ProtVec can be used in protein interaction predictions, structure prediction, and protein data visualization.
Stars: ✭ 23 (-61.02%)
Mutual labels:  word2vec, tsne
Word2VecAndTsne
Scripts demo-ing how to train a Word2Vec model and reduce its vector space
Stars: ✭ 45 (-23.73%)
Mutual labels:  word2vec, tsne
word-embeddings-from-scratch
Creating word embeddings from scratch and visualize them on TensorBoard. Using trained embeddings in Keras.
Stars: ✭ 22 (-62.71%)
Mutual labels:  word2vec, embeddings
Simple-Sentence-Similarity
Exploring the simple sentence similarity measurements using word embeddings
Stars: ✭ 99 (+67.8%)
Mutual labels:  word2vec, word-embeddings
empythy
Automated NLP sentiment predictions- batteries included, or use your own data
Stars: ✭ 17 (-71.19%)
Mutual labels:  machinelearning, nlp-machine-learning
two-stream-cnn
A two-stream convolutional neural network for learning arbitrary similarity functions over two sets of training data
Stars: ✭ 24 (-59.32%)
Mutual labels:  word2vec, word-embeddings
SentimentAnalysis
(BOW, TF-IDF, Word2Vec, BERT) Word Embeddings + (SVM, Naive Bayes, Decision Tree, Random Forest) Base Classifiers + Pre-trained BERT on Tensorflow Hub + 1-D CNN and Bi-Directional LSTM on IMDB Movie Reviews Dataset
Stars: ✭ 40 (-32.2%)
Mutual labels:  word2vec, embeddings
DeepLearningReading
Deep Learning and Machine Learning mini-projects. Current Project: Deepmind Attentive Reader (rc-data)
Stars: ✭ 78 (+32.2%)
Mutual labels:  embeddings, nlp-machine-learning
embedding evaluation
Evaluate your word embeddings
Stars: ✭ 32 (-45.76%)
Mutual labels:  embeddings, computational-linguistics
Koan
A word2vec negative sampling implementation with correct CBOW update.
Stars: ✭ 232 (+293.22%)
Mutual labels:  word2vec, word-embeddings

Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE

This repository contains the source code for visualizing high-dimensional Word2Vec word embeddings using t-SNE. The visualization can be useful for understanding how Word2Vec works and how to interpret the relations between vectors captured from your texts before feeding them into neural networks or other machine learning algorithms. As training data, we use articles from Google News and classical literary works by Leo Tolstoy, the Russian writer regarded as one of the greatest authors of all time.
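The core of the pipeline described above is dimensionality reduction: t-SNE maps each 300-dimensional word vector to a 2-D point that can be scattered on a plot. A minimal sketch, using random vectors as a stand-in for real Word2Vec embeddings (scikit-learn's `TSNE` is an assumption about the tooling; the repository's notebooks may differ):

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for word vectors: 50 random 300-d embeddings. In practice these
# would come from a trained Word2Vec model, one row per vocabulary word.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(50, 300))

# t-SNE maps the 300-d space to 2-d while preserving local neighbourhoods,
# so semantically close words end up as nearby points on the plot.
# Perplexity must be smaller than the number of points.
coords = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(vectors)

print(coords.shape)  # (50, 2): one x/y pair per word, ready for a scatter plot
```

Plotting is then a single `matplotlib` scatter of `coords[:, 0]` against `coords[:, 1]`, with each point annotated by its word.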

Data

A pre-trained model, trained on part of the Google News dataset (about 100 billion words), is available at https://code.google.com/archive/p/word2vec/ (and also described in [2]). The model contains 300-dimensional vectors for 3 million words and phrases.

Tolstoy's novels in Russian are available at https://www.litres.ru/lev-tolstoy.

References

  1. L. van der Maaten and G. Hinton, "Visualizing Data using t-SNE", Journal of Machine Learning Research, vol. 9, pp. 2579-2605, 2008.
  2. T. Mikolov, I. Sutskever, K. Chen, G. Corrado and J. Dean, "Distributed Representations of Words and Phrases and their Compositionality", Advances in Neural Information Processing Systems, pp. 3111-3119, 2013.
  3. R. Rehurek and P. Sojka, "Software Framework for Topic Modelling with Large Corpora", Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, 2010.

License

See LICENSE.
