
lintseju / word_embedding

License: MIT
Sample code for training Word2Vec and FastText models on a Wikipedia corpus, plus the resulting pretrained word embeddings.

Programming Languages

python

Projects that are alternatives to or similar to word_embedding

Magnitude
A fast, efficient universal vector embedding utility package.
Stars: 1,394
Mutual labels: word2vec, word-embeddings, fasttext
Shallowlearn
An experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and a nice API. Written in Python and fully compatible with Scikit-learn.
Stars: 196
Mutual labels: word2vec, word-embeddings, fasttext
Fasttext.js
FastText for Node.js
Stars: 127
Mutual labels: word2vec, word-embeddings, fasttext
Simple-Sentence-Similarity
Exploring simple sentence similarity measurements using word embeddings
Stars: 99
Mutual labels: word2vec, word-embeddings, fasttext
Gensim
Topic Modelling for Humans
Stars: 12,763
Mutual labels: word2vec, word-embeddings, fasttext
Koan
A word2vec negative sampling implementation with correct CBOW update.
Stars: 232
Mutual labels: word2vec, word-embeddings
codenames
Codenames AI using Word Vectors
Stars: 41
Mutual labels: word2vec, word-embeddings
sentiment-analysis-of-tweets-in-russian
Sentiment analysis of tweets in Russian using Convolutional Neural Networks (CNN) with Word2Vec embeddings.
Stars: 51
Mutual labels: word2vec, word-embeddings
word-benchmarks
Benchmarks for intrinsic word embeddings evaluation.
Stars: 45
Mutual labels: word2vec, word-embeddings
Germanwordembeddings
Toolkit to obtain and preprocess German corpora, train models using word2vec (gensim) and evaluate them with generated test sets
Stars: 189
Mutual labels: word2vec, word-embeddings
two-stream-cnn
A two-stream convolutional neural network for learning arbitrary similarity functions over two sets of training data
Stars: 24
Mutual labels: word2vec, word-embeddings
Arabic-Word-Embeddings-Word2vec
Arabic Word Embeddings Word2vec
Stars: 26
Mutual labels: word2vec, word-embeddings
Cw2vec
cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information
Stars: 224
Mutual labels: word2vec, fasttext
Chameleon recsys
Source code of CHAMELEON - A Deep Learning Meta-Architecture for News Recommender Systems
Stars: 202
Mutual labels: word2vec, word-embeddings
word2vec-tsne
Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.
Stars: 59
Mutual labels: word2vec, word-embeddings
word2vec-on-wikipedia
A pipeline for training word embeddings using word2vec on wikipedia corpus.
Stars: 68
Mutual labels: word2vec, word-embeddings
compress-fasttext
Tools for shrinking fastText models (in gensim format)
Stars: 124
Mutual labels: word-embeddings, fasttext
Embedding
Embedding model code and study notes
Stars: 25
Mutual labels: word2vec, fasttext
wikidata-corpus
Train Wikidata with word2vec for word embedding tasks
Stars: 109
Mutual labels: word2vec, word-embeddings
Wordvectors
Pre-trained word vectors of 30+ languages
Stars: 2,043
Mutual labels: word2vec, fasttext

Word Embedding

For technical details, please read my blog post (available in a Chinese version and an English version).

Environment setup:

virtualenv __ -p python3
source __/bin/activate
pip install -r requirement.txt

Train word embeddings on the latest Wikipedia dump:

python train.py --lang en --model word2vec --size 300 --output data/en_wiki_word2vec_300.txt
--lang: en for English, zh for Chinese
--model: word2vec or fasttext
--size: dimensionality of the trained word embeddings
--output: path where the trained word embeddings are saved
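The four flags above can be mirrored with the standard-library `argparse` module. This is a minimal sketch, not the repository's actual `train.py`: treating every flag as required, the helper name `latest_dump_url`, and the assumption that "latest wikidump" means the `pages-articles` dump on dumps.wikimedia.org are all hypothetical choices made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented for train.py."""
    parser = argparse.ArgumentParser(
        description="Train word embeddings on a Wikipedia dump (sketch).")
    parser.add_argument("--lang", choices=["en", "zh"], required=True,
                        help="en for English, zh for Chinese")
    parser.add_argument("--model", choices=["word2vec", "fasttext"],
                        required=True, help="embedding model to train")
    parser.add_argument("--size", type=int, required=True,
                        help="dimensionality of the trained embeddings")
    parser.add_argument("--output", required=True,
                        help="path where the trained embeddings are saved")
    return parser


def latest_dump_url(lang: str) -> str:
    """Assumed source: the 'latest' pages-articles dump for the language."""
    name = f"{lang}wiki"  # e.g. "enwiki" or "zhwiki"
    return (f"https://dumps.wikimedia.org/{name}/latest/"
            f"{name}-latest-pages-articles.xml.bz2")


if __name__ == "__main__":
    # Same arguments as the example command in this README.
    args = build_parser().parse_args(
        ["--lang", "en", "--model", "word2vec", "--size", "300",
         "--output", "data/en_wiki_word2vec_300.txt"])
    print(args.model, args.size, latest_dump_url(args.lang))
```

Restricting `--lang` and `--model` with `choices` makes argparse reject unsupported values (for example `--lang fr`) before any download or training starts.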

Visualize the trained embeddings (English and Chinese only):

python demo.py --lang en --output data/en_wiki_word2vec_300.txt
--lang: en for English, zh for Chinese
--output: path to the trained word-embedding file
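The README does not say exactly what `demo.py` displays. A common way to inspect trained embeddings is to list a word's nearest neighbours by cosine similarity. The sketch below shows that technique on made-up 3-dimensional toy vectors; it is an illustration, not the code in `demo.py`, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word: str, vectors: Dict[str, Vector],
                 topn: int = 3) -> List[Tuple[str, float]]:
    """Return the topn words closest to `word`, excluding the word itself."""
    if word not in vectors:
        raise KeyError(f"{word!r} is not in the vocabulary")
    query = vectors[word]
    scores = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Toy vectors invented for illustration; real ones would come from --output.
toy = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.1, 0.2, 0.95],
    "banana": [0.12, 0.18, 0.9],
}
print(most_similar("king", toy, topn=2))
```

This brute-force scan is linear in vocabulary size, which is fine for spot checks; libraries such as gensim normalize the vectors once and use matrix products to answer such queries faster.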

Pretrained word embeddings:

Model      Chinese    English
Word2Vec   Download   Download
FastText   Download   Download
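The README does not state the file format of the downloads. Judging by the `.txt` output path used above, a reasonable assumption is the word2vec text format: a header line `<vocab size> <dimension>`, then one line per word holding the word followed by its vector components. Under that assumption, this standard-library sketch reads such a file. The optional `limit` keeps memory bounded for large Wikipedia vocabularies. The function name is hypothetical.

```python
import io
from typing import Dict, List, Optional, TextIO


def load_word2vec_text(stream: TextIO,
                       limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Parse embeddings stored in the word2vec text format (assumed here)."""
    # Header line: "<vocabulary size> <vector dimension>".
    vocab_size, dim = (int(x) for x in stream.readline().split())
    max_words = vocab_size if limit is None else min(limit, vocab_size)
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        if len(vectors) >= max_words:
            break
        tokens = line.rstrip("\r\n ").split(" ")
        if len(tokens) < dim + 1:
            continue  # skip blank or malformed lines
        # The last `dim` tokens are the vector; everything before is the word.
        word = " ".join(tokens[:-dim])
        vectors[word] = [float(x) for x in tokens[-dim:]]
    return vectors


sample = io.StringIO("3 2\nthe 0.1 0.2\n中文 0.3 -0.4\ncat 1.0 2.0\n")
print(load_word2vec_text(sample, limit=2))
```

For a downloaded file, open it with `open(path, encoding="utf-8")` so Chinese entries decode correctly. If gensim is installed, `KeyedVectors.load_word2vec_format(path, binary=False, limit=...)` reads the same format into a structure with built-in similarity queries.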