lintseju / word_embedding

Licence: MIT license

Sample code for training Word2Vec and FastText on a wiki corpus, along with pretrained word embeddings.

Programming Languages

python


Projects that are alternatives to or similar to word_embedding

Magnitude

A fast, efficient universal vector embedding utility package.

Stars: ✭ 1,394 (+6538.1%)

Mutual labels: word2vec, word-embeddings, fasttext

Shallowlearn

An experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and nice API. Written in Python and fully compatible with Scikit-learn.

Stars: ✭ 196 (+833.33%)

Mutual labels: word2vec, word-embeddings, fasttext

Fasttext.js

FastText for Node.js

Stars: ✭ 127 (+504.76%)

Mutual labels: word2vec, word-embeddings, fasttext

Simple-Sentence-Similarity

Exploring the simple sentence similarity measurements using word embeddings

Stars: ✭ 99 (+371.43%)

Mutual labels: word2vec, word-embeddings, fasttext

Gensim

Topic Modelling for Humans

Stars: ✭ 12,763 (+60676.19%)

Mutual labels: word2vec, word-embeddings, fasttext

Koan

A word2vec negative sampling implementation with correct CBOW update.

Stars: ✭ 232 (+1004.76%)

Mutual labels: word2vec, word-embeddings

codenames

Codenames AI using Word Vectors

Stars: ✭ 41 (+95.24%)

Mutual labels: word2vec, word-embeddings

sentiment-analysis-of-tweets-in-russian

Sentiment analysis of tweets in Russian using Convolutional Neural Networks (CNN) with Word2Vec embeddings.

Stars: ✭ 51 (+142.86%)

Mutual labels: word2vec, word-embeddings

word-benchmarks

Benchmarks for intrinsic word embeddings evaluation.

Stars: ✭ 45 (+114.29%)

Mutual labels: word2vec, word-embeddings

Germanwordembeddings

Toolkit to obtain and preprocess German corpora, train models using word2vec (gensim), and evaluate them with generated test sets

Stars: ✭ 189 (+800%)

Mutual labels: word2vec, word-embeddings

two-stream-cnn

A two-stream convolutional neural network for learning arbitrary similarity functions over two sets of training data

Stars: ✭ 24 (+14.29%)

Mutual labels: word2vec, word-embeddings

Arabic-Word-Embeddings-Word2vec

Arabic Word Embeddings Word2vec

Stars: ✭ 26 (+23.81%)

Mutual labels: word2vec, word-embeddings

Cw2vec

cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information

Stars: ✭ 224 (+966.67%)

Mutual labels: word2vec, fasttext

Chameleon recsys

Source code of CHAMELEON - A Deep Learning Meta-Architecture for News Recommender Systems

Stars: ✭ 202 (+861.9%)

Mutual labels: word2vec, word-embeddings

word2vec-tsne

Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.

Stars: ✭ 59 (+180.95%)

Mutual labels: word2vec, word-embeddings

word2vec-on-wikipedia

A pipeline for training word embeddings using word2vec on a Wikipedia corpus.

Stars: ✭ 68 (+223.81%)

Mutual labels: word2vec, word-embeddings

compress-fasttext

Tools for shrinking fastText models (in gensim format)

Stars: ✭ 124 (+490.48%)

Mutual labels: word-embeddings, fasttext

Embedding

Embedding model code and a summary of study notes

Stars: ✭ 25 (+19.05%)

Mutual labels: word2vec, fasttext

wikidata-corpus

Train Wikidata with word2vec for word embedding tasks

Stars: ✭ 109 (+419.05%)

Mutual labels: word2vec, word-embeddings

Wordvectors

Pre-trained word vectors of 30+ languages

Stars: ✭ 2,043 (+9628.57%)

Mutual labels: word2vec, fasttext


Word Embedding

For technical details, please read my blog (Chinese version, English version).

Environment setup:

virtualenv __ -p python3
source __/bin/activate
pip install -r requirement.txt

Train a word embedding on the latest wiki dump:

python train.py --lang en --model word2vec --size 300 --output data/en_wiki_word2vec_300.txt
--lang: en for English, zh for Chinese
--model: word2vec or fasttext
--size: number of dimensions of the trained word embedding
--output: path to save the trained word embedding
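
The flags above could be parsed roughly as follows. This is a minimal sketch using Python's standard argparse module, not the actual contents of train.py: the function name build_parser and the default values are assumptions made for illustration, while the flag names and allowed values come from the list above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented train.py flags."""
    parser = argparse.ArgumentParser(
        description="Train a word embedding on a wiki dump.")
    # Allowed values are taken from the README; defaults are assumptions.
    parser.add_argument("--lang", choices=["en", "zh"], default="en",
                        help="en for English, zh for Chinese")
    parser.add_argument("--model", choices=["word2vec", "fasttext"],
                        default="word2vec", help="embedding model to train")
    parser.add_argument("--size", type=int, default=300,
                        help="number of dimensions of the trained embedding")
    parser.add_argument("--output", required=True,
                        help="path to save the trained embedding")
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args([
    "--lang", "en",
    "--model", "word2vec",
    "--size", "300",
    "--output", "data/en_wiki_word2vec_300.txt",
])
print(args.lang, args.model, args.size, args.output)
```

Restricting --lang and --model with choices makes an unsupported value fail at startup rather than partway through a long training run.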

Visualization of trained embedding (for English and Chinese only):

python demo.py --lang en --output data/en_wiki_word2vec_300.txt
--lang: en for English, zh for Chinese
--output: path to the trained word embedding
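
To inspect a trained embedding without the demo script, the output file can be read directly. The sketch below assumes the .txt output uses the common word2vec text layout (an optional "vocab_size dimension" header line, then one word per line followed by its space-separated components); the README does not confirm this. The helper names load_text_embedding, cosine, and most_similar are hypothetical and not part of this project.

```python
import io
import math


def load_text_embedding(handle):
    """Read word vectors from an open text file in the assumed layout."""
    vectors = {}
    for line_no, line in enumerate(handle):
        parts = line.rstrip().split(" ")
        # Assumption: a two-field first line is the "vocab_size dim" header.
        if line_no == 0 and len(parts) == 2:
            continue
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, word, topn=3):
    """Return the topn (word, similarity) pairs closest to the given word."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Tiny in-memory stand-in for a real embedding file.
sample = io.StringIO("3 2\nking 1.0 0.0\nqueen 0.9 0.1\napple 0.0 1.0\n")
vectors = load_text_embedding(sample)
print(most_similar(vectors, "king", topn=1))
```

For a real file, pass open("data/en_wiki_word2vec_300.txt", encoding="utf-8") instead of the StringIO object. A full 300-dimensional wiki vocabulary is large, so a pure-Python loop like this is only practical for spot checks.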

Pretrained word embedding:

	Chinese	English
Word2Vec	Download	Download
FastText	Download	Download
