
ynqa / Wego

License: Apache-2.0
Word Embeddings (e.g. Word2Vec) in Go!


Projects that are alternatives to or similar to Wego

Glove As A Tensorflow Embedding Layer
Taking a pretrained GloVe model and using it as a TensorFlow embedding weight layer inside the GPU. You then only need to send word indices over the GPU data transfer bus, reducing data transfer overhead.
Stars: ✭ 85 (-74.7%)
Mutual labels:  word2vec, word-embeddings, glove
Text2vec
Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
Stars: ✭ 715 (+112.8%)
Mutual labels:  word2vec, word-embeddings, glove
Simple-Sentence-Similarity
Exploring the simple sentence similarity measurements using word embeddings
Stars: ✭ 99 (-70.54%)
Mutual labels:  word2vec, word-embeddings, glove
Magnitude
A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+314.88%)
Mutual labels:  word2vec, word-embeddings, glove
Arabic-Word-Embeddings-Word2vec
Arabic Word Embeddings Word2vec
Stars: ✭ 26 (-92.26%)
Mutual labels:  word2vec, word-embeddings
word-benchmarks
Benchmarks for intrinsic word embeddings evaluation.
Stars: ✭ 45 (-86.61%)
Mutual labels:  word2vec, word-embeddings
sarcasm-detection-for-sentiment-analysis
Sarcasm Detection for Sentiment Analysis
Stars: ✭ 21 (-93.75%)
Mutual labels:  word2vec, glove
word2vec-tsne
Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.
Stars: ✭ 59 (-82.44%)
Mutual labels:  word2vec, word-embeddings
Koan
A word2vec negative sampling implementation with correct CBOW update.
Stars: ✭ 232 (-30.95%)
Mutual labels:  word2vec, word-embeddings
navec
Compact high quality word embeddings for Russian language
Stars: ✭ 118 (-64.88%)
Mutual labels:  word2vec, glove
lda2vec
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec from this paper https://arxiv.org/abs/1605.02019
Stars: ✭ 27 (-91.96%)
Mutual labels:  word2vec, word-embeddings
word2vec-on-wikipedia
A pipeline for training word embeddings using word2vec on wikipedia corpus.
Stars: ✭ 68 (-79.76%)
Mutual labels:  word2vec, word-embeddings
two-stream-cnn
A two-stream convolutional neural network for learning arbitrary similarity functions over two sets of training data
Stars: ✭ 24 (-92.86%)
Mutual labels:  word2vec, word-embeddings
NLP-paper
🎨🎨 NLP Natural Language Processing Tutorial 🎨🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-93.15%)
Mutual labels:  word2vec, glove
NTUA-slp-nlp
💻Speech and Natural Language Processing (SLP & NLP) Lab Assignments for ECE NTUA
Stars: ✭ 19 (-94.35%)
Mutual labels:  word2vec, word-embeddings
datastories-semeval2017-task6
Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".
Stars: ✭ 20 (-94.05%)
Mutual labels:  word-embeddings, glove
sentiment-analysis-of-tweets-in-russian
Sentiment analysis of tweets in Russian using Convolutional Neural Networks (CNN) with Word2Vec embeddings.
Stars: ✭ 51 (-84.82%)
Mutual labels:  word2vec, word-embeddings
codenames
Codenames AI using Word Vectors
Stars: ✭ 41 (-87.8%)
Mutual labels:  word2vec, word-embeddings
SWDM
SIGIR 2017: Embedding-based query expansion for weighted sequential dependence retrieval model
Stars: ✭ 35 (-89.58%)
Mutual labels:  word2vec, word-embeddings
Shallowlearn
An experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and nice API. Written in Python and fully compatible with Scikit-learn.
Stars: ✭ 196 (-41.67%)
Mutual labels:  word2vec, word-embeddings

Word Embeddings in Go


wego is an implementation of word embedding (a.k.a. word representation) models, written from scratch in Go.

What are word embeddings?

Word embeddings map words' meanings, structures, and concepts into a low-dimensional vector space. A representative example:

Vector("King") - Vector("Man") + Vector("Woman") = Vector("Queen")

As this example shows, the models generate word vectors whose meanings can be combined through arithmetic operations with other vectors.
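
As a toy illustration of this arithmetic, the sketch below adds and subtracts hand-made 3-dimensional vectors; real embeddings are learned from a corpus and typically have far more dimensions:

package main

import "fmt"

// add and sub combine word vectors element-wise; this is the arithmetic
// behind "king - man + woman ≈ queen". The 3-dimensional vectors in main
// are toy values, not real learned embeddings.
func add(a, b []float64) []float64 {
	out := make([]float64, len(a))
	for i := range a {
		out[i] = a[i] + b[i]
	}
	return out
}

func sub(a, b []float64) []float64 {
	out := make([]float64, len(a))
	for i := range a {
		out[i] = a[i] - b[i]
	}
	return out
}

func main() {
	king := []float64{0.8, 0.3, 0.1}
	man := []float64{0.7, 0.1, 0.0}
	woman := []float64{0.6, 0.2, 0.9}

	// king - man + woman: in a well-trained space this lands near "queen".
	fmt.Println(add(sub(king, man), woman))
}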

Features

wego supports the following models for capturing word vectors:

  • Word2Vec: Distributed Representations of Words and Phrases and their Compositionality [pdf]

  • GloVe: Global Vectors for Word Representation [pdf]

  • LexVec: Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations [pdf]

Also, wego provides nearest neighbor search tools that calculate the distances between word vectors and find the nearest words to a target word: "near" for word vectors means "similar" for words.
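
The core of such a search is a similarity measure over vectors. The following is a minimal, self-contained sketch of nearest neighbor search by cosine similarity; it is illustrative, not wego's internal implementation:

package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity between two vectors:
// 1.0 means identical direction ("similar" words), 0 means orthogonal.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// nearest ranks every word in vocab by similarity to the target vector
// and returns the top k words.
func nearest(target []float64, vocab map[string][]float64, k int) []string {
	type scored struct {
		word string
		sim  float64
	}
	var all []scored
	for w, v := range vocab {
		all = append(all, scored{w, cosine(target, v)})
	}
	sort.Slice(all, func(i, j int) bool { return all[i].sim > all[j].sim })
	out := make([]string, 0, k)
	for i := 0; i < k && i < len(all); i++ {
		out = append(out, all[i].word)
	}
	return out
}

func main() {
	vocab := map[string][]float64{ // toy vectors
		"linux":   {0.9, 0.1},
		"windows": {0.8, 0.2},
		"banana":  {0.1, 0.9},
	}
	fmt.Println(nearest(vocab["linux"], vocab, 2)) // [linux windows]
}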

Please see the Usage section for details on how to use them.

Why Go?

Inspired by Data Science in Go @chewxy

Installation

Use the go command to get this package.

$ go get -u github.com/ynqa/wego
$ bin/wego -h

Usage

wego provides a CLI and a Go SDK for word embeddings.

CLI

Usage:
  wego [flags]
  wego [command]

Available Commands:
  console     Console to investigate word vectors
  glove       GloVe: Global Vectors for Word Representation
  help        Help about any command
  lexvec      Lexvec: Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations
  query       Query similar words
  word2vec    Word2Vec: Continuous Bag-of-Words and Skip-gram model

word2vec, glove and lexvec execute the following workflow to generate word vectors:

  1. Build a dictionary for vocabularies and count word frequencies by scanning a given corpus.
  2. Start training. The execution time depends on the size of the corpus, the hyperparameters (flags), and so on.
  3. Save the words and their vectors as a text file.
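
For example, training Word2Vec on the text8 corpus could look like the line below. The -i and -o flags are assumed here to mirror the query command shown later; run wego word2vec -h for the actual flag names.

$ wego word2vec -i text8 -o word_vector.txt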

query and console are the commands for nearest neighbor search over the trained word vectors.

query outputs words similar to a given word, using word vectors generated by the models above.

e.g. wego query -i word_vector.txt microsoft:

  RANK |   WORD    | SIMILARITY
-------+-----------+-------------
     1 | hypercard |   0.791492
     2 | xp        |   0.768939
     3 | software  |   0.763369
     4 | freebsd   |   0.761084
     5 | unix      |   0.749563
     6 | linux     |   0.747327
     7 | ibm       |   0.742115
     8 | windows   |   0.731136
     9 | desktop   |   0.715790
    10 | linspire  |   0.711171

wego does not reproduce identical word vectors between trials because it adopts the Hogwild! algorithm, which updates the parameters (in this case, the word vectors) asynchronously.
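
The snippet below is a minimal illustration of why Hogwild!-style training is non-deterministic: several goroutines update a shared parameter slice without locking, so the final values depend on how the (intentionally racy) writes interleave. It is not wego's actual training loop.

package main

import (
	"fmt"
	"sync"
)

func main() {
	params := make([]float64, 4) // shared parameters, e.g. one word vector
	var wg sync.WaitGroup
	for w := 0; w < 8; w++ { // 8 workers, no locking on params
		wg.Add(1)
		go func(step float64) {
			defer wg.Done()
			for i := range params {
				// Intentionally racy read-modify-write: updates can be
				// lost or reordered, so the result varies between runs.
				params[i] += 0.001 * step
			}
		}(float64(w + 1))
	}
	wg.Wait()
	fmt.Println(params) // may differ slightly from run to run
}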

console provides a REPL mode for performing basic arithmetic operations (+ and -) on word vectors.

Go SDK

Hyperparameters for the models are defined through functional options.

model, err := word2vec.New(
	word2vec.Window(5),
	word2vec.Model(word2vec.Cbow),
	word2vec.Optimizer(word2vec.NegativeSampling),
	word2vec.NegativeSampleSize(5),
	word2vec.Verbose(),
)

The models satisfy the following interface:

type Model interface {
	Train(io.ReadSeeker) error
	Save(io.Writer, vector.Type) error
	WordVector(vector.Type) *matrix.Matrix
}
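
A training run could then be wired up as in the sketch below. The import paths and the vector.Single constant are assumptions based on the interface above; check the wego repository for the actual package layout and vector.Type values.

package main

import (
	"log"
	"os"

	// Import paths assumed from the repository layout; adjust to
	// match the installed version of wego.
	"github.com/ynqa/wego/model/vector"
	"github.com/ynqa/wego/model/word2vec"
)

func main() {
	model, err := word2vec.New(
		word2vec.Window(5),
		word2vec.Model(word2vec.Cbow),
		word2vec.Optimizer(word2vec.NegativeSampling),
		word2vec.NegativeSampleSize(5),
	)
	if err != nil {
		log.Fatal(err)
	}

	// Train on a space-separated corpus (see the Formats section).
	corpus, err := os.Open("text8")
	if err != nil {
		log.Fatal(err)
	}
	defer corpus.Close()
	if err := model.Train(corpus); err != nil {
		log.Fatal(err)
	}

	// Save the learned vectors. vector.Single is an assumed constant;
	// see the vector package for the actual vector.Type values.
	out, err := os.Create("word_vector.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()
	if err := model.Save(out, vector.Single); err != nil {
		log.Fatal(err)
	}
}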

Formats

For training word vectors, wego requires the following file formats for inputs and outputs.

Input

The input corpus must consist of words separated by spaces, like text8.

word1 word2 word3 ...

Output

After training, wego saves the word vectors to a text file in the following format (N is the word-vector dimension you specified):

<word> <value_1> <value_2> ... <value_N>
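
Reading this format back into Go is straightforward. The loader below is an illustrative sketch, not part of wego; it assumes whitespace-separated fields as described above.

package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// loadVectors parses "<word> <value_1> ... <value_N>" lines into a map
// from word to vector.
func loadVectors(path string) (map[string][]float64, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	vectors := make(map[string][]float64)
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 {
			continue // skip blank or malformed lines
		}
		vec := make([]float64, len(fields)-1)
		for i, s := range fields[1:] {
			if vec[i], err = strconv.ParseFloat(s, 64); err != nil {
				return nil, err
			}
		}
		vectors[fields[0]] = vec
	}
	return vectors, scanner.Err()
}

func main() {
	vectors, err := loadVectors("word_vector.txt")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(vectors), "vectors loaded")
}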