stephantul / reach

License: MIT
Load embeddings and featurize your sentences.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to reach

Sensegram
Making sense embedding out of word embeddings using graph-based word sense induction
Stars: ✭ 209 (+1129.41%)
Mutual labels:  word2vec, embeddings
Word2vec Spam Filter
Using word vectors to classify spam messages
Stars: ✭ 149 (+776.47%)
Mutual labels:  numpy, word2vec
Cw2vec
cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information
Stars: ✭ 224 (+1217.65%)
Mutual labels:  word2vec, embeddings
Dna2vec
dna2vec: Consistent vector representations of variable-length k-mers
Stars: ✭ 117 (+588.24%)
Mutual labels:  word2vec, embeddings
word2vec-tsne
Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.
Stars: ✭ 59 (+247.06%)
Mutual labels:  word2vec, embeddings
Embedding As Service
One-Stop Solution to encode sentence to fixed length vectors from various embedding techniques
Stars: ✭ 151 (+788.24%)
Mutual labels:  word2vec, embeddings
From Python To Numpy
An open-access book on numpy vectorization techniques, Nicolas P. Rougier, 2017
Stars: ✭ 1,728 (+10064.71%)
Mutual labels:  numpy, vectorization
Deeplearning Nlp Models
A small, interpretable codebase containing the re-implementation of a few "deep" NLP models in PyTorch. Colab notebooks to run with GPUs. Models: word2vec, CNNs, transformer, gpt.
Stars: ✭ 64 (+276.47%)
Mutual labels:  word2vec, embeddings
SentimentAnalysis
(BOW, TF-IDF, Word2Vec, BERT) Word Embeddings + (SVM, Naive Bayes, Decision Tree, Random Forest) Base Classifiers + Pre-trained BERT on Tensorflow Hub + 1-D CNN and Bi-Directional LSTM on IMDB Movie Reviews Dataset
Stars: ✭ 40 (+135.29%)
Mutual labels:  word2vec, embeddings
navec
Compact high quality word embeddings for Russian language
Stars: ✭ 118 (+594.12%)
Mutual labels:  word2vec, embeddings
Awesome Embedding Models
A curated list of awesome embedding models tutorials, projects and communities.
Stars: ✭ 1,486 (+8641.18%)
Mutual labels:  word2vec, embeddings
sentiment-analysis-of-tweets-in-russian
Sentiment analysis of tweets in Russian using Convolutional Neural Networks (CNN) with Word2Vec embeddings.
Stars: ✭ 51 (+200%)
Mutual labels:  word2vec, embeddings
Magnitude
A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+8100%)
Mutual labels:  word2vec, embeddings
Entity2rec
entity2rec generates item recommendation using property-specific knowledge graph embeddings
Stars: ✭ 159 (+835.29%)
Mutual labels:  word2vec, embeddings
Dict2vec
Dict2vec is a framework to learn word embeddings using lexical dictionaries.
Stars: ✭ 91 (+435.29%)
Mutual labels:  word2vec, embeddings
Deep learning nlp
Keras, PyTorch, and NumPy Implementations of Deep Learning Architectures for NLP
Stars: ✭ 407 (+2294.12%)
Mutual labels:  numpy, word2vec
Philo2vec
An implementation of word2vec applied to [stanford philosophy encyclopedia](http://plato.stanford.edu/)
Stars: ✭ 33 (+94.12%)
Mutual labels:  word2vec, embeddings
Finalfusion Rust
finalfusion embeddings in Rust
Stars: ✭ 35 (+105.88%)
Mutual labels:  word2vec, embeddings
word-embeddings-from-scratch
Creating word embeddings from scratch and visualize them on TensorBoard. Using trained embeddings in Keras.
Stars: ✭ 22 (+29.41%)
Mutual labels:  word2vec, embeddings
lda2vec
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec from this paper https://arxiv.org/abs/1605.02019
Stars: ✭ 27 (+58.82%)
Mutual labels:  word2vec, embeddings

reach

A lightweight package for working with pre-trained word embeddings. Useful for input into neural networks, or for doing compositional semantics.

reach can read in word vectors in word2vec or GloVe format without any preprocessing.
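
For reference, the word2vec text format starts with a header line giving the vocabulary size and the dimensionality, followed by one word and its vector per line; the GloVe text format is the same minus the header. A toy two-word, three-dimensional file (values made up) looks like this:

2 3
cat 0.1 0.2 0.3
dog -0.4 0.5 0.6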

The guiding idea behind reach is no-hassle featurization: the vectorization and bag-of-words (bow) methods know how to deal with out-of-vocabulary (OOV) words, keeping that bookkeeping out of your code.
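
As a minimal, self-contained sketch of that OOV handling (the toy words and random vectors are stand-ins, and the exact fallback to the unknown index is an assumption based on the description above):

import numpy as np

from reach import Reach

# Tiny toy space; index 0 is the unknown token.
r = Reach(np.random.rand(3, 4), ["UNK", "cat", "dog"], unk_index=0)

# "glorbish" is OOV, so it maps to the UNK index (0) instead of
# raising a KeyError.
print(r.bow("cat glorbish dog".split()))  # e.g. [1, 0, 2]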

reach also includes nearest neighbor calculation for arbitrary vectors.
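
A minimal sketch of that feature follows; the method name nearest_neighbor and its exact signature are assumptions here, so check the package's API for the real call:

import numpy as np

from reach import Reach

r = Reach(np.random.rand(3, 4), ["UNK", "cat", "dog"], unk_index=0)

# An arbitrary query vector that is not tied to any word in the space.
query = np.random.rand(4)

# Hypothetical call: the 2 items closest to the query vector.
print(r.nearest_neighbor([query], 2))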

Example

import numpy as np

from reach import Reach

# Load from a .vec or .txt file
# unk_word specifies which token is the "unknown" token.
# If this token is not in your vector space, it is added as an extra word
# with a corresponding zero vector.
# If it is in your embedding space, it is used.
r = Reach.load("path/to/embeddings", unk_word="UNK")

# Alternatively, if you have a matrix, you can directly
# input it.

# Stand-in for word embeddings
mtr = np.random.rand(8, 300)
words = ["UNK", "cat", "dog", "best", "creature", "alive", "span", "prose"]
r = Reach(mtr, words, unk_index=0)

# Get vectors through indexing.
# Throws a KeyError if a word is not present.
vector = r['cat']

# Compare two words.
similarity = r.similarity('cat', 'dog')

# Find the most similar words (here, the top 2).
similarities = r.most_similar('cat', 2)

sentence = 'a dog is the best creature alive'.split()
corpus = [sentence, sentence, sentence]

# Bag-of-words (bow) representation consistent with the word vectors,
# e.g. for input into a neural network.
bow = r.bow(sentence)
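# bow is a list of integer indices into the embedding matrix, one per token.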

# vectorized representation.
vectorized = r.vectorize(sentence)
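# vectorized is an array with one row (here 300-dimensional) per token.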

# can remove OOV words automatically.
vectorized = r.vectorize(sentence, remove_oov=True)

# vectorize corpus.
transformed = r.transform(corpus)
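# transformed is a list with one such array per sentence in the corpus.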