crtomirmajer / wmd4j

License: MIT
wmd4j is a Java library for calculating Word Mover's Distance (WMD)

Programming Languages

java
68154 projects - #9 most used programming language
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to wmd4j

Simple-Sentence-Similarity
Exploring the simple sentence similarity measurements using word embeddings
Stars: ✭ 99 (+219.35%)
Mutual labels:  word2vec, wmd
revery
A personal semantic search engine capable of surfacing relevant bookmarks, journal entries, notes, blogs, contacts, and more, built on an efficient document embedding algorithm and Monocle's personal search index.
Stars: ✭ 200 (+545.16%)
Mutual labels:  word2vec
stackoverflow-semantic-search
Word2Vec encodings based search engine for Stackoverflow questions
Stars: ✭ 23 (-25.81%)
Mutual labels:  word2vec
word2vec-on-wikipedia
A pipeline for training word embeddings using word2vec on wikipedia corpus.
Stars: ✭ 68 (+119.35%)
Mutual labels:  word2vec
receiptdID
Receipt.ID is a multi-label, multi-class, hierarchical classification system implemented in a two layer feed forward network.
Stars: ✭ 22 (-29.03%)
Mutual labels:  word2vec
word-benchmarks
Benchmarks for intrinsic word embeddings evaluation.
Stars: ✭ 45 (+45.16%)
Mutual labels:  word2vec
doc2vec-api
document embedding and machine learning script for beginners
Stars: ✭ 92 (+196.77%)
Mutual labels:  word2vec
word2vec
Rust interface to word2vec.
Stars: ✭ 22 (-29.03%)
Mutual labels:  word2vec
NLP-paper
🎨 🎨 NLP: natural language processing tutorial 🎨🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-25.81%)
Mutual labels:  word2vec
word2vec-pytorch
Extremely simple and fast word2vec implementation with Negative Sampling + Sub-sampling
Stars: ✭ 145 (+367.74%)
Mutual labels:  word2vec
img classification deep learning
No description or website provided.
Stars: ✭ 19 (-38.71%)
Mutual labels:  word2vec
wordmap
Visualize large text collections with WebGL
Stars: ✭ 23 (-25.81%)
Mutual labels:  word2vec
learningspoons
nlp lecture-notes and source code
Stars: ✭ 29 (-6.45%)
Mutual labels:  word2vec
acl2017 document clustering
code for "Determining Gains Acquired from Word Embedding Quantitatively Using Discrete Distribution Clustering" ACL 2017
Stars: ✭ 21 (-32.26%)
Mutual labels:  word2vec
sarcasm-detection-for-sentiment-analysis
Sarcasm Detection for Sentiment Analysis
Stars: ✭ 21 (-32.26%)
Mutual labels:  word2vec
Word-Embeddings-and-Document-Vectors
An evaluation of word-embeddings for classification
Stars: ✭ 32 (+3.23%)
Mutual labels:  word2vec
Name-disambiguation
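An engineering solution for disambiguating papers by same-name authors (based on the first-place solution of the 2019 BAAI-AMiner name disambiguation competition)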
Stars: ✭ 17 (-45.16%)
Mutual labels:  word2vec
textaugment
TextAugment: Text Augmentation Library
Stars: ✭ 280 (+803.23%)
Mutual labels:  word2vec
test word2vec uyghur
An example that tests Uyghur script with the word2vec algorithm in Python's gensim library.
Stars: ✭ 15 (-51.61%)
Mutual labels:  word2vec
text-classification-cn
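Chinese text classification in practice, based on the Sogou news corpus, using traditional machine learning methods as well as pretrained models.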
Stars: ✭ 81 (+161.29%)
Mutual labels:  word2vec

wmd4j

wmd4j is a Java library for computing Word Mover's Distance (WMD) between two text documents. It provides the same functionality as Word2Vec.wmdistance in Gensim.
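
For orientation, WMD is usually stated as a transportation problem over the two documents' normalized bag-of-words (nBOW) vectors. The notation below is an assumption chosen for this sketch and does not come from the wmd4j source: d and d' are the nBOW vectors of the two documents over a vocabulary of n words, x_i is the embedding of word i, and T_ij is the amount of mass moved from word i in the first document to word j in the second.

```latex
\[
\mathrm{WMD}(d, d') \;=\; \min_{T \ge 0} \; \sum_{i=1}^{n} \sum_{j=1}^{n} T_{ij} \, \lVert x_i - x_j \rVert_2
\quad \text{subject to} \quad
\sum_{j=1}^{n} T_{ij} = d_i \;\; \forall i,
\qquad
\sum_{i=1}^{n} T_{ij} = d'_j \;\; \forall j .
\]
```

A smaller value means the words of one document need to "travel" less in embedding space to become the words of the other.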

wmd4j depends on the deeplearning4j WordVectors interface for word vector manipulation and uses an optimized version of JFastEMD (a solver for the Earth Mover's Distance transportation problem) underneath, which is about 1.8x faster.
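
To make the inputs of that transportation problem concrete, here is a minimal, self-contained sketch. It is not the wmd4j or JFastEMD implementation: instead of solving the exact problem, it computes a simple relaxed lower bound in which every word sends all of its mass to the nearest word of the other document. The class name, method names, and the 2-D toy embeddings are hypothetical and chosen only for illustration, and every token is assumed to have an embedding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RelaxedWmdSketch {

    // Normalized bag-of-words: each distinct token's share of the document's total mass.
    static Map<String, Double> nbow(String[] tokens) {
        Map<String, Double> weights = new LinkedHashMap<>();
        for (String t : tokens) {
            weights.merge(t, 1.0 / tokens.length, Double::sum);
        }
        return weights;
    }

    // Euclidean distance between two embeddings: the per-unit cost of moving mass.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // One-sided relaxation: each source word ships all of its mass to its nearest target word.
    // Assumes every token has an entry in emb (hypothetical toy setup).
    static double oneSided(Map<String, Double> from, Map<String, Double> to,
                           Map<String, double[]> emb) {
        double total = 0.0;
        for (Map.Entry<String, Double> src : from.entrySet()) {
            double nearest = Double.POSITIVE_INFINITY;
            for (String dst : to.keySet()) {
                nearest = Math.min(nearest, euclidean(emb.get(src.getKey()), emb.get(dst)));
            }
            total += src.getValue() * nearest;
        }
        return total;
    }

    // The larger of the two one-sided relaxations is a lower bound on the exact distance.
    static double relaxedWmd(String[] a, String[] b, Map<String, double[]> emb) {
        Map<String, Double> da = nbow(a);
        Map<String, Double> db = nbow(b);
        return Math.max(oneSided(da, db, emb), oneSided(db, da, emb));
    }

    public static void main(String[] args) {
        // Made-up 2-D embeddings, for illustration only.
        Map<String, double[]> emb = new LinkedHashMap<>();
        emb.put("obama", new double[] {0.0, 0.0});
        emb.put("president", new double[] {3.0, 4.0});
        emb.put("media", new double[] {10.0, 0.0});
        emb.put("press", new double[] {10.0, 1.0});

        double bound = relaxedWmd(new String[] {"obama", "media"},
                                  new String[] {"president", "press"}, emb);
        System.out.println(bound); // prints 3.0
    }
}
```

In this toy case the bound happens to coincide with the exact distance, because sending "obama" to "president" and "media" to "press" is also a feasible transport plan. In general an exact solver such as the one wmd4j uses is needed to get the true value.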

Usage

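// Load pre-trained word2vec vectors through deeplearning4j (false = the model file is not in binary format)
WordVectors vectors = WordVectorSerializer.loadGoogleModel(new File(word2vecPath), false);
// Build the WMD calculator on top of the loaded vectors
WordMovers wm = WordMovers.Builder().wordVectors(vectors).build();

// Compute the distance between two documents
wm.distance("obama speaks to the media in illinois", "the president greets the press in chicago");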
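
A pairwise distance like the one above is often used to rank candidate documents against a query. The sketch below shows one way to do that, under stated assumptions: the class and method names are hypothetical, and a token-overlap (Jaccard) distance stands in for WMD so the example runs without a word2vec model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

public class NearestDocumentSketch {

    // Returns a copy of the candidates sorted by ascending distance to the query.
    static List<String> rank(String query, List<String> candidates,
                             ToDoubleBiFunction<String, String> distance) {
        List<String> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((String c) -> distance.applyAsDouble(query, c)));
        return sorted;
    }

    // Stand-in distance (NOT WMD): Jaccard distance over whitespace-separated tokens.
    static double jaccardDistance(String a, String b) {
        Set<String> tokensA = new HashSet<>(Arrays.asList(a.split(" ")));
        Set<String> tokensB = new HashSet<>(Arrays.asList(b.split(" ")));
        Set<String> union = new HashSet<>(tokensA);
        union.addAll(tokensB);
        Set<String> intersection = new HashSet<>(tokensA);
        intersection.retainAll(tokensB);
        return 1.0 - (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "the president greets the press in chicago",
                "obama speaks in illinois",
                "the band gave a concert in japan");
        List<String> ranked = rank("obama speaks to the media in illinois", docs,
                NearestDocumentSketch::jaccardDistance);
        ranked.forEach(System.out::println);
    }
}
```

Assuming wm.distance takes two strings and returns a double, passing wm::distance in place of NearestDocumentSketch::jaccardDistance would rank the same candidates by WMD. Unlike token overlap, WMD can score documents as close even when they share no words.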

Validation

wmd4j is validated against Gensim's wmdistance results on a custom word2vec model.