
lokicui / doc2vec-golang

License: Apache-2.0
doc2vec and word2vec implemented in Go. Word embedding representation.

Programming languages: Go, C, Thrift, Shell


doc2vec-golang

A Go implementation of Tomas Mikolov's word and document embeddings. For the basic idea, see Mikolov's two original papers on word2vec and doc2vec. More recently, Andrew M. Dai et al. from Google reported on its power in more detail.

usage

[@bjsjs_11_83 doc2vec-golang]$ ./control build
traning Exec build ok
build ok

# The training data (data/zhihu_data.1w) has one document per line, in two tab-separated columns:
# the first column is the id, and the second is the segmented document, with tokens separated by spaces.
[@bjsjs_11_83 doc2vec-golang]$ ./train  data/zhihu_data.1w          
Skip-Gram Iter:48 Alpha: 0.000796  Progress: 96.81%  Words/sec: 24.27k  
2018-03-30 14:53:00.218536235 +0800 CST training end, 1342521 26861

[@bjsjs_11_83 doc2vec-golang]$ ./knn 2.model 

please select operation type:
        0:word2words
        1:doc_likelihood
        2:leave one out key words
        3:sen2words
        4:sen2docs
        5:word2docs
        6:doc2docs
        7:doc2words
0
Enter text:网页
        1       网页
        0.7823723719117796      不让
        0.7651260773728028      浏览
        0.7642516944020028      邮件
        0.7601415883811553      
        0.7517607921006224      迷恋
        0.7492900066365179      等同
        0.7485966355448261      传说
        0.7463299535930537      基于
        0.7447865182221745      

please select operation type:
        0:word2words
        1:doc_likelihood
        2:leave one out key words
        3:sen2words
        4:sen2docs
        5:word2docs
        6:doc2docs
        7:doc2words
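
The training data format described in the usage comments (one document per line, an id column and a space-separated token column divided by a tab) can be read with a few lines of Go. The following is a minimal sketch, not the project's own loader: the `Document` type, the `ParseLine` function, and the sample lines are all hypothetical names and data introduced here for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Document is a hypothetical type for this sketch; it is not taken from the project.
type Document struct {
	ID    string
	Words []string
}

// ParseLine splits one training line into its id and its space-separated tokens.
// It assumes exactly two tab-separated columns, as described in the usage notes.
func ParseLine(line string) (Document, error) {
	parts := strings.SplitN(line, "\t", 2)
	if len(parts) != 2 {
		return Document{}, fmt.Errorf("expected 2 tab-separated columns, got %d", len(parts))
	}
	// strings.Fields tolerates repeated spaces between tokens.
	return Document{ID: parts[0], Words: strings.Fields(parts[1])}, nil
}

func main() {
	// Made-up sample input in the assumed format; a real run would read a file instead.
	input := "1\t网页 浏览 邮件\n2\t基于 传说\n"
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		doc, err := ParseLine(sc.Text())
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("id=%s tokens=%d %v\n", doc.ID, len(doc.Words), doc.Words)
	}
}
```

Lines without a tab are reported and skipped here; whether the real trainer skips or rejects such lines is not stated in this README.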

Dependencies

Implemented features

  • doc2vec supports both the CBOW and Skip-Gram models; both the Negative Sampling and Hierarchical Softmax optimizations are implemented
  • online inference of document vectors
  • likelihood of a document
  • doc2words
  • doc2docs
  • word2words
  • word2docs

Unimplemented features

References
