
taki0112 / Word2VecJava

License: MIT
Word2Vec in Java (based on Google's 2013 open-source word2vec release)

Programming Languages

java

Labels

word2vec
Projects that are alternatives of or similar to Word2VecJava

wordmap
Visualize large text collections with WebGL
Stars: ✭ 23 (+76.92%)
Mutual labels:  word2vec
learningspoons
NLP lecture notes and source code
Stars: ✭ 29 (+123.08%)
Mutual labels:  word2vec
word2vec
Rust interface to word2vec.
Stars: ✭ 22 (+69.23%)
Mutual labels:  word2vec
Name-disambiguation
An engineering solution for disambiguating papers by same-name authors (based on the first-place solution of the 2019 BAAI-AMiner author name disambiguation competition)
Stars: ✭ 17 (+30.77%)
Mutual labels:  word2vec
textaugment
TextAugment: Text Augmentation Library
Stars: ✭ 280 (+2053.85%)
Mutual labels:  word2vec
NLP-paper
🎨🎨 NLP (natural language processing) tutorial 🎨🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (+76.92%)
Mutual labels:  word2vec
acl2017 document clustering
code for "Determining Gains Acquired from Word Embedding Quantitatively Using Discrete Distribution Clustering" ACL 2017
Stars: ✭ 21 (+61.54%)
Mutual labels:  word2vec
NLP PEMDC
NLP Pretrained Embeddings, Models and Datasets Collection (NLP_PEMDC). The collection will keep updating.
Stars: ✭ 58 (+346.15%)
Mutual labels:  word2vec
word-benchmarks
Benchmarks for intrinsic word embeddings evaluation.
Stars: ✭ 45 (+246.15%)
Mutual labels:  word2vec
text-classification-cn
Chinese text classification practice, based on the Sogou news corpus, using traditional machine learning methods as well as pretrained models
Stars: ✭ 81 (+523.08%)
Mutual labels:  word2vec
img classification deep learning
No description or website provided.
Stars: ✭ 19 (+46.15%)
Mutual labels:  word2vec
word2vec-on-wikipedia
A pipeline for training word embeddings with word2vec on a Wikipedia corpus.
Stars: ✭ 68 (+423.08%)
Mutual labels:  word2vec
revery
A personal semantic search engine capable of surfacing relevant bookmarks, journal entries, notes, blogs, contacts, and more, built on an efficient document embedding algorithm and Monocle's personal search index.
Stars: ✭ 200 (+1438.46%)
Mutual labels:  word2vec
Emotion-recognition-from-tweets
A comprehensive approach to recognizing emotion (sentiment) in a given tweet, using supervised machine learning.
Stars: ✭ 17 (+30.77%)
Mutual labels:  word2vec
test word2vec uyghur
An example testing the word2vec algorithm from Python's gensim library on Uyghur script.
Stars: ✭ 15 (+15.38%)
Mutual labels:  word2vec
receiptdID
Receipt.ID is a multi-label, multi-class, hierarchical classification system implemented as a two-layer feed-forward network.
Stars: ✭ 22 (+69.23%)
Mutual labels:  word2vec
Arabic-Word-Embeddings-Word2vec
Arabic Word Embeddings Word2vec
Stars: ✭ 26 (+100%)
Mutual labels:  word2vec
word2vec-from-scratch-with-python
A very simple, bare-bones, inefficient implementation of skip-gram word2vec from scratch in Python
Stars: ✭ 85 (+553.85%)
Mutual labels:  word2vec
wmd4j
wmd4j is a Java library for calculating Word Mover's Distance (WMD)
Stars: ✭ 31 (+138.46%)
Mutual labels:  word2vec
sarcasm-detection-for-sentiment-analysis
Sarcasm Detection for Sentiment Analysis
Stars: ✭ 21 (+61.54%)
Mutual labels:  word2vec

Word2Vec In Java

Usage

  • Put "Input.txt" in the folder containing the source code
The contents of Input.txt are as follows
There is one document per line
All documents must be preprocessed
  • Preprocessing: documents should be separated by words using morphemes
  • In Eclipse, you mush give arguments (Run - Run Configurations...)

Arguments

  • a = input.txt, b = output.txt... that is, the names of the input and output files.
  • However, they are already set in the code (lines 34 and 35).
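A minimal sketch of how the two arguments might be consumed; the class name Word2VecMain, the parsing, and the defaults are illustrative assumptions, not the project's actual code:

    public class Word2VecMain {
        public static void main(String[] args) {
            // Assumed convention: args[0] = input file, args[1] = output file;
            // the defaults mirror the values hard-coded at lines 34 and 35.
            String inputFile = args.length > 0 ? args[0] : "Input.txt";
            String outputFile = args.length > 1 ? args[1] : "output.txt";
            System.out.println("input = " + inputFile + ", output = " + outputFile);
        }
    }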

Contents of "Input.txt" after preprocessing

  • Document 1 : KimJunho is interested in machine learning and deep learning
  • Document 2 : KimJunho is interested in recruiting professional researchers
KimJunho interested machine learning deep learning
KimJunho recruiting professional researchers
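
As a rough sketch of the transformation shown above (an assumption for illustration: a plain whitespace tokenizer with a stopword list chosen to reproduce the example, standing in for the external morpheme analyzer this README expects):

    import java.util.Arrays;
    import java.util.Set;
    import java.util.stream.Collectors;

    public class Preprocess {
        // Stopword list chosen only to reproduce the example above.
        private static final Set<String> STOPWORDS = Set.of("is", "in", "and");

        static String clean(String document) {
            return Arrays.stream(document.split("\\s+"))
                    .filter(w -> !STOPWORDS.contains(w.toLowerCase()))
                    .collect(Collectors.joining(" "));
        }

        public static void main(String[] args) {
            // Prints: KimJunho interested machine learning deep learning
            System.out.println(clean("KimJunho is interested in machine learning and deep learning"));
        }
    }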

Main Variable Description

  • See Line 894 (public static class Builder)
1. cbow = false
   Chooses between the CBOW and skip-gram training models:
   false : use skip-gram
   true : use the CBOW model

2. startingAlpha = 0.025F
   The initial learning rate.
   Smaller values give more accurate learning, but training is slower.

3. window = 5
   How many surrounding words to look at as context while training.
   The default value is 5, meaning 5 context words are considered.

4. negative = 0
   Selects the method used to make the output-layer computation efficient.
   The two methods are Hierarchical Softmax and Negative Sampling.
   If 0, Hierarchical Softmax is used;
   otherwise, Negative Sampling is used (typical values are 5-10).

5. minCount = 5
   Only words that occur at least this many times in the input are trained.
   If you want to learn every word, set minCount = 0.

6. layerOneSize = 200
   The dimensionality of the word vectors.
   The default value is 200.
   Higher dimensions are more precise, but training is slower.
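
Putting the parameters together, a configuration built through the Builder at line 894 might look like the sketch below; the method names are assumptions derived from the field names above, not verified against the source:

    // Hypothetical Builder usage; method names mirror the documented fields.
    Word2Vec model = new Word2Vec.Builder()
            .cbow(false)            // skip-gram; true would select CBOW
            .startingAlpha(0.025F)  // initial learning rate
            .window(5)              // context window size
            .negative(0)            // 0 = hierarchical softmax; >0 = negative sampling
            .minCount(5)            // skip words occurring fewer than 5 times
            .layerOneSize(200)      // word-vector dimensionality
            .build();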