
taki0112 / Word2VecJava

License: MIT
Word2Vec in Java (based on Google's 2013 open-source word2vec release)

Programming Languages

java

Labels

word2vec
Projects that are alternatives of or similar to Word2VecJava

wordmap
Visualize large text collections with WebGL
Stars: ✭ 23 (+76.92%)
Mutual labels:  word2vec
learningspoons
NLP lecture notes and source code
Stars: ✭ 29 (+123.08%)
Mutual labels:  word2vec
word2vec
Rust interface to word2vec.
Stars: ✭ 22 (+69.23%)
Mutual labels:  word2vec
Name-disambiguation
An engineering solution for disambiguating papers by same-name authors (based on the first-place solution of the 2019 BAAI-AMiner author name disambiguation competition)
Stars: ✭ 17 (+30.77%)
Mutual labels:  word2vec
textaugment
TextAugment: Text Augmentation Library
Stars: ✭ 280 (+2053.85%)
Mutual labels:  word2vec
NLP-paper
🎨🎨 NLP (natural language processing) tutorial 🎨🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (+76.92%)
Mutual labels:  word2vec
acl2017 document clustering
code for "Determining Gains Acquired from Word Embedding Quantitatively Using Discrete Distribution Clustering" ACL 2017
Stars: ✭ 21 (+61.54%)
Mutual labels:  word2vec
NLP PEMDC
NLP Pretrained Embeddings, Models and Datasets Collection (NLP_PEMDC). The collection will keep updating.
Stars: ✭ 58 (+346.15%)
Mutual labels:  word2vec
word-benchmarks
Benchmarks for intrinsic word embeddings evaluation.
Stars: ✭ 45 (+246.15%)
Mutual labels:  word2vec
text-classification-cn
Chinese text classification practice, based on the Sogou news corpus, using traditional machine learning methods as well as pretrained models
Stars: ✭ 81 (+523.08%)
Mutual labels:  word2vec
img classification deep learning
No description or website provided.
Stars: ✭ 19 (+46.15%)
Mutual labels:  word2vec
word2vec-on-wikipedia
A pipeline for training word embeddings with word2vec on a Wikipedia corpus.
Stars: ✭ 68 (+423.08%)
Mutual labels:  word2vec
revery
A personal semantic search engine capable of surfacing relevant bookmarks, journal entries, notes, blogs, contacts, and more, built on an efficient document embedding algorithm and Monocle's personal search index.
Stars: ✭ 200 (+1438.46%)
Mutual labels:  word2vec
Emotion-recognition-from-tweets
A comprehensive approach to recognizing emotion (sentiment) in a given tweet, using supervised machine learning.
Stars: ✭ 17 (+30.77%)
Mutual labels:  word2vec
test word2vec uyghur
An example testing the word2vec algorithm from Python's gensim library on Uyghur script.
Stars: ✭ 15 (+15.38%)
Mutual labels:  word2vec
receiptdID
Receipt.ID is a multi-label, multi-class, hierarchical classification system implemented as a two-layer feed-forward network.
Stars: ✭ 22 (+69.23%)
Mutual labels:  word2vec
Arabic-Word-Embeddings-Word2vec
Arabic Word Embeddings Word2vec
Stars: ✭ 26 (+100%)
Mutual labels:  word2vec
word2vec-from-scratch-with-python
A very simple, bare-bones, inefficient implementation of skip-gram word2vec from scratch in Python
Stars: ✭ 85 (+553.85%)
Mutual labels:  word2vec
wmd4j
wmd4j is a Java library for calculating Word Mover's Distance (WMD)
Stars: ✭ 31 (+138.46%)
Mutual labels:  word2vec
sarcasm-detection-for-sentiment-analysis
Sarcasm Detection for Sentiment Analysis
Stars: ✭ 21 (+61.54%)
Mutual labels:  word2vec

Word2Vec In Java

Usage

  • Put "Input.txt" in the folder containing the source code
The contents of Input.txt are as follows
There is one document per line
All documents must be preprocessed
  • Preprocessing: documents should be separated by words using morphemes
  • In Eclipse, you mush give arguments (Run - Run Configurations...)

Arguments

  • a = input.txt, b = output.txt... that is, the names of the input and output files.
  • However, they are already set in the code (lines 34 and 35).
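A minimal sketch of how the two arguments might be consumed; the class name Word2VecMain, the parsing, and the defaults are illustrative assumptions, not the project's actual code:

    public class Word2VecMain {
        public static void main(String[] args) {
            // Assumed convention: args[0] = input file, args[1] = output file;
            // the defaults mirror the values hard-coded at lines 34 and 35.
            String inputFile = args.length > 0 ? args[0] : "Input.txt";
            String outputFile = args.length > 1 ? args[1] : "output.txt";
            System.out.println("input = " + inputFile + ", output = " + outputFile);
        }
    }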

Contents of "Input.txt" after preprocessing

  • Document 1 : KimJunho is interested in machine learning and deep learning
  • Document 2 : KimJunho is interested in recruiting professional researchers
KimJunho interested machine learning deep learning
KimJunho recruiting professional researchers
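
As a rough sketch of the transformation shown above (an assumption for illustration: a plain whitespace tokenizer with a stopword list chosen to reproduce the example, standing in for the external morpheme analyzer this README expects):

    import java.util.Arrays;
    import java.util.Set;
    import java.util.stream.Collectors;

    public class Preprocess {
        // Stopword list chosen only to reproduce the example above.
        private static final Set<String> STOPWORDS = Set.of("is", "in", "and");

        static String clean(String document) {
            return Arrays.stream(document.split("\\s+"))
                    .filter(w -> !STOPWORDS.contains(w.toLowerCase()))
                    .collect(Collectors.joining(" "));
        }

        public static void main(String[] args) {
            // Prints: KimJunho interested machine learning deep learning
            System.out.println(clean("KimJunho is interested in machine learning and deep learning"));
        }
    }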

Main Variable Description

  • See Line 894 (public static class Builder)
1. cbow = false
   Chooses between the CBOW and skip-gram training models:
   false : use skip-gram
   true : use the CBOW model

2. startingAlpha = 0.025F
   The initial learning rate.
   Smaller values give more accurate learning, but training is slower.

3. window = 5
   How many surrounding words to look at as context while training.
   The default value is 5, meaning 5 context words are considered.

4. negative = 0
   Selects the method used to make the output-layer computation efficient.
   The two methods are Hierarchical Softmax and Negative Sampling.
   If 0, Hierarchical Softmax is used;
   otherwise, Negative Sampling is used (typical values are 5-10).

5. minCount = 5
   Only words that occur at least this many times in the input are trained.
   If you want to learn every word, set minCount = 0.

6. layerOneSize = 200
   The dimensionality of the word vectors.
   The default value is 200.
   Higher dimensions are more precise, but training is slower.
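
Putting the parameters together, a configuration built through the Builder at line 894 might look like the sketch below; the method names are assumptions derived from the field names above, not verified against the source:

    // Hypothetical Builder usage; method names mirror the documented fields.
    Word2Vec model = new Word2Vec.Builder()
            .cbow(false)            // skip-gram; true would select CBOW
            .startingAlpha(0.025F)  // initial learning rate
            .window(5)              // context window size
            .negative(0)            // 0 = hierarchical softmax; >0 = negative sampling
            .minCount(5)            // skip words occurring fewer than 5 times
            .layerOneSize(200)      // word-vector dimensionality
            .build();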