zake7749 / Word2vec Tutorial

Licence: MIT

A tutorial on training Chinese word vectors

Labels

word2vec gensim

Training Chinese word vectors with gensim

Tutorial article

Requirements

jieba

pip3 install jieba

gensim

pip3 install -U gensim

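OpenCC (any other Traditional/Simplified Chinese conversion package can be used instead)

Training procedure

1. Obtain the Chinese Wikipedia dump. This experiment used the dump from 2016/8/20.

The August 20 dump has since been retired. Go to the Wikipedia:Database download page and pick a newer dump by date. (Choose the file whose name ends with pages-articles.xml.bz2.)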
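The file name in the next step follows a predictable pattern, so it can be derived from the dump date. The sketch below is only an illustration: `dump_filename` and `dump_url` are hypothetical helpers, and the URL layout is an assumption about how dumps.wikimedia.org organises its files. Check the download page to confirm that the date you pick is actually listed.

```python
def dump_filename(date: str, wiki: str = "zhwiki") -> str:
    """Return the articles-dump file name for a date written as YYYYMMDD."""
    if len(date) != 8 or not date.isdigit():
        raise ValueError("date must look like YYYYMMDD, e.g. 20160820")
    return f"{wiki}-{date}-pages-articles.xml.bz2"


def dump_url(date: str, wiki: str = "zhwiki") -> str:
    """Build a download URL for the dump (hypothetical helper).

    Assumed layout: https://dumps.wikimedia.org/<wiki>/<date>/<file name>
    """
    return f"https://dumps.wikimedia.org/{wiki}/{date}/{dump_filename(date, wiki)}"
```

For example, `dump_filename("20160820")` gives the file name used in this tutorial; substitute the date of whichever dump you download.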

2. Place the downloaded Wikipedia dump in the same directory as the project, then use wiki_to_txt.py to extract the Wikipedia articles from the XML.

python3 wiki_to_txt.py zhwiki-20160820-pages-articles.xml.bz2

If you are not using the August 20 dump, replace zhwiki-20160820-pages-articles.xml.bz2 with the file name of the dump you downloaded.
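For readers curious about what an extraction step like this involves, here is a minimal sketch. It is not necessarily how wiki_to_txt.py is written: it assumes gensim's `WikiCorpus` class is used to parse the dump, and the function names and the default output name `wiki_texts.txt` (taken from the OpenCC command in step 3) are illustrative.

```python
from typing import Iterable, List


def write_articles(texts: Iterable[List[str]], out_path: str) -> int:
    """Write one article per line, tokens joined by spaces; return the count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for tokens in texts:
            out.write(" ".join(tokens) + "\n")
            count += 1
    return count


def extract(dump_path: str, out_path: str = "wiki_texts.txt") -> int:
    """Extract article text from a pages-articles dump (assumes gensim)."""
    # Imported lazily so write_articles() can be used without gensim installed.
    from gensim.corpora import WikiCorpus

    # Passing dictionary={} skips building a vocabulary, which is not needed
    # when the goal is only to dump the article text.
    corpus = WikiCorpus(dump_path, dictionary={})
    return write_articles(corpus.get_texts(), out_path)
```

Calling `extract("zhwiki-20160820-pages-articles.xml.bz2")` would stream the dump and write one article per line. Expect this to take a while on a full Wikipedia dump.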

3. Use OpenCC to convert all of the Wikipedia articles to Traditional Chinese

opencc -i wiki_texts.txt -o wiki_zh_tw.txt -c s2tw.json
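If you prefer to stay in Python rather than call the command-line tool, the same conversion can be sketched as below. This is an assumption-laden illustration: `make_s2tw_converter` assumes the `opencc-python-reimplemented` package, whose `OpenCC` class takes a config name such as `"s2tw"` without the `.json` suffix. Any function that maps a string to a string can be passed to `convert_file` instead, which matches the note above that the conversion package is interchangeable.

```python
from typing import Callable


def convert_file(src: str, dst: str, convert: Callable[[str], str]) -> int:
    """Convert src line by line with `convert`, write to dst, return line count."""
    lines = 0
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            # Convert the text without its newline, then restore the newline.
            fout.write(convert(line.rstrip("\n")) + "\n")
            lines += 1
    return lines


def make_s2tw_converter() -> Callable[[str], str]:
    """Return a Simplified-to-Traditional (Taiwan) converter.

    Assumption: the opencc-python-reimplemented package is installed.
    """
    from opencc import OpenCC  # lazy import, only needed for this helper

    return OpenCC("s2tw").convert
```

A call such as `convert_file("wiki_texts.txt", "wiki_zh_tw.txt", make_s2tw_converter())` would mirror the command above.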

4. Use jieba to segment the text into words and remove stop words

python3 segment.py
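The segmentation step can be pictured roughly as follows. This is a sketch under assumptions, not a copy of segment.py: the stop-word file format (one word per line) and the helper names are hypothetical. The only jieba call used is `jieba.cut`, which yields the tokens of a sentence.

```python
from typing import Callable, Iterable, List, Optional, Set


def load_stopwords(path: str) -> Set[str]:
    """Read a stop-word list, assumed to hold one word per line."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def segment_line(
    line: str,
    stopwords: Set[str],
    cut: Optional[Callable[[str], Iterable[str]]] = None,
) -> List[str]:
    """Split a line into tokens and drop stop words and whitespace tokens."""
    if cut is None:
        import jieba  # lazy import so a custom tokenizer can be used instead

        cut = jieba.cut  # precise mode (cut_all=False) is the default
    return [
        token
        for token in cut(line.strip())
        if token.strip() and token not in stopwords
    ]
```

Writing `" ".join(segment_line(line, stopwords))` for each input line produces the space-separated format that the training step reads.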

5. Train a model with gensim's word2vec implementation

python3 train.py
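A training script along these lines might look like the sketch below. It assumes gensim 4.x, where the dimensionality parameter is called `vector_size` (gensim 3.x called it `size`). The `vector_size=250` default and the function and class names are illustrative, not values confirmed by the project. gensim expects a corpus it can iterate over more than once, so the sentences are wrapped in a small restartable class; gensim's own `LineSentence` serves the same purpose.

```python
from typing import Iterator, List


class SegmentedCorpus:
    """Restartable iterable over a file of space-separated sentences."""

    def __init__(self, path: str) -> None:
        self.path = path

    def __iter__(self) -> Iterator[List[str]]:
        # Reopening the file on every pass lets gensim scan the corpus
        # several times (vocabulary pass plus each training epoch).
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                tokens = line.split()
                if tokens:
                    yield tokens


def train(corpus_path: str, model_path: str, vector_size: int = 250):
    """Train and save a word2vec model (assumes gensim >= 4.0)."""
    from gensim.models import Word2Vec  # lazy import

    model = Word2Vec(
        sentences=SegmentedCorpus(corpus_path),
        vector_size=vector_size,
    )
    model.save(model_path)
    return model
```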

6. Test the model we trained

python3 demo.py
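To give a sense of what querying the model involves, the sketch below loads a saved model and asks for the nearest neighbours of a word. It assumes the model was saved with gensim's `Word2Vec.save`; the function names are hypothetical and this is not necessarily what demo.py does. The neighbours are ranked by cosine similarity, so a small pure-Python version of that measure is included for reference.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(model_path: str, word: str, topn: int = 10):
    """Return (word, score) pairs closest to `word` (assumes gensim)."""
    from gensim.models import Word2Vec  # lazy import

    model = Word2Vec.load(model_path)
    # A word that never appeared in the training corpus has no vector,
    # so this lookup fails for out-of-vocabulary words.
    return model.wv.most_similar(word, topn=topn)
```

Scores close to 1 mean the two words appeared in very similar contexts in the training corpus.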
