akoksal / Turkish Word2vec

Licence: MIT

Pre-trained Word2Vec Model for Turkish

Labels

nlp word2vec gensim

Projects that are alternatives of or similar to Turkish Word2vec

wordfish-python

extract relationships from standardized terms from corpus of interest with deep learning 🐟

Stars: ✭ 19 (-86.03%)

Mutual labels: word2vec, gensim

ML based projects such as Spam Classification, Time Series Analysis, Text Classification using Random Forest, Deep Learning, Bayesian, Xgboost in Python

Stars: ✭ 127 (-6.62%)

Mutual labels: word2vec, gensim

Lmdb Embeddings

Fast word vectors with little memory usage in Python

Stars: ✭ 404 (+197.06%)

Mutual labels: word2vec, gensim

An alternative implementation of Recursive Feature and Role Extraction (KDD11 & KDD12)

Stars: ✭ 52 (-61.76%)

Mutual labels: word2vec, gensim

🦆 Contextually-keyed word vectors

Stars: ✭ 1,184 (+770.59%)

Mutual labels: word2vec, gensim

Implementation and model generated by training (trigram) on the pt-br Wikipedia

Stars: ✭ 34 (-75%)

Mutual labels: word2vec, gensim

Nlp In Practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

Stars: ✭ 790 (+480.88%)

Mutual labels: word2vec, gensim

document embedding and machine learning script for beginners

Stars: ✭ 92 (-32.35%)

Mutual labels: word2vec, gensim

Training Chinese word vectors with Word2vec. Word2vec was created by a team of researchers led by Tomas Mikolov at Google.

Stars: ✭ 48 (-64.71%)

Mutual labels: word2vec, gensim

An implementation of "Network Representation Learning with Rich Text Information" (IJCAI '15).

Stars: ✭ 43 (-68.38%)

Mutual labels: word2vec, gensim

A fast, efficient universal vector embedding utility package.

Stars: ✭ 1,394 (+925%)

Mutual labels: word2vec, gensim

Documents, papers and code related to Natural Language Processing, including Topic Model, Word Embedding, Named Entity Recognition, Text Classification, Text Generation, Text Similarity, Machine Translation, etc. All code is implemented in TensorFlow 2.0.

Stars: ✭ 1,290 (+848.53%)

Mutual labels: word2vec, gensim

A lightweight implementation of Walklets from "Don't Walk Skip! Online Learning of Multi-scale Network Embeddings" (ASONAM 2017).

Stars: ✭ 94 (-30.88%)

Mutual labels: word2vec, gensim

Product-Categorization-NLP

Multi-Class Text Classification for products based on their description with Machine Learning algorithms and Neural Networks (MLP, CNN, Distilbert).

Stars: ✭ 30 (-77.94%)

Mutual labels: word2vec, gensim

ProtVec can be used in protein interaction predictions, structure prediction, and protein data visualization.

Stars: ✭ 23 (-83.09%)

Mutual labels: word2vec, gensim

Word2vec Tutorial

Tutorial on training Chinese word vectors

Stars: ✭ 426 (+213.24%)

Mutual labels: word2vec, gensim

word-embeddings-from-scratch

Creating word embeddings from scratch and visualize them on TensorBoard. Using trained embeddings in Keras.

Stars: ✭ 22 (-83.82%)

Mutual labels: word2vec, gensim

Word2VecAndTsne

Scripts demo-ing how to train a Word2Vec model and reduce its vector space

Stars: ✭ 45 (-66.91%)

Mutual labels: word2vec, gensim

Twitter sentiment analysis word2vec convnet

Twitter Sentiment Analysis with Gensim Word2Vec and Keras Convolutional Network

Stars: ✭ 24 (-82.35%)

Mutual labels: word2vec, gensim

The reference implementation of "Multi-scale Attributed Node Embedding".

Stars: ✭ 75 (-44.85%)

Mutual labels: word2vec, gensim

Turkish Pre-trained Word2Vec Model

(Turkish version is below. / Türkçe için aşağıya bakın.)

This tutorial shows how to train a word2vec model for Turkish from a Wikipedia dump. The code is written in Python 3 using the gensim library. Turkish is an agglutinative language, so the Wikipedia corpus contains many words that share a lemma but carry different suffixes. I plan to write a Turkish lemmatizer to improve the quality of the model.

You can check out the wiki page for more details. If you just want to download the pretrained model, you can use this link. More examples are on the "5. Using Word2Vec Model and Examples" page of the GitHub wiki. Some of them are below:

word_vectors.most_similar(positive=["kral","kadın"],negative=["erkek"])

This is a classic word2vec example. The vector most similar to king + woman - man is queen, as expected. The second is "of king" (kralı) and the third is "king's" (kralın). If the model had been trained with a lemmatization tool for Turkish, the results would be cleaner.
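The query above assumes that a `word_vectors` object has already been loaded. A minimal sketch of how that might look with gensim follows. It assumes the downloaded model is stored in the word2vec binary format. The file name `trmodel` in the usage note and the helper names `load_word_vectors` and `king_queen_query` are illustrative assumptions, not taken from the project.

```python
from pathlib import Path


def load_word_vectors(model_path, binary=True):
    """Load pre-trained vectors from a word2vec-format file.

    Assumption: the model file is in word2vec format (binary by default).
    """
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"model file not found: {path}")
    # Imported lazily so the path check works even without gensim installed.
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(str(path), binary=binary)


def king_queen_query(word_vectors, topn=3):
    """Run the kral + kadın - erkek query and return (word, similarity) pairs."""
    return word_vectors.most_similar(
        positive=["kral", "kadın"], negative=["erkek"], topn=topn
    )
```

With gensim installed and the model file on disk, `king_queen_query(load_word_vectors("trmodel"))` would return a list of `(word, similarity)` pairs ordered from most to least similar.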

word_vectors.most_similar(positive=["geliyor","gitmek"],negative=["gelmek"])

Turkish is an agglutinative language, and I investigated this property by looking at the vectors most similar to geliyor (he/she/it is coming) - gelmek (to come) + gitmek (to go). The most similar vector is gidiyor (he/she/it is going), as expected. The second is "I am going" and the third is "let's go". So we can see the effects of tense and possessive suffixes in word2vec models.
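The arithmetic behind such a query can be sketched on hand-made toy vectors. This is not the trained model. The four-dimensional vectors below are invented for illustration (the dimensions loosely stand for "come", "go", "infinitive" and "progressive"), and the `most_similar` function is a simplified stand-in for the gensim method of the same name: it adds the unit vectors of the positive words, subtracts those of the negative words, and ranks the remaining words by cosine similarity.

```python
import math


def unit(vector):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def most_similar(vectors, positive, negative, topn=3):
    """Rank words by cosine similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    signed_words = [(w, 1.0) for w in positive] + [(w, -1.0) for w in negative]
    for word, sign in signed_words:
        for i, x in enumerate(unit(vectors[word])):
            query[i] += sign * x
    query = unit(query)
    excluded = set(positive) | set(negative)  # query words are not candidates
    scores = [
        (word, sum(a * b for a, b in zip(query, unit(vec))))
        for word, vec in vectors.items()
        if word not in excluded
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical toy vectors, invented for this sketch only.
toy_vectors = {
    "gelmek":  [1.0, 0.0, 1.0, 0.0],  # to come
    "gitmek":  [0.0, 1.0, 1.0, 0.0],  # to go
    "geliyor": [1.0, 0.0, 0.0, 1.0],  # is coming
    "gidiyor": [0.0, 1.0, 0.0, 1.0],  # is going
    "kral":    [1.0, 1.0, 1.0, 1.0],  # king (distractor)
}

result = most_similar(toy_vectors, positive=["geliyor", "gitmek"], negative=["gelmek"])
print(result[0][0])  # gidiyor
```

In the toy space the "come" and "infinitive" components cancel out, leaving exactly the direction of gidiyor. In a real model the cancellation is only approximate, which is why suffixed variants appear among the nearest neighbors.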

Pre-trained Turkish Word2Vec Model (translated from Turkish)

This work explains how to build a Turkish word2vec model from the Turkish articles on Wikipedia. The code is written in Python 3 using the gensim library. In the future, the quality of the model will be improved with a Turkish lemmatization algorithm that maps words sharing the same root and derivational suffixes, but carrying different inflectional suffixes, to the same word.

For details, you can visit the GitHub wiki page. If you only want to use the trained model, you can download it from here. You can also find examples on the "5. Word2Vec Modelini Kullanmak/Örnekler" (Using the Word2Vec Model/Examples) page of the GitHub wiki. Some examples are given below:

word_vectors.most_similar(positive=["kral","kadın"],negative=["erkek"])

This is a classic word2vec example. When the vector for erkek (man) is subtracted from the vector for kral (king) and the vector for kadın (woman) is added, the nearest word vector is kraliçe (queen). Many of the other nearest neighbors are suffixed forms of kral and kraliçe. Because Turkish is an agglutinative language, some results may not come out as expected. If we could train word2vec on the lemmas of the words, we would get much cleaner results.

word_vectors.most_similar(positive=["geliyor","gitmek"],negative=["gelmek"])

In this example, we examined the effect of tense suffixes on verbs. The most similar word vectors were related to the expected result.

Stars: ✭ 136
