felipeparpinelli / word2vec-pt-br

Licence: Apache-2.0 license
Implementation and model generated by training (trigram) on the Brazilian Portuguese (pt-br) Wikipedia

Programming Languages: CSS, HTML, Python

Projects that are alternatives of or similar to word2vec-pt-br

Gensim
Topic Modelling for Humans
Stars: ✭ 12,763 (+37438.24%)
Mutual labels:  word2vec, gensim
Shallowlearn
An experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and nice API. Written in Python and fully compatible with Scikit-learn.
Stars: ✭ 196 (+476.47%)
Mutual labels:  word2vec, gensim
Log Anomaly Detector
Log Anomaly Detection - Machine learning to detect abnormal events logs
Stars: ✭ 169 (+397.06%)
Mutual labels:  word2vec, gensim
walklets
A lightweight implementation of Walklets from "Don't Walk Skip! Online Learning of Multi-scale Network Embeddings" (ASONAM 2017).
Stars: ✭ 94 (+176.47%)
Mutual labels:  word2vec, gensim
word-embeddings-from-scratch
Creating word embeddings from scratch and visualize them on TensorBoard. Using trained embeddings in Keras.
Stars: ✭ 22 (-35.29%)
Mutual labels:  word2vec, gensim
Webvectors
Web-ify your word2vec: framework to serve distributional semantic models online
Stars: ✭ 154 (+352.94%)
Mutual labels:  word2vec, gensim
Germanwordembeddings
Toolkit to obtain and preprocess german corpora, train models using word2vec (gensim) and evaluate them with generated testsets
Stars: ✭ 189 (+455.88%)
Mutual labels:  word2vec, gensim
Role2vec
A scalable Gensim implementation of "Learning Role-based Graph Embeddings" (IJCAI 2018).
Stars: ✭ 134 (+294.12%)
Mutual labels:  word2vec, gensim
uoj-potigol
Solutions to Beecrowd problems using the Potigol language
Stars: ✭ 45 (+32.35%)
Mutual labels:  portugues, portuguese
Aravec
AraVec is a pre-trained distributed word representation (word embedding) open source project which aims to provide the Arabic NLP research community with free to use and powerful word embedding models.
Stars: ✭ 239 (+602.94%)
Mutual labels:  word2vec, gensim
RolX
An alternative implementation of Recursive Feature and Role Extraction (KDD11 & KDD12)
Stars: ✭ 52 (+52.94%)
Mutual labels:  word2vec, gensim
doc2vec-api
document embedding and machine learning script for beginners
Stars: ✭ 92 (+170.59%)
Mutual labels:  word2vec, gensim
Wordembeddings Elmo Fasttext Word2vec
Using pre trained word embeddings (Fasttext, Word2Vec)
Stars: ✭ 146 (+329.41%)
Mutual labels:  word2vec, gensim
biovec
ProtVec can be used in protein interaction predictions, structure prediction, and protein data visualization.
Stars: ✭ 23 (-32.35%)
Mutual labels:  word2vec, gensim
Turkish Word2vec
Pre-trained Word2Vec Model for Turkish
Stars: ✭ 136 (+300%)
Mutual labels:  word2vec, gensim
Splitter
A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).
Stars: ✭ 177 (+420.59%)
Mutual labels:  word2vec, gensim
Magnitude
A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+4000%)
Mutual labels:  word2vec, gensim
Ml Projects
ML based projects such as Spam Classification, Time Series Analysis, Text Classification using Random Forest, Deep Learning, Bayesian, Xgboost in Python
Stars: ✭ 127 (+273.53%)
Mutual labels:  word2vec, gensim
Gemsec
The TensorFlow reference implementation of 'GEMSEC: Graph Embedding with Self Clustering' (ASONAM 2019).
Stars: ✭ 210 (+517.65%)
Mutual labels:  word2vec, gensim
Word2VecAndTsne
Scripts demo-ing how to train a Word2Vec model and reduce its vector space
Stars: ✭ 45 (+32.35%)
Mutual labels:  word2vec, gensim

word2vec-pt-br

Implementation and model generated by training (trigram) on the Brazilian Portuguese (pt-br) Wikipedia using gensim.
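
The "(trigram)" above presumably means that frequent sequences of up to three words were joined into single tokens before training, which is what gensim's `Phrases` model is commonly used for. The text here does not show that step, so the following is only a minimal pure-Python sketch of the idea. The `merge_frequent_pairs` helper and the toy corpus are hypothetical, and a raw count threshold stands in for gensim's statistical scoring. One pass joins frequent pairs; a second pass over the result can join a pair with a third word.

```python
from collections import Counter


def merge_frequent_pairs(sentences, min_count=3, sep="_"):
    """Join adjacent token pairs that occur at least `min_count` times.

    Simplified illustration only: gensim's Phrases uses a statistical score,
    not a raw count threshold.
    """
    pair_counts = Counter(
        (a, b) for sent in sentences for a, b in zip(sent, sent[1:])
    )
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            # Greedy left-to-right merge of a frequent pair.
            if i + 1 < len(sent) and pair_counts[(sent[i], sent[i + 1])] >= min_count:
                out.append(sent[i] + sep + sent[i + 1])
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


# Toy corpus (made up for this sketch, not taken from Wikipedia).
corpus = [
    ["moro", "no", "rio", "de", "janeiro"],
    ["o", "rio", "de", "janeiro", "é", "grande"],
    ["viajei", "para", "o", "rio", "de", "janeiro"],
]
bigrams = merge_frequent_pairs(corpus, min_count=3)    # first pass: word pairs
trigrams = merge_frequent_pairs(bigrams, min_count=3)  # second pass: pair + word
```

After the second pass, the three words of the city name end up as one token, which a word2vec model would then treat as a single vocabulary entry.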

The model trained on wiki-pt is available for download at: https://drive.google.com/file/d/0B_eXEo_eUPCDWnJ0YWtUdW1kVFk/view?usp=sharing

  • As a demonstration, an example was built using a Python web server (Flask) and a graph visualization using D3.js.
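
The data format exchanged between the Flask server and the D3.js graph is not described here. As a rough sketch, a server could turn a list of similar words into the node-link structure that D3 force layouts commonly consume. The `build_graph` function, the field names, and the similarity scores below are all assumptions made for illustration, not the repository's actual code.

```python
import json


def build_graph(word, neighbours):
    """Turn (neighbour, similarity) pairs into a node-link dictionary."""
    nodes = [{"id": word, "group": 0}]  # the queried word is the central node
    links = []
    for other, score in neighbours:
        nodes.append({"id": other, "group": 1})
        links.append(
            {"source": word, "target": other, "value": round(float(score), 3)}
        )
    return {"nodes": nodes, "links": links}


# Made-up scores, in the (word, score) shape that gensim's most_similar returns.
sample = [("rainha", 0.71), ("príncipe", 0.65)]
graph = build_graph("rei", sample)
payload = json.dumps(graph, ensure_ascii=False)
```

In a Flask route, a dictionary like `graph` could be returned with `flask.jsonify(graph)` and read on the page by the D3 script.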

Running the test example:

  1. The model downloaded above must be in the same directory as the exemplo folder.

  2. Create a virtualenv and install the dependencies from requirements.txt

    pip install -r requirements.txt
  3. Start the Python server
    python app.py
  4. Access the server
    http://127.0.0.1:5000

Notes:

  1. Because the model was trained on the entire Wikipedia corpus, it is important to have at least 2 GB of free RAM. Depending on the memory available, you may notice a slight delay when the server starts and on the first query for similar items.

  2. You can modify app.py to use other methods available in the Gensim API; details at: https://radimrehurek.com/gensim/models/word2vec.html
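
The delay mentioned in the first note is consistent with a large model being read into memory. A common way to pay that cost only once per process is to cache the loaded object. The sketch below shows the pattern with a stand-in loader; `slow_loader`, `get_model`, and the file name `model.bin` are hypothetical, and whether app.py already does something similar is not shown here.

```python
import functools
import time


def slow_loader(path):
    """Stand-in for an expensive model load (sleeps instead of reading a file)."""
    time.sleep(0.05)
    return {"path": path, "loaded_at": time.time()}


@functools.lru_cache(maxsize=1)
def get_model(path):
    """Load the model on first use and return the cached object afterwards."""
    return slow_loader(path)


first = get_model("model.bin")   # pays the loading cost
second = get_model("model.bin")  # served from the cache, no reload
```

In a real server, `slow_loader` would be replaced by the gensim loading call that matches how the model file was saved.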

Screenshots:

  • Example of a search for semantically similar words.

  • Example using a trigram.

  • Example using mathematical operations on words.

  • Example returning the most distant word given a set of words.
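
The three kinds of query listed above (similar words, arithmetic on words, and the most distant word in a set) correspond to gensim methods such as `most_similar` and `doesnt_match`. To show roughly what those queries compute, here is a pure-Python sketch on hand-made 3-D toy vectors. The vectors are invented for this example and do not come from the trained model, and the functions are simplified stand-ins rather than gensim's implementation.

```python
import math

# Hand-made toy vectors (NOT from the trained model): [royalty, gender, fruit].
vectors = {
    "rei": [1.0, 0.3, 0.0],
    "rainha": [1.0, -0.3, 0.0],
    "homem": [0.2, 0.3, 0.0],
    "mulher": [0.2, -0.3, 0.0],
    "banana": [0.0, 0.0, 1.0],
}
DIMS = 3


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(positive, negative=(), topn=3):
    """Rank words by cosine similarity to sum(positive) - sum(negative)."""
    query = [0.0] * DIMS
    for word in positive:
        query = [q + x for q, x in zip(query, vectors[word])]
    for word in negative:
        query = [q - x for q, x in zip(query, vectors[word])]
    exclude = set(positive) | set(negative)  # do not return the query words
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w not in exclude]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


def doesnt_match(words):
    """Return the word least similar to the mean vector of the group."""
    mean = [sum(vectors[w][i] for w in words) / len(words) for i in range(DIMS)]
    return min(words, key=lambda w: cosine(vectors[w], mean))


similar = most_similar(["rei"])                                 # similar words
analogy = most_similar(["rei", "mulher"], negative=["homem"])   # rei - homem + mulher
odd_one = doesnt_match(["rei", "rainha", "homem", "banana"])    # most distant word
```

With the real model, the same questions would be asked through the Gensim API on the loaded vectors instead of these toy functions.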
