felipeparpinelli / word2vec-pt-br

License: Apache-2.0

Implementation and trained model (with trigrams) built from the Brazilian Portuguese (pt-br) Wikipedia

word2vec-pt-br

Implementation and trained model (with trigrams) built from the Brazilian Portuguese (pt-br) Wikipedia using gensim.

The model trained on the Portuguese Wikipedia is available for download at: https://drive.google.com/file/d/0B_eXEo_eUPCDWnJ0YWtUdW1kVFk/view?usp=sharing

As a demonstration, the repository includes an example built with a Python web server (Flask) and a graph visualization using D3.js.
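The pairing of a Flask server with a D3.js graph can be sketched roughly as follows. This is an illustration under assumptions, not the repository's app.py: the route /similar/<word>, the helper names, and the node-link JSON layout are all hypothetical, and the scores in the usage note are placeholders. The only model method relied on is most_similar, which gensim word vectors provide.

```python
def similar_words_to_graph(word, neighbours):
    """Convert (word, score) pairs into node-link data that a D3 graph can draw."""
    nodes = [{"id": word, "group": 0}]  # the queried word sits at the centre
    links = []
    for neighbour, score in neighbours:
        nodes.append({"id": neighbour, "group": 1})
        links.append(
            {"source": word, "target": neighbour, "value": round(float(score), 4)}
        )
    return {"nodes": nodes, "links": links}


def create_app(vectors):
    """Build a Flask app around any object that exposes most_similar()."""
    # Imported lazily so the helper above can be used without Flask installed.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/similar/<word>")  # hypothetical route, not taken from the repository
    def similar(word):
        try:
            neighbours = vectors.most_similar(positive=[word], topn=10)
        except KeyError:
            # gensim raises KeyError for words missing from the vocabulary
            return jsonify({"error": "word not in vocabulary"}), 404
        return jsonify(similar_words_to_graph(word, neighbours))

    return app
```

For example, similar_words_to_graph("rei", [("rainha", 0.8)]) yields one central node, one neighbour node, and one link between them; the front end can pass that structure to a D3 force layout.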

Running the test example:

The model downloaded above must be in the same directory as the example folder.
Create a virtualenv and install the dependencies from requirements.txt

    pip install -r requirements.txt

Start the Python server

    python app.py

Open the server in a browser

    127.0.0.1:5000

Notes:

Because the model was trained on the entire Wikipedia dump, make sure at least 2 GB of RAM is free. Depending on the available memory, the server may be slightly slow to start and to answer the first similarity query.
app.py can be changed to use other methods available in the Gensim API; details at: https://radimrehurek.com/gensim/models/word2vec.html
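As a rough sketch of what calling other Gensim methods might look like, the helpers below wrap most_similar and doesnt_match, both of which exist on gensim word vectors. Everything else is an assumption made for illustration: the helper names, the "rei - homem + mulher" query syntax, the idea that the download is a model saved with Word2Vec.save, and the idea that trigrams are stored as underscore-joined tokens (the default delimiter of gensim's Phrases).

```python
def load_vectors(model_path):
    """Load a saved gensim Word2Vec model and return its word vectors.

    Assumes the file was written with Word2Vec.save(); a model saved by a much
    older gensim release may not load in a newer one.
    """
    from gensim.models import Word2Vec  # lazy import: only needed for real models

    return Word2Vec.load(model_path).wv


def parse_expression(text):
    """Split 'rei - homem + mulher' into (positive, negative) word lists."""
    positive, negative = [], []
    sign = "+"
    for token in text.split():
        if token in ("+", "-"):
            sign = token
            continue
        (positive if sign == "+" else negative).append(token)
        sign = "+"  # a word without an explicit operator counts as positive
    return positive, negative


def analogy(vectors, text, topn=5):
    """Run word arithmetic through most_similar(positive=..., negative=...)."""
    positive, negative = parse_expression(text)
    return vectors.most_similar(positive=positive, negative=negative, topn=topn)


def odd_one_out(vectors, words):
    """Return the word that fits least with the others."""
    return vectors.doesnt_match(words)


def to_phrase_token(phrase):
    """Join a multiword phrase with '_' (assumed trigram token format)."""
    return "_".join(phrase.split())
```

Because the helpers only need an object with most_similar and doesnt_match, they can be tried against a small stub before loading the full model, which avoids the 2 GB memory cost during development.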

Screenshots:

Example of a search for semantically similar words.

Example using a trigram.

Example using arithmetic operations on words.

Example returning the most distant word in a given set of words.
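To make the last two kinds of query concrete, here is a toy sketch of the underlying arithmetic. The two-dimensional vectors are invented for illustration and are not taken from the trained model; the real model has far more dimensions, and gensim normalises vectors before combining them. All function names are hypothetical.

```python
import math

# Made-up 2-D vectors: axis 0 loosely means "royalty", axis 1 loosely means "gender".
VECTORS = {
    "rei": [0.9, 0.8],
    "rainha": [0.9, -0.8],
    "homem": [0.1, 0.8],
    "mulher": [0.1, -0.8],
    "banana": [-0.9, 0.05],
}


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, -1.0 means opposite."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def nearest(target, exclude=()):
    """Vocabulary word whose vector points most nearly in the direction of target."""
    candidates = [w for w in VECTORS if w not in exclude]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


def analogy(positive, negative):
    """Add the positive vectors, subtract the negative ones, return the nearest word."""
    target = [0.0, 0.0]
    for word in positive:
        target = [t + v for t, v in zip(target, VECTORS[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, VECTORS[word])]
    return nearest(target, exclude=set(positive) | set(negative))


def odd_one_out(words):
    """Word least aligned with the mean of the unit-length vectors of the set."""
    units = {w: [x / norm(VECTORS[w]) for x in VECTORS[w]] for w in words}
    mean = [sum(col) / len(words) for col in zip(*units.values())]
    return min(words, key=lambda w: sum(x * y for x, y in zip(units[w], mean)))
```

With these toy vectors, analogy(["rei", "mulher"], ["homem"]) lands on "rainha", and odd_one_out(["rei", "rainha", "homem", "banana"]) picks "banana".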
