
harkous / Embeddingsviz

License: MIT
Visualize the word embeddings of a vocabulary in TensorBoard, including their neighbors

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Embeddingsviz

Magnitude
A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+3385%)
Mutual labels:  embeddings, word-embeddings, fasttext, glove
datastories-semeval2017-task6
Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".
Stars: ✭ 20 (-50%)
Mutual labels:  word-embeddings, embeddings, glove
Simple-Sentence-Similarity
Exploring the simple sentence similarity measurements using word embeddings
Stars: ✭ 99 (+147.5%)
Mutual labels:  word-embeddings, glove, fasttext
Finalfusion Rust
finalfusion embeddings in Rust
Stars: ✭ 35 (-12.5%)
Mutual labels:  embeddings, fasttext, glove
Lmdb Embeddings
Fast word vectors with little memory usage in Python
Stars: ✭ 404 (+910%)
Mutual labels:  embeddings, fasttext, glove
Fastrtext
R wrapper for fastText
Stars: ✭ 103 (+157.5%)
Mutual labels:  embeddings, word-embeddings, fasttext
Embedding As Service
One-Stop Solution to encode sentence to fixed length vectors from various embedding techniques
Stars: ✭ 151 (+277.5%)
Mutual labels:  embeddings, fasttext, glove
Datastories Semeval2017 Task4
Deep-learning model presented in "DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis".
Stars: ✭ 184 (+360%)
Mutual labels:  embeddings, word-embeddings, glove
Keras Textclassification
Chinese text classification with Keras NLP: long-text classification, short-sentence classification, multi-label classification, and sentence-pair similarity; base classes for building character/word/sentence embedding layers and network graphs; includes FastText, TextCNN, CharCNN, TextRNN, RCNN, DCNN, DPCNN, VDCNN, CRNN, BERT, XLNet, ALBERT, Attention, DeepMoji, HAN, CapsuleNet, Transformer encoder, Seq2seq, SWEM, LEAM, and TextGCN
Stars: ✭ 914 (+2185%)
Mutual labels:  embeddings, fasttext
navec
Compact high quality word embeddings for Russian language
Stars: ✭ 118 (+195%)
Mutual labels:  embeddings, glove
word2vec-tsne
Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.
Stars: ✭ 59 (+47.5%)
Mutual labels:  word-embeddings, embeddings
compress-fasttext
Tools for shrinking fastText models (in gensim format)
Stars: ✭ 124 (+210%)
Mutual labels:  word-embeddings, fasttext
NLP-paper
🎨 NLP (natural language processing) tutorial: https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-42.5%)
Mutual labels:  glove, fasttext
lda2vec
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec from this paper https://arxiv.org/abs/1605.02019
Stars: ✭ 27 (-32.5%)
Mutual labels:  word-embeddings, embeddings
sentiment-analysis-of-tweets-in-russian
Sentiment analysis of tweets in Russian using Convolutional Neural Networks (CNN) with Word2Vec embeddings.
Stars: ✭ 51 (+27.5%)
Mutual labels:  word-embeddings, embeddings
SentimentAnalysis
Sentiment Analysis: Deep Bi-LSTM+attention model
Stars: ✭ 32 (-20%)
Mutual labels:  word-embeddings, embeddings
PersianNER
Named-Entity Recognition in Persian Language
Stars: ✭ 48 (+20%)
Mutual labels:  word-embeddings, embeddings
Biosentvec
BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences
Stars: ✭ 308 (+670%)
Mutual labels:  word-embeddings, fasttext
Persian-Sentiment-Analyzer
Persian sentiment analysis ( آناکاوی سهش های فارسی | تحلیل احساسات فارسی )
Stars: ✭ 30 (-25%)
Mutual labels:  embeddings, fasttext
Wego
Word Embeddings (e.g. Word2Vec) in Go!
Stars: ✭ 336 (+740%)
Mutual labels:  word-embeddings, glove

Embeddings Visualizer in TensorBoard

Problem

Suppose you have a large word embeddings file at hand (e.g., GloVe) and you want to visualize these embeddings in TensorBoard. The problem is that TensorBoard becomes very slow at this task once the total number of words exceeds tens of thousands, especially since it performs the computations in the browser. Hence, the way to go is to limit your vocabulary to a subset of words that are of interest to you and to visualize only those words and their neighbors. This repository automates that task: you provide a set of vocabulary terms of interest in addition to your embeddings, and you can then visualize these words and their neighbors within TensorBoard.

The repository uses the Faiss library from Facebook in addition to the latest TensorFlow from Google. It supports including multiple embeddings in the same TensorBoard session.

It has been tested with TensorFlow 1.2.1 under Python 2.7 (installing Faiss is more straightforward with Python 2.7).

Prerequisites Setup

  1. Install Faiss, Facebook's library for efficient similarity search, by following their guide

    • For example, on Ubuntu 14 (CPU installation), I followed the steps below:
    # Clone faiss
    git clone https://github.com/facebookresearch/faiss.git
    cd faiss
    # copy the example makefile
    cp example_makefiles/makefile.inc.Linux ./makefile.inc
    # Uncomment the part for your system in makefile.inc and run the commands it lists.
    # E.g., for Ubuntu 14, I ran `sudo apt-get install libopenblas-dev liblapack3 python-numpy python-dev`
    # and uncommented the line starting with BLASLDFLAGS
    vi ./makefile.inc
    # for the CPU installation:
    make tests/test_blas
    make
    make py
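
    • Optionally, verify that the Python bindings work with a quick check (a minimal sketch, not part of this repository; the dimension and the data below are arbitrary):

    # sanity check for the faiss Python bindings
    import numpy as np
    import faiss

    d = 64                                              # arbitrary vector dimension
    xb = np.random.random((1000, d)).astype('float32')  # fake database vectors
    xq = np.random.random((5, d)).astype('float32')     # fake query vectors

    index = faiss.IndexFlatL2(d)                        # exact L2 index
    index.add(xb)                                       # index the database vectors
    distances, ids = index.search(xq, 4)                # 4 nearest neighbors per query
    print(ids)                                          # row i holds the neighbor ids of query i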
    
  2. Create a Python virtual environment so that the project prerequisites are installed there without affecting the rest of your Python environment. I executed the commands below. You might first need to install virtualenv via sudo apt-get install python-pip python-dev python-virtualenv. If you use Anaconda, you can perform the corresponding steps there.

    virtualenv --system-site-packages venv_dir
    source venv_dir/bin/activate
    
  3. Add Faiss to the Python path so that it can be imported, e.g., if the Faiss directory is FAISS_DIRECTORY, you can issue:

    export PYTHONPATH=FAISS_DIRECTORY:$PYTHONPATH
    
  4. Install the rest of the dependencies (basically TensorFlow and NumPy):

    pip install --upgrade pip
    pip install -r requirements.txt
    

Running the Code

  1. The first step is to obtain the embeddings of the vocabulary words and of their neighbors. For that, we run:

    cd embeddingsviz
    python embeddings_knn.py -e ORIGINAL_EMBEDDINGS_FILE -v VOCAB_TXT_FILE -o OUTPUT_EMBEDDINGS_FILE -k NUM_NEIGHBORS
    # e.g.: python embeddings_knn.py -e ~/data/fasttext.vec -v ./vocab_file.txt -o ./fasttext_subset_1.vec -k 100
    

    The ORIGINAL_EMBEDDINGS_FILE is assumed to be in the following format: the first line is a header giving the vocabulary size and the embedding dimension. This is the format used by fastText.

    VOCAB_SIZE EMBEDDING_DIMENSIONS
    word_1 vec_1
    word_2 vec_2
    

    However, the code also works with formats that do not have a header (e.g., the default GloVe format).

    This step has to be executed once for each embeddings file you want to visualize. The VOCAB_TXT_FILE contains one word per line. NUM_NEIGHBORS should be chosen so that the total number of vocabulary words and neighbors stays reasonably small (e.g., they should add up to roughly 10,000 words).
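
    For reference, the core of this step looks roughly like the sketch below (a simplified illustration, not the actual embeddings_knn.py; the file names match the example above and error handling is omitted): load the embeddings, build a Faiss index, query the vocabulary words, and write out the union of the vocabulary and its neighbors.

    # simplified sketch of the vocab + neighbors extraction (illustration only)
    import numpy as np
    import faiss

    def load_embeddings(path):
        """Read a .vec embeddings file into (words, matrix); skips a fastText-style header."""
        words, vectors = [], []
        with open(path) as f:
            for line in f:
                parts = line.rstrip().split(' ')
                if len(parts) == 2:                       # header line: vocab size and dimension
                    continue
                words.append(parts[0])
                vectors.append([float(x) for x in parts[1:]])
        return words, np.asarray(vectors, dtype='float32')

    words, matrix = load_embeddings('fasttext.vec')
    vocab = [w.strip() for w in open('vocab_file.txt')]   # one word per line

    index = faiss.IndexFlatL2(matrix.shape[1])            # exact L2 search over all embeddings
    index.add(matrix)

    word_to_id = {w: i for i, w in enumerate(words)}
    queries = matrix[[word_to_id[w] for w in vocab if w in word_to_id]]
    _, neighbor_ids = index.search(queries, 100)          # k = 100 neighbors per vocab word

    keep = sorted(set(neighbor_ids.ravel()))              # each vocab word is its own nearest neighbor
    with open('fasttext_subset_1.vec', 'w') as out:
        out.write('%d %d\n' % (len(keep), matrix.shape[1]))
        for i in keep:
            out.write('%s %s\n' % (words[i], ' '.join('%g' % x for x in matrix[i])))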

  2. The second step is to convert the resulting embeddings of your vocab and their neighbors into a format that TensorBoard understands and place them in the log directory:

    python embeddings_formatter.py -l LOGS_DIRECTORY  -f EMBEDDINGS_FILE_1  EMBEDDINGS_FILE_2  -n NAME_1 NAME_2
    # e.g.: python embeddings_formatter.py -l logs  -f ./fasttext_subset_1.vec ./fasttext_subset_2.vec -n subset_1 subset_2
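
    Under the hood, "a format that TensorBoard understands" is a TensorFlow checkpoint plus a projector configuration. The sketch below illustrates the general idea (a minimal illustration using the TensorFlow 1.x projector API with a toy embedding matrix, not the actual embeddings_formatter.py):

    # minimal sketch: store one embedding matrix in TensorBoard projector format
    import os
    import numpy as np
    import tensorflow as tf
    from tensorflow.contrib.tensorboard.plugins import projector

    log_dir = 'logs'                                    # directory later passed to --logdir
    if not os.path.isdir(log_dir):
        os.makedirs(log_dir)

    words = ['king', 'queen', 'man']                    # toy labels, one per embedding row
    matrix = np.random.rand(len(words), 50).astype(np.float32)

    embedding_var = tf.Variable(matrix, name='subset_1')
    with open(os.path.join(log_dir, 'metadata.tsv'), 'w') as f:
        f.write('\n'.join(words))                       # labels shown in the projector

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver = tf.train.Saver([embedding_var])
        saver.save(sess, os.path.join(log_dir, 'model.ckpt'))

    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = embedding_var.name
    embedding.metadata_path = 'metadata.tsv'
    projector.visualize_embeddings(tf.summary.FileWriter(log_dir), config)

    Repeating this with a different variable name and metadata file adds another embedding to the same TensorBoard session.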
    
  3. The final step is to run TensorBoard, pointing it to this directory:

    tensorboard --logdir=logs --port=6006
    
  4. Now you can point your browser to the embeddings visualization, e.g., http://server_address:6006/#embeddings. You will see an interface like the following screenshot: [Screenshot]

Developer

Hamza Harkous

License

MIT
