YaleDHLab / wordmap

Licence: MIT license
Visualize large text collections with WebGL

Wordmap

Visualize large collections of text data with WebGL

Installation

pip install wordmap

Basic Usage

To create a visualization from a directory of text files, you can call wordmap as follows:

wordmap --texts "data/*.txt"
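Before processing a large collection, it can help to check which files a glob such as "data/*.txt" actually matches. The following is a minimal sketch using only Python's standard library. The helper name preview_glob is hypothetical and is not part of wordmap, and wordmap's own file matching may differ in its details.

```python
import glob
import os


def preview_glob(pattern):
    """Return the sorted list of regular files matched by a glob pattern.

    Hypothetical helper for illustration; it is not part of wordmap.
    """
    # Skip directories that happen to match the pattern.
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


if __name__ == "__main__":
    matches = preview_glob("data/*.txt")
    print(f"{len(matches)} files matched")
    # Show only the first few paths to keep the output short.
    for path in matches[:5]:
        print(path)
```

Once the expected files are listed, the wordmap command above can be run with the same pattern.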

Running wordmap writes the visualization to ./web. To view it, start a local web server in the directory that contains ./web:

# python 2
python -m SimpleHTTPServer 7090

# python 3
python -m http.server 7090

After starting the web server, navigate to http://localhost:7090/web/ to view the visualization.
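The same kind of static file server can also be started from a Python 3 script, which may be convenient when the port or directory needs to be set programmatically. This is a sketch using only the standard library (Python 3.7 or later). The function name make_server is hypothetical, and unlike python -m http.server, this version binds to 127.0.0.1 only.

```python
import functools
import http.server


def make_server(directory=".", port=7090):
    """Build an HTTP server that serves static files from `directory`.

    Hypothetical helper for illustration; it is not part of wordmap.
    """
    # Bind the handler to the chosen directory instead of the process cwd.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    # Serve the directory that contains ./web, as in the commands above.
    server = make_server(".", 7090)
    print("Serving on http://127.0.0.1:7090/web/ (Ctrl+C to stop)")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        server.server_close()
```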

Command Line Arguments

The following flags can be passed to the wordmap command. Run wordmap --help to see the full list:

--texts A glob of files to process

--encoding The encoding of input files

--max_n The maximum number of words/docs to include in the visualization

--layouts The layouts to render {umap, tsne, grid, img, obj}

--obj_file An .obj file that should be used to create the obj layout

--img_file A .png or .jpg file that should be used to create the img layout

--n_components The number of dimensions to use when creating the layouts

--tsne_perplexity The perplexity value to use when creating TSNE layout

--umap_n_neighbors The n_neighbors value to use when creating UMAP layout

--umap_min_dist The min_dist value to use when creating the UMAP layout

--model_type The model type to use {word2vec}

--use_cache Boolean that, if True, will load saved layouts from models

--model_name The name to use when saving a model to disk

--model A persisted model to use to create layouts

--size The number of dimensions to include in Word2Vec vectors

--window The number of words to include in windows when creating a Word2Vec model

--iter The maximum number of iterations to run when training the created model

--min_count The minimum occurrences of each word to be included in the Word2Vec model

--workers The number of computer cores to use when processing input data

--verbose If true, logs progress during layout construction
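When wordmap is launched from a script, the flags above can be assembled into an argument list. The sketch below only builds and prints the command and does not run it. The helper name build_wordmap_command and its parameters are hypothetical, and it covers only a few of the flags listed above.

```python
import shlex


def build_wordmap_command(texts, layouts=None, max_n=None, n_components=None):
    """Assemble an argument list for the wordmap command.

    Hypothetical helper for illustration; it only uses flags listed above
    and does not execute anything.
    """
    command = ["wordmap", "--texts", texts]
    if layouts:
        # --layouts takes several space-separated values.
        command += ["--layouts", *layouts]
    if max_n is not None:
        command += ["--max_n", str(max_n)]
    if n_components is not None:
        command += ["--n_components", str(n_components)]
    return command


if __name__ == "__main__":
    command = build_wordmap_command(
        "data/*.txt", layouts=["umap", "grid"], max_n=10000
    )
    # shlex.join quotes the glob so a shell would pass it through unexpanded.
    print(shlex.join(command))
```

The returned list can be handed to subprocess.run as-is, in which case no shell is involved and the glob reaches wordmap as a literal string, matching the quoted form used in the commands in this README.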

Examples:

Create a wordmap of the text files in ./data using the umap, tsne, and grid layouts:

wordmap --texts "data/*.txt" \
  --layouts umap tsne grid

Create a wordmap from a saved Word2Vec model, using 3 layout dimensions and a maximum of 10000 words:

wordmap --model "1563222036.model" \
  --n_components 3 \
  --max_n 10000

Create a wordmap with several layouts, each with multiple parameter steps:

python wordmap/wordmap.py \
  --texts "data/philosophical_transactions/*.txt" \
  --layouts tsne umap grid \
  --tsne_perplexity 5 25 100 \
  --umap_n_neighbors 2 20 200 \
  --umap_min_dist 0.01 0.1 1.0 \
  --n_clusters 10 25 \
  --iter 100