YaleDHLab / wordmap

Licence: MIT license

Visualize large text collections with WebGL

Programming Languages

javascript

python

HTML

Wordmap

Visualize large collections of text data with WebGL

Installation

pip install wordmap

Basic Usage

To create a visualization from a directory of text files, you can call wordmap as follows:

wordmap --texts "data/*.txt"

This writes a visualization to ./web, which you can view by starting a local web server from the current directory:

# python 2
python -m SimpleHTTPServer 7090

# python 3
python -m http.server 7090

After starting the web server, navigate to http://localhost:7090/web/ to view the visualization.
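
If you would rather start the server from a script, the Python 3 command above can be approximated with the standard library's http.server module. This is a minimal sketch and not part of wordmap: the make_server helper name is hypothetical, and it assumes the directory you pass in is the one that contains web/.

```python
import functools
import http.server


def make_server(directory=".", port=7090):
    """Return an HTTP server that serves `directory` on localhost:`port`.

    `make_server` is a hypothetical helper for illustration only; it is
    not provided by wordmap.
    """
    # Bind the request handler to the directory we want to expose.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("localhost", port), handler)
```

Calling make_server().serve_forever() from the directory that contains web/ should then make the visualization reachable at http://localhost:7090/web/, the same address as above. Press Ctrl+C to stop the server.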

Command Line Arguments

The following flags can be passed to the wordmap command. Run wordmap --help to see the full list:

--texts A glob of files to process

--encoding The encoding of input files

--max_n The maximum number of words/docs to include in the visualization

--layouts The layouts to render {umap, tsne, grid, img, obj}

--obj_file An .obj file that should be used to create the obj layout

--img_file A .png or .jpg file that should be used to create the img layout

--n_components The number of dimensions to use when creating the layouts

--tsne_perplexity The perplexity value to use when creating TSNE layout

--umap_n_neighbors The n_neighbors value to use when creating UMAP layout

--umap_min_dist The min_dist value to use when creating the UMAP layout

--model_type The model type to use {word2vec}

--use_cache Boolean that, if True, will load saved layouts from models

--model_name The name to use when saving a model to disk

--model A persisted model to use to create layouts

--size The number of dimensions to include in Word2Vec vectors

--window The number of words to include in windows when creating a Word2Vec model

--iter The maximum number of iterations to run the created model

--min_count The minimum occurrences of each word to be included in the Word2Vec model

--workers The number of computer cores to use when processing input data

--verbose If true, logs progress during layout construction
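
Before a long run, it can help to check which files a --texts glob matches and whether they decode with the encoding you plan to pass to --encoding. The sketch below is a standalone pre-flight check, not wordmap's own file loading: the preview_texts helper is hypothetical, and its utf8 default is an assumption made for this example rather than wordmap's documented default.

```python
import glob


def preview_texts(pattern, encoding="utf8"):
    """Return (readable, unreadable) lists of paths matched by `pattern`.

    Hypothetical helper for illustration; wordmap does not provide it.
    """
    readable, unreadable = [], []
    for path in sorted(glob.glob(pattern)):
        try:
            # Read the whole file so decoding errors surface now.
            with open(path, encoding=encoding) as f:
                f.read()
            readable.append(path)
        except (UnicodeDecodeError, OSError):
            # Wrong encoding, a directory, or a file we cannot open.
            unreadable.append(path)
    return readable, unreadable
```

For example, preview_texts("data/*.txt") returns the matched files split into those that decode cleanly and those that do not, which makes it easier to pick a value for --encoding.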

Examples:

Create a wordmap of the text files in ./data using the umap, tsne, and grid layouts:

wordmap --texts "data/*.txt" \
  --layouts umap tsne grid

Create a wordmap using a saved Word2Vec model with 3 dimensions and a maximum of 10000 words:

wordmap --model "1563222036.model" \
  --n_components 3 \
  --max_n 10000

Create a wordmap with several layouts, each with multiple parameter steps:

python wordmap/wordmap.py \
  --texts "data/philosophical_transactions/*.txt" \
  --layouts tsne umap grid \
  --tsne_perplexity 5 25 100 \
  --umap_n_neighbors 2 20 200 \
  --umap_min_dist 0.01 0.1 1.0 \
  --n_clusters 10 25 \
  --iter 100
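
To get a feel for how many layouts a command like the one above might produce, the sketch below expands lists of parameter values into every combination. Whether wordmap crosses the values in exactly this way is an assumption, not something stated in this README, and parameter_steps is a hypothetical helper.

```python
import itertools


def parameter_steps(**options):
    """Return one dict per combination of the supplied option values.

    Hypothetical helper for illustration; not part of wordmap.
    """
    names = list(options)
    return [
        dict(zip(names, values))
        for values in itertools.product(*options.values())
    ]


# The UMAP values from the example command above.
umap_steps = parameter_steps(
    n_neighbors=[2, 20, 200],
    min_dist=[0.01, 0.1, 1.0],
)
print(len(umap_steps))  # 3 values x 3 values = 9 combinations
```

Under that assumption, each extra value passed to a flag multiplies the number of layouts to compute, so run time grows quickly as steps are added.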
