
yumeng5 / Spherical Text Embedding

License: Apache-2.0
[NeurIPS 2019] Spherical Text Embedding

Programming Languages

C
50402 projects - #5 most used programming language

Projects that are alternatives to or similar to Spherical Text Embedding

Text Summarizer
Python Framework for Extractive Text Summarization
Stars: ✭ 96 (-32.87%)
Mutual labels:  unsupervised-learning, word-embeddings
Awesome Sentence Embedding
A curated list of pretrained sentence and word embedding models
Stars: ✭ 1,973 (+1279.72%)
Mutual labels:  unsupervised-learning, word-embeddings
3dpose gan
The authors' implementation of Unsupervised Adversarial Learning of 3D Human Pose from 2D Joint Locations
Stars: ✭ 124 (-13.29%)
Mutual labels:  unsupervised-learning
Isolation Forest
A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Stars: ✭ 139 (-2.8%)
Mutual labels:  unsupervised-learning
Awesome Community Detection
A curated list of community detection research papers with implementations.
Stars: ✭ 1,874 (+1210.49%)
Mutual labels:  unsupervised-learning
Hash Embeddings
PyTorch implementation of Hash Embeddings (NIPS 2017). Submission to the NIPS Implementation Challenge.
Stars: ✭ 126 (-11.89%)
Mutual labels:  word-embeddings
Arflow
The official PyTorch implementation of the paper "Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation".
Stars: ✭ 134 (-6.29%)
Mutual labels:  unsupervised-learning
Sfmlearner
An unsupervised learning framework for depth and ego-motion estimation from monocular videos
Stars: ✭ 1,661 (+1061.54%)
Mutual labels:  unsupervised-learning
Deepmapping
code/webpage for the DeepMapping project
Stars: ✭ 140 (-2.1%)
Mutual labels:  unsupervised-learning
E3d lstm
e3d-lstm; Eidetic 3D LSTM: A Model for Video Prediction and Beyond
Stars: ✭ 129 (-9.79%)
Mutual labels:  unsupervised-learning
Splitbrainauto
Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction. In CVPR, 2017.
Stars: ✭ 137 (-4.2%)
Mutual labels:  unsupervised-learning
Fasttext.js
FastText for Node.js
Stars: ✭ 127 (-11.19%)
Mutual labels:  word-embeddings
Tybalt
Training and evaluating a variational autoencoder for pan-cancer gene expression data
Stars: ✭ 126 (-11.89%)
Mutual labels:  unsupervised-learning
Oneshottranslation
Pytorch implementation of "One-Shot Unsupervised Cross Domain Translation" NIPS 2018
Stars: ✭ 135 (-5.59%)
Mutual labels:  unsupervised-learning
Gon
Gradient Origin Networks - a new type of generative model that is able to quickly learn a latent representation without an encoder
Stars: ✭ 126 (-11.89%)
Mutual labels:  unsupervised-learning
Complete Life Cycle Of A Data Science Project
Complete-Life-Cycle-of-a-Data-Science-Project
Stars: ✭ 140 (-2.1%)
Mutual labels:  unsupervised-learning
Cleanlab
The standard package for machine learning with noisy labels, finding mislabeled data, and uncertainty quantification. Works with most datasets and models.
Stars: ✭ 2,526 (+1666.43%)
Mutual labels:  unsupervised-learning
Deepco3
[CVPR19] DeepCO3: Deep Instance Co-segmentation by Co-peak Search and Co-saliency (Oral paper)
Stars: ✭ 127 (-11.19%)
Mutual labels:  unsupervised-learning
And
Official Pytorch Implementation for ICML'19 paper: Unsupervised Deep Learning by Neighbourhood Discovery
Stars: ✭ 133 (-6.99%)
Mutual labels:  unsupervised-learning
Flappy Es
Flappy Bird AI using Evolution Strategies
Stars: ✭ 140 (-2.1%)
Mutual labels:  unsupervised-learning

Spherical Text Embedding

This repository contains the source code for Spherical Text Embedding, published in NeurIPS 2019. The code structure (especially the file reading and saving functions) is adapted from the Word2Vec implementation.

Requirements

The source is written in C, so a C compiler (e.g., GCC) is needed; the provided shell scripts compile the source before running it.

Pre-trained Embeddings

We provide pre-trained JoSE embeddings trained on the Wikipedia dump.

Unlike Euclidean embeddings such as Word2Vec and GloVe, spherical embeddings do not necessarily benefit from higher-dimensional space, so it might be a good idea to start with lower-dimensional ones first.
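
The exact file format of the released vectors is not described here; assuming they follow the plain-text Word2Vec convention (a header line with vocabulary size and dimension, then one word and its vector per line), a minimal Python loader could look like the sketch below. The file name jose.txt is a placeholder; check the downloaded file if the format differs.

import numpy as np

def load_embeddings(path):
    # Assumed format: header "vocab_size dim", then one word followed by its
    # whitespace-separated vector per line (Word2Vec text convention).
    vectors = {}
    with open(path, encoding="utf-8") as f:
        vocab_size, dim = map(int, f.readline().split())
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) == dim + 1:
                vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

word_vecs = load_embeddings("jose.txt")  # placeholder path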

Run the Code

We provide a shell script run.sh for compiling the source file and training the embeddings.

Note: When preparing the training text corpus, make sure each line in the file is one document/paragraph.
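
As a minimal illustration of that corpus format, the following Python sketch writes a toy corpus with one whitespace-tokenized document per line; the documents and the output path text.txt are placeholders.

# Toy corpus: one whitespace-tokenized document/paragraph per line.
documents = [
    "spherical text embedding trains word and document vectors jointly",
    "each line of the training file is treated as one document or paragraph",
]
with open("text.txt", "w", encoding="utf-8") as f:
    for doc in documents:
        f.write(doc.strip().replace("\n", " ") + "\n")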

Hyperparameters

Note: It is recommended to use the default hyperparameters, especially the number of negative samples (-negative) and loss function margin (-margin).

Invoke the command without arguments for a list of hyperparameters and their meanings:

$ ./src/jose
Parameters:
        -train <file> (mandatory argument)
                Use text data from <file> to train the model
        -word-output <file>
                Use <file> to save the resulting word vectors
        -context-output <file>
                Use <file> to save the resulting word context vectors
        -doc-output <file>
                Use <file> to save the resulting document vectors
        -size <int>
                Set size of word vectors; default is 100
        -window <int>
                Set max skip length between words; default is 5
        -sample <float>
                Set threshold for occurrence of words. Those that appear with higher frequency in the
                training data will be randomly down-sampled; default is 1e-3, useful range is (0, 1e-3)
        -negative <int>
                Number of negative examples; default is 2
        -threads <int>
                Use <int> threads; default is 20
        -margin <float>
                Margin used in loss function to separate positive samples from negative samples; default is 0.15
        -iter <int>
                Run more training iterations; default is 10
        -min-count <int>
                This will discard words that appear less than <int> times; default is 5
        -alpha <float>
                Set the starting learning rate; default is 0.04
        -debug <int>
                Set the debug mode (default = 2 = more info during training)
        -save-vocab <file>
                The vocabulary will be saved to <file>
        -read-vocab <file>
                The vocabulary will be read from <file>, not constructed from the training data
        -load-emb <file>
                The pretrained embeddings will be read from <file>

Examples:
./jose -train text.txt -word-output jose.txt -size 100 -margin 0.15 -window 5 -sample 1e-3 -negative 2 -iter 10

Word Similarity Evaluation

We provide a shell script eval_sim.sh for word similarity evaluation of trained spherical word embeddings on the Wikipedia dump. The script first downloads a zipped file of the pre-processed Wikipedia dump (retrieved 2019.05; ~4 GB zipped, ~13 GB unzipped; see its README file for a detailed description of the dataset) and then runs JoSE on it. Finally, the trained embeddings are evaluated on three benchmark word similarity datasets: WordSim-353, MEN, and SimLex-999.
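
The script is the authoritative reference; as an illustration of the standard word similarity protocol it follows (cosine similarity of word vectors compared against human judgments via Spearman correlation), a rough Python sketch might look like this, where vectors and pairs are supplied by the caller:

import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def word_similarity_score(vectors, pairs):
    # vectors: dict word -> vector; pairs: iterable of (w1, w2, human_score).
    # Spearman correlation over pairs whose words are both in the vocabulary.
    model, human = [], []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(score)
    return spearmanr(model, human).correlation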

Document Clustering Evaluation

We provide a shell script eval_cluster.sh for document clustering evaluation of trained spherical document embeddings on the 20 Newsgroup dataset. The script will perform K-Means and Spherical K-Means clustering on the trained document embeddings.
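
As a rough Python sketch of this kind of evaluation (not the repo's actual script), standard K-Means can be run with scikit-learn; L2-normalizing the document vectors first is a common stand-in for Spherical K-Means, though the script may use a dedicated spherical K-Means implementation instead. Clustering quality is scored here with normalized mutual information against the gold newsgroup labels.

from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score
from sklearn.preprocessing import normalize

def cluster_nmi(doc_vectors, gold_labels, n_clusters=20, spherical=False):
    # doc_vectors: (n_docs, dim) array; gold_labels: newsgroup class ids.
    # spherical=True L2-normalizes the vectors before K-Means, a common
    # stand-in for spherical K-Means on unit-norm embeddings.
    X = normalize(doc_vectors) if spherical else doc_vectors
    pred = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    return normalized_mutual_info_score(gold_labels, pred)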

Document Classification Evaluation

We provide a shell script eval_classify.sh for document classification evaluation of trained spherical document embeddings on the 20 Newsgroup dataset. The script will perform KNN classification following the original 20 Newsgroup train/test split with the trained document embeddings as features.
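
A minimal Python sketch of such a KNN evaluation with scikit-learn is shown below; the value of k and the use of cosine distance are illustrative assumptions, not necessarily the script's settings.

from sklearn.neighbors import KNeighborsClassifier

def knn_accuracy(train_X, train_y, test_X, test_y, k=3):
    # Cosine distance suits unit-norm spherical embeddings; k is illustrative.
    clf = KNeighborsClassifier(n_neighbors=k, metric="cosine")
    clf.fit(train_X, train_y)
    return clf.score(test_X, test_y)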

Citations

Please cite the following paper if you find the code helpful for your research.

@inproceedings{meng2019spherical,
  title={Spherical Text Embedding},
  author={Meng, Yu and Huang, Jiaxin and Wang, Guangyuan and Zhang, Chao and Zhuang, Honglei and Kaplan, Lance and Han, Jiawei},
  booktitle={Advances in Neural Information Processing Systems},
  year={2019}
}