
skesiraju / BaySMM

Licence: other
Model for learning document embeddings along with their uncertainties

Programming Languages

python

Projects that are alternatives of or similar to BaySMM

adenine
ADENINE: A Data ExploratioN PipelINE
Stars: ✭ 15 (-40%)
Mutual labels:  unsupervised-learning
VBLinLogit
Variational Bayes linear and logistic regression
Stars: ✭ 25 (+0%)
Mutual labels:  variational-bayes
Discovery
Mining Discourse Markers for Unsupervised Sentence Representation Learning
Stars: ✭ 48 (+92%)
Mutual labels:  unsupervised-learning
kmeans
A simple implementation of K-means (and Bisecting K-means) clustering algorithm in Python
Stars: ✭ 18 (-28%)
Mutual labels:  unsupervised-learning
SimCLR-in-TensorFlow-2
(Minimally) implements SimCLR (https://arxiv.org/abs/2002.05709) in TensorFlow 2.
Stars: ✭ 75 (+200%)
Mutual labels:  unsupervised-learning
machine-learning-course
Machine Learning Course @ Santa Clara University
Stars: ✭ 17 (-32%)
Mutual labels:  unsupervised-learning
PlanSum
[AAAI2021] Unsupervised Opinion Summarization with Content Planning
Stars: ✭ 25 (+0%)
Mutual labels:  unsupervised-learning
KD3A
Here is the official implementation of the model KD3A in paper "KD3A: Unsupervised Multi-Source Decentralized Domain Adaptation via Knowledge Distillation".
Stars: ✭ 63 (+152%)
Mutual labels:  unsupervised-learning
Deep-Association-Learning
Tensorflow Implementation on Paper [BMVC2018]Deep Association Learning for Unsupervised Video Person Re-identification
Stars: ✭ 68 (+172%)
Mutual labels:  unsupervised-learning
awesome-contrastive-self-supervised-learning
A comprehensive list of awesome contrastive self-supervised learning papers.
Stars: ✭ 748 (+2892%)
Mutual labels:  unsupervised-learning
spear
SPEAR: Programmatically label and build training data quickly.
Stars: ✭ 81 (+224%)
Mutual labels:  unsupervised-learning
music-recommendation-system
A simple Music Recommendation System
Stars: ✭ 38 (+52%)
Mutual labels:  unsupervised-learning
DRNET
PyTorch implementation of the NIPS 2017 paper - Unsupervised Learning of Disentangled Representations from Video
Stars: ✭ 45 (+80%)
Mutual labels:  unsupervised-learning
T-CorEx
Implementation of linear CorEx and temporal CorEx.
Stars: ✭ 31 (+24%)
Mutual labels:  unsupervised-learning
Indoor-SfMLearner
[ECCV'20] Patch-match and Plane-regularization for Unsupervised Indoor Depth Estimation
Stars: ✭ 115 (+360%)
Mutual labels:  unsupervised-learning
machine-learning
Programming Assignments and Lectures for Andrew Ng's "Machine Learning" Coursera course
Stars: ✭ 83 (+232%)
Mutual labels:  unsupervised-learning
LinearCorex
Fast, linear version of CorEx for covariance estimation, dimensionality reduction, and subspace clustering with very under-sampled, high-dimensional data
Stars: ✭ 39 (+56%)
Mutual labels:  unsupervised-learning
dads
Code for 'Dynamics-Aware Unsupervised Discovery of Skills' (DADS). Enables skill discovery without supervision, which can be combined with model-based control.
Stars: ✭ 138 (+452%)
Mutual labels:  unsupervised-learning
NMFADMM
A sparsity aware implementation of "Alternating Direction Method of Multipliers for Non-Negative Matrix Factorization with the Beta-Divergence" (ICASSP 2014).
Stars: ✭ 39 (+56%)
Mutual labels:  unsupervised-learning
catgan pytorch
Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks
Stars: ✭ 50 (+100%)
Mutual labels:  unsupervised-learning

Bayesian Subspace Multinomial Model (BaySMM)

  • Model for learning document embeddings (i-vectors) along with their uncertainties.
  • Gaussian linear classifier exploiting the uncertainties in document embeddings.
  • See the paper: http://arxiv.org/abs/1908.07599

S. Kesiraju, O. Plchot, L. Burget and S. V. Gangashetty, "Learning Document Embeddings Along With Their Uncertainties," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2319-2332, 2020, doi: 10.1109/TASLP.2020.3012062.
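In the subspace multinomial model (SMM) underlying BaySMM, each document d has a latent vector w_d, and its word distribution is theta_d = softmax(m + T w_d), where m is a vocabulary-sized bias (typically log unigram probabilities) and T is a low-rank subspace matrix. The Bayesian variant infers a Gaussian posterior over w_d instead of a point estimate. The NumPy sketch below only illustrates the per-document log-likelihood under this parameterisation; the names m, T, w and the toy initialisation are illustrative assumptions, not the repository's code.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the vocabulary axis
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def doc_log_likelihood(counts, m, T, w):
    """Multinomial log-likelihood (without the constant multinomial
    coefficient) of one document's word counts under theta = softmax(m + T w).

    counts: (V,) word counts, m: (V,) bias, T: (V, K) subspace, w: (K,) embedding.
    """
    log_theta = log_softmax(m + T @ w)
    return float(counts @ log_theta)

rng = np.random.default_rng(0)
V, K = 6, 2
counts = np.array([3, 0, 1, 0, 2, 0], dtype=float)
# Hypothetical initialisation: m as smoothed log unigram probabilities
unigram = (counts + 1.0) / (counts + 1.0).sum()
m = np.log(unigram)
T = 0.1 * rng.standard_normal((V, K))
print(doc_log_likelihood(counts, m, T, np.zeros(K)))
```

With w = 0 the model falls back to the unigram distribution exp(m); the subspace T lets each document shift away from it in only K directions.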

Requirements

  • Python >= 3.7

  • PyTorch >= 1.1, <= 1.4

  • scipy >= 1.3

  • numpy >= 1.16.4

  • scikit-learn >= 0.21.2

  • h5py >= 2.9.0

  • See INSTALL.md for detailed instructions.
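A quick way to confirm an environment against the version bounds listed above is sketched below. It is a convenience script, not part of the repository; the distribution names used for lookup (for example "torch" for PyTorch) are assumptions, the upper bound for PyTorch is treated as a release-series prefix (so 1.4.x passes), and importlib.metadata needs Python 3.8+ (on 3.7 the importlib_metadata backport would be needed).

```python
import re
import sys
from importlib import metadata

# Bounds transcribed from the requirements list; lookup names are assumptions.
REQUIREMENTS = {
    "torch": ("1.1", "1.4"),
    "scipy": ("1.3", None),
    "numpy": ("1.16.4", None),
    "scikit-learn": ("0.21.2", None),
    "h5py": ("2.9.0", None),
}

def parse(version):
    """Turn '1.4.0+cu92' into (1, 4, 0), keeping only leading numeric parts."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)

def within(version, lower, upper):
    """True if lower <= version and version's release series <= upper."""
    v = parse(version)
    if lower is not None and v < parse(lower):
        return False
    if upper is not None and v[:len(parse(upper))] > parse(upper):
        return False
    return True

def check():
    print("python", sys.version.split()[0], "ok" if sys.version_info >= (3, 7) else "too old")
    for name, (lower, upper) in REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed")
            continue
        status = "ok" if within(installed, lower, upper) else "out of range"
        print(f"{name} {installed}: {status}")

if __name__ == "__main__":
    check()
```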

Data preparation - sample from 20Newsgroups

    python src/create_sample_data.py sample_data/
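Training takes a bag-of-words count matrix in Matrix Market format (train.mtx) plus a vocabulary file. The standard-library sketch below shows what such a pair can look like for your own text. The documents-by-words orientation, whitespace tokenisation, and one-word-per-line vocabulary are assumptions about what create_sample_data.py produces, so check that script before substituting your own data.

```python
from collections import Counter

def build_vocab(docs):
    """Sorted vocabulary over whitespace-tokenised documents."""
    return sorted({tok for doc in docs for tok in doc.split()})

def to_matrix_market(docs, vocab):
    """Return the text of a documents-by-words count matrix in
    Matrix Market coordinate format (1-based row and column indices)."""
    index = {word: j for j, word in enumerate(vocab)}
    entries = []
    for i, doc in enumerate(docs):
        counts = Counter(tok for tok in doc.split() if tok in index)
        for word, c in sorted(counts.items(), key=lambda kv: index[kv[0]]):
            entries.append(f"{i + 1} {index[word] + 1} {c}")
    header = [
        "%%MatrixMarket matrix coordinate integer general",
        f"{len(docs)} {len(vocab)} {len(entries)}",  # rows, cols, non-zeros
    ]
    return "\n".join(header + entries) + "\n"

docs = ["the cat sat", "the dog sat on the mat"]
vocab = build_vocab(docs)
print("\n".join(vocab))          # contents of a one-word-per-line vocab file
print(to_matrix_market(docs, vocab))
```

Files written this way can be read back with scipy.io.mmread, which returns a sparse COO matrix.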

Training the model

  • For help:

    python src/run_baysmm.py --help

  • To train on a GPU, set CUDA_VISIBLE_DEVICES=$GPU_ID, where $GPU_ID is the index of a free GPU.

  • The following command trains the model for 1000 VB iterations and saves it in an automatically created sub-directory: exp/s_1.00_rp_1_lw_1e+01_l1_1e-03_50_adam/

    python src/run_baysmm.py train \
        sample_data/train.mtx \
        sample_data/vocab \
        exp/ \
        -K 50 \
        -trn 1000 \
        -lw 1e+01 \
        -var_p 1e+01 \
        -lt 1e-03
  • The ELBO and KLD for every iteration, the log file, etc. are saved in the same sub-directory.
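The logged ELBO is the expected log-likelihood of the word counts under the embedding posterior q(w_d) = N(mu_d, diag(sigma_d^2)) minus the KL divergence (KLD) from q to the prior. The NumPy sketch below estimates it for one document with reparameterised Monte Carlo samples w = mu + sigma * eps. It assumes an isotropic prior N(0, lam^{-1} I) and the softmax(m + T w) likelihood; the names, the sample count, and the prior form are assumptions, and the repository's PyTorch implementation may differ in detail.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kl_to_prior(mu, log_std, lam):
    """KL( N(mu, diag(sigma^2)) || N(0, lam^{-1} I) ), summed over dimensions."""
    var = np.exp(2.0 * log_std)
    return 0.5 * np.sum(lam * (var + mu ** 2) - 1.0 - np.log(lam * var))

def elbo(counts, m, T, mu, log_std, lam, n_samples=1, rng=None):
    """Monte Carlo ELBO for one document; returns (elbo, kld)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.exp(log_std)
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    w = mu + sigma * eps                      # (n_samples, K) reparameterised draws
    log_theta = log_softmax(m + w @ T.T)      # (n_samples, V)
    expected_ll = np.mean(log_theta @ counts) # average over samples
    kld = kl_to_prior(mu, log_std, lam)
    return expected_ll - kld, kld

rng = np.random.default_rng(0)
V, K = 8, 3
counts = rng.integers(0, 4, size=V).astype(float)
m = np.log(np.full(V, 1.0 / V))
T = 0.1 * rng.standard_normal((V, K))
value, kld = elbo(counts, m, T, np.zeros(K), np.full(K, -1.0), lam=10.0, n_samples=4, rng=rng)
print(value, kld)
```

The KL term has a closed form for diagonal Gaussians, so only the likelihood term needs sampling.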

Extracting the posterior distributions of embeddings

  • Extract the embedding posteriors [mean, log.std.dev], running 1000 extraction iterations, for each stats file listed in sample_data/mtx.flist.

  • With the -nth 100 argument, the embeddings from every 100th iteration are also saved.

    python src/run_baysmm.py extract \
        sample_data/mtx.flist \
        exp/s_1.00_rp_1_lw_1e+01_l1_1e-03_50_adam/model_T1000.h5 \
        -xtr 1000 \
        -nth 100
  • The extracted embedding posterior distributions are saved in the exp/*/ivecs/ sub-directory under appropriate names.
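Downstream code needs the posterior mean and variance of each embedding. The dataset names and layout inside the .h5 files are not documented here (listing them with h5py, e.g. list(h5py.File(path, "r").keys()), is a safe first step). Assuming the parameters arrive as one (N, 2K) array laid out as [mean | log std. dev.], they can be split and converted as in this sketch; the layout is a hypothetical assumption.

```python
import numpy as np

def split_posterior(params):
    """Split an (N, 2K) array laid out as [mean | log std. dev.] into
    posterior means (N, K) and diagonal variances (N, K).

    The column layout is an assumption about the extracted files."""
    params = np.asarray(params, dtype=float)
    if params.ndim != 2 or params.shape[1] % 2:
        raise ValueError("expected an (N, 2K) array")
    k = params.shape[1] // 2
    mean, log_std = params[:, :k], params[:, k:]
    return mean, np.exp(2.0 * log_std)   # variance = exp(2 * log std)

# Two hypothetical 2-dimensional embedding posteriors
params = np.array([[0.5, -1.0, 0.0, np.log(0.5)],
                   [1.5,  2.0, np.log(2.0), 0.0]])
mean, var = split_posterior(params)
print(mean)
print(var)   # variances 1.0, 0.25 and 4.0, 1.0
```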

Training and testing the classifier

  • Three classifiers can be trained on these embeddings.
  • Use the --final option to train and test a classifier on the embeddings from the final iteration.
  1. Gaussian linear classifier - uses only the mean parameter

    python src/train_and_clf_cv.py exp/s_1.00_rp_1_lw_1e+01_l1_1e-03_50_adam/ivecs/train_model_T1000_e1000.h5 sample_data/train.labels glc

  2. Multi-class logistic regression - uses only the mean parameter

    python src/train_and_clf_cv.py exp/s_1.00_rp_1_lw_1e+01_l1_1e-03_50_adam/ivecs/train_model_T1000_e1000.h5 sample_data/train.labels lr

  3. Gaussian linear classifier with uncertainty - uses full posterior distribution

    python src/train_and_clf_cv.py exp/s_1.00_rp_1_lw_1e+01_l1_1e-03_50_adam/ivecs/train_model_T1000_e1000.h5 sample_data/train.labels glcu

  • All the results and predicted classes are saved in exp/*/results/
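To make the difference between glc and glcu concrete, here is a simplified NumPy sketch. The Gaussian linear classifier models each class with its own mean and a shared within-class covariance S estimated from the posterior means. The uncertainty-aware scoring adds each document's own posterior covariance diag(sigma_d^2) to S before evaluating the class likelihoods, so uncertain embeddings are scored with broader Gaussians. This only illustrates the idea under equal class priors; the repository's glcu trains its parameters differently, and all function names here are hypothetical.

```python
import numpy as np

def fit_glc(X, y):
    """Per-class means and one shared within-class covariance
    estimated from the posterior means X (N, K)."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / len(X)
    return classes, means, cov

def glc_scores(X, means, cov):
    """Linear scores x^T S^-1 m_c - 0.5 m_c^T S^-1 m_c (equal class priors)."""
    P = np.linalg.solve(cov, means.T)              # S^-1 m_c for each class, (K, C)
    return X @ P - 0.5 * np.sum(means.T * P, axis=0)

def glcu_scores(X, variances, means, cov):
    """Scores log N(mu_d; m_c, S + diag(var_d)) without the log-determinant,
    which is identical for every class of a given document."""
    scores = np.empty((len(X), len(means)))
    for d, (x, v) in enumerate(zip(X, variances)):
        total = cov + np.diag(v)                   # shared cov + document uncertainty
        diff = x - means                           # (C, K)
        scores[d] = -0.5 * np.sum(diff * np.linalg.solve(total, diff.T).T, axis=1)
    return scores

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
classes, means, cov = fit_glc(X, y)
pred = classes[glc_scores(X, means, cov).argmax(axis=1)]
print("training accuracy:", (pred == y).mean())
```

With all posterior variances set to zero, the glcu scores differ from the glc scores only by a per-document constant, so both pick the same class.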

Citation

@ARTICLE{Kesiraju:2020:BaySMM,
  author={Kesiraju, Santosh and Plchot, Oldřich and Burget, Lukáš and Gangashetty, Suryakanth V.},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={Learning Document Embeddings Along With Their Uncertainties}, 
  year={2020},
  volume={28},
  number={},
  pages={2319-2332},
  doi={10.1109/TASLP.2020.3012062}}