iamyuanchung / Autoregressive Predictive Coding

Autoregressive Predictive Coding: An unsupervised autoregressive model for speech representation learning


Projects that are alternatives of or similar to Autoregressive Predictive Coding

srVAE
VAE with RealNVP prior and Super-Resolution VAE in PyTorch. Code release for https://arxiv.org/abs/2006.05218.
Stars: ✭ 56 (-59.42%)
Mutual labels:  representation-learning, unsupervised-learning
Awesome Vaes
A curated list of awesome work on VAEs, disentanglement, representation learning, and generative models.
Stars: ✭ 418 (+202.9%)
Mutual labels:  unsupervised-learning, representation-learning
Simclr
PyTorch implementation of SimCLR: A Simple Framework for Contrastive Learning of Visual Representations by T. Chen et al.
Stars: ✭ 293 (+112.32%)
Mutual labels:  unsupervised-learning, representation-learning
SimCLR
Pytorch implementation of "A Simple Framework for Contrastive Learning of Visual Representations"
Stars: ✭ 65 (-52.9%)
Mutual labels:  representation-learning, unsupervised-learning
Bagofconcepts
Python implementation of bag-of-concepts
Stars: ✭ 18 (-86.96%)
Mutual labels:  unsupervised-learning, representation-learning
awesome-contrastive-self-supervised-learning
A comprehensive list of awesome contrastive self-supervised learning papers.
Stars: ✭ 748 (+442.03%)
Mutual labels:  representation-learning, unsupervised-learning
Disentangling Vae
Experiments for understanding disentanglement in VAE latent representations
Stars: ✭ 398 (+188.41%)
Mutual labels:  unsupervised-learning, representation-learning
amr
Official adversarial mixup resynthesis repository
Stars: ✭ 31 (-77.54%)
Mutual labels:  representation-learning, unsupervised-learning
Simclr
PyTorch implementation of SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
Stars: ✭ 750 (+443.48%)
Mutual labels:  unsupervised-learning, representation-learning
Unsupervised Classification
SCAN: Learning to Classify Images without Labels (ECCV 2020), incl. SimCLR.
Stars: ✭ 605 (+338.41%)
Mutual labels:  unsupervised-learning, representation-learning
proto
Proto-RL: Reinforcement Learning with Prototypical Representations
Stars: ✭ 67 (-51.45%)
Mutual labels:  representation-learning, unsupervised-learning
Self Supervised Learning Overview
📜 Self-Supervised Learning from Images: Up-to-date reading list.
Stars: ✭ 73 (-47.1%)
Mutual labels:  unsupervised-learning, representation-learning
State-Representation-Learning-An-Overview
Simplified version of "State Representation Learning for Control: An Overview" bibliography
Stars: ✭ 32 (-76.81%)
Mutual labels:  representation-learning, unsupervised-learning
ladder-vae-pytorch
Ladder Variational Autoencoders (LVAE) in PyTorch
Stars: ✭ 59 (-57.25%)
Mutual labels:  representation-learning, unsupervised-learning
rl singing voice
Unsupervised Representation Learning for Singing Voice Separation
Stars: ✭ 18 (-86.96%)
Mutual labels:  representation-learning, unsupervised-learning
Contrastive Predictive Coding
Keras implementation of Representation Learning with Contrastive Predictive Coding
Stars: ✭ 369 (+167.39%)
Mutual labels:  unsupervised-learning, representation-learning
M-NMF
An implementation of "Community Preserving Network Embedding" (AAAI 2017)
Stars: ✭ 119 (-13.77%)
Mutual labels:  representation-learning, unsupervised-learning
awesome-graph-self-supervised-learning
Awesome Graph Self-Supervised Learning
Stars: ✭ 805 (+483.33%)
Mutual labels:  representation-learning, unsupervised-learning
Lemniscate.pytorch
Unsupervised Feature Learning via Non-parametric Instance Discrimination
Stars: ✭ 532 (+285.51%)
Mutual labels:  unsupervised-learning, representation-learning
Transferlearning
Transfer learning / domain adaptation / domain generalization / multi-task learning, etc. Papers, code, datasets, applications, tutorials. 迁移学习 (transfer learning)
Stars: ✭ 8,481 (+6045.65%)
Mutual labels:  representation-learning, unsupervised-learning

Autoregressive Predictive Coding

This repository contains the official implementation (in PyTorch) of Autoregressive Predictive Coding (APC) proposed in An Unsupervised Autoregressive Model for Speech Representation Learning.

APC is a speech feature extractor trained on a large amount of unlabeled data. With an unsupervised, autoregressive training objective, representations learned by APC not only capture general acoustic characteristics such as speaker and phone information from the speech signals, but are also highly accessible to downstream models: our experimental results on phone classification show that a linear classifier taking the APC representations as input features significantly outperforms a multi-layer perceptron using the surface features.

Dependencies

  • Python 3.5
  • PyTorch 1.0

Dataset

In the paper, we used the train-clean-360 split from the LibriSpeech corpus to train the APC models, and the dev-clean split to keep track of the training loss. We used log Mel spectrograms, generated by running the Kaldi scripts, as the input acoustic features to the APC models. You can of course generate the log Mel spectrograms yourself, but to help you reproduce our results, here we provide links to the data preprocessed by us that can be directly fed to the APC models. We also include other data splits that we did not use in the paper for you to explore; for example, you can try training an APC model on a larger and noisier set (e.g., train-other-500) and see whether it learns more robust speech representations.
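
If you prefer to generate the features yourself rather than download ours, the sketch below computes normalized log Mel spectrograms with librosa as a stand-in for the Kaldi pipeline; the number of Mel bins, the frame parameters, and the per-utterance normalization are assumptions, so features produced this way will not exactly match the preprocessed data linked above.

import librosa
import numpy as np

def log_mel(wav_path, sr=16000, n_mels=80, n_fft=400, hop_length=160):
    # Load the waveform at 16 kHz (LibriSpeech's native sampling rate).
    y, _ = librosa.load(wav_path, sr=sr)
    # 25 ms windows with a 10 ms shift and 80 Mel bins (assumed settings).
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    logmel = np.log(mel + 1e-6).T  # (num_frames, n_mels)
    # Per-utterance mean/variance normalization (the Kaldi "norm" step may differ).
    return (logmel - logmel.mean(0)) / (logmel.std(0) + 1e-8)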

Training APC

Below we will follow the paper and use train-clean-360 and dev-clean as a demonstration. Once you have downloaded the data, unzip them by running:

xz -d train-clean-360.xz
xz -d dev-clean.xz

Then, create a directory librispeech_data/kaldi and move the data into it:

mkdir -p librispeech_data/kaldi
mv train-clean-360-hires-norm.blogmel librispeech_data/kaldi
mv dev-clean-hires-norm.blogmel librispeech_data/kaldi

Now we have to transform the data into a format loadable by the PyTorch DataLoader. To do so, simply run:

# Prepare the training set
python prepare_data.py --librispeech_from_kaldi librispeech_data/kaldi/train-clean-360-hires-norm.blogmel --save_dir librispeech_data/preprocessed/train-clean-360-hires-norm.blogmel
# Prepare the validation set
python prepare_data.py --librispeech_from_kaldi librispeech_data/kaldi/dev-clean-hires-norm.blogmel --save_dir librispeech_data/preprocessed/dev-clean-hires-norm.blogmel

Once the program is done, you will see a directory preprocessed/ inside librispeech_data/ that contains all the preprocessed PyTorch tensors.
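
As a quick sanity check, you can load a couple of the saved tensors and print their shapes; the exact file layout and extension are determined by prepare_data.py, so the glob pattern below is only an assumption.

import pathlib
import torch

# Inspect a few of the preprocessed files (adjust the pattern to the actual layout).
for path in sorted(pathlib.Path("librispeech_data/preprocessed").rglob("*.pt"))[:3]:
    tensor = torch.load(path)
    print(path.name, getattr(tensor, "shape", type(tensor)))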

To train an APC model, simply run:

python train_apc.py

By default, the trained models are saved in logs/. You can also use TensorBoard to track the training progress. There are many other configurations you can try; check train_apc.py for more details, as it is thoroughly documented and should be self-explanatory.
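
If you have TensorBoard installed, you can point it at the log directory (assuming the event files are also written under the default logs/) and watch the loss curves in your browser:

tensorboard --logdir logs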

Feature extraction

Once you have trained your APC model, you can use it to extract speech features from your target dataset. To do so, feed the target dataset through the trained model and retrieve the extracted features by running:

_, feats = model.forward(inputs, lengths)

feats is a PyTorch tensor of shape (num_layers, batch_size, seq_len, rnn_hidden_size) where:

  • num_layers is the RNN depth of your APC model
  • batch_size is your inference batch size
  • seq_len is the maximum sequence length and is determined when you run prepare_data.py. By default this value is 1600.
  • rnn_hidden_size is the dimensionality of the RNN hidden unit.

As you can see, feats essentially contains the RNN hidden states of an APC model. If you are familiar with ELMo, you can think of APC as a speech version of it.

There are many ways to incorporate feats into your downstream task. One of the easiest is to take only the outputs of the last RNN layer (i.e., feats[-1, :, :, :]) as the input features to your downstream model, which is what we did in our paper. Feel free to explore other mechanisms.
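
As a concrete illustration, below is a minimal sketch of batching a few utterances and slicing out the last-layer features; the 80-dimensional input, the padding step, and the dummy data are assumptions, so adapt them to your own setup (and see load_pretrained_model.py for how to obtain a trained model).

import torch

# Suppose `model` is a trained APC model already loaded into memory, and each
# utterance is a (num_frames, 80) tensor of normalized log Mel features.
utterances = [torch.randn(500, 80), torch.randn(320, 80)]  # dummy data for illustration
lengths = torch.tensor([u.size(0) for u in utterances])
# Zero-pad the utterances so they can be stacked into a single batch tensor.
inputs = torch.nn.utils.rnn.pad_sequence(utterances, batch_first=True)

model.eval()
with torch.no_grad():
    _, feats = model.forward(inputs, lengths)

# feats: (num_layers, batch_size, seq_len, rnn_hidden_size)
last_layer = feats[-1]  # last RNN layer, as used in the paper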

Pre-trained models

We release the pre-trained models that were used to produce the numbers reported in the paper. load_pretrained_model.py provides a simple example of loading a pre-trained model.

Reference

Please cite our paper(s) if you find this repository useful. The first paper proposes the APC objective, while the second applies it to speech recognition, speech translation, and speaker identification, and provides a more systematic analysis of the learned representations. Cite both if you are kind enough!

@inproceedings{chung2019unsupervised,
  title = {An unsupervised autoregressive model for speech representation learning},
  author = {Chung, Yu-An and Hsu, Wei-Ning and Tang, Hao and Glass, James},
  booktitle = {Interspeech},
  year = {2019}
}
@inproceedings{chung2020generative,
  title = {Generative pre-training for speech with autoregressive predictive coding},
  author = {Chung, Yu-An and Glass, James},
  booktitle = {ICASSP},
  year = {2020}
}

Contact

Feel free to shoot me an email for any inquiries about the paper and this repository.
