
tomlepaine / Fast Wavenet

License: GPL-3.0
Speedy Wavenet generation using dynamic programming ⚡

Programming Languages

Python
Jupyter Notebook

Projects that are alternatives of or similar to Fast Wavenet

hifigan-denoiser
HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Stars: ✭ 88 (-94.84%)
Mutual labels:  wavenet
Wavenet Stt
An end-to-end speech recognition system with Wavenet. Built using C++ and python.
Stars: ✭ 18 (-98.94%)
Mutual labels:  wavenet
Chainer Vq Vae
A Chainer implementation of VQ-VAE.
Stars: ✭ 77 (-95.48%)
Mutual labels:  wavenet
Clarinet
A Pytorch Implementation of ClariNet
Stars: ✭ 273 (-83.99%)
Mutual labels:  wavenet
Speech Denoising Wavenet
A neural network for end-to-end speech denoising
Stars: ✭ 516 (-69.74%)
Mutual labels:  wavenet
Vq Vae Wavenet
TensorFlow implementation of VQ-VAE with WaveNet decoder, based on https://arxiv.org/abs/1711.00937 and https://arxiv.org/abs/1901.08810
Stars: ✭ 40 (-97.65%)
Mutual labels:  wavenet
chainer-Fast-WaveNet
A Chainer implementation of Fast WaveNet(mel-spectrogram vocoder).
Stars: ✭ 33 (-98.06%)
Mutual labels:  wavenet
Numpy Ml
Machine learning, in numpy
Stars: ✭ 11,100 (+551.03%)
Mutual labels:  wavenet
Parallelwavegan
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN) with Pytorch
Stars: ✭ 682 (-60%)
Mutual labels:  wavenet
Tf Wavenet vocoder
Wavenet and its applications with Tensorflow
Stars: ✭ 58 (-96.6%)
Mutual labels:  wavenet
Time Series Prediction
A collection of time series prediction methods: rnn, seq2seq, cnn, wavenet, transformer, unet, n-beats, gan, kalman-filter
Stars: ✭ 351 (-79.41%)
Mutual labels:  wavenet
Flowavenet
A Pytorch implementation of "FloWaveNet: A Generative Flow for Raw Audio"
Stars: ✭ 471 (-72.38%)
Mutual labels:  wavenet
Tacotron2
pytorch tacotron2 https://arxiv.org/pdf/1712.05884.pdf
Stars: ✭ 46 (-97.3%)
Mutual labels:  wavenet
Pytorchwavenetvocoder
WaveNet-Vocoder implementation with pytorch.
Stars: ✭ 269 (-84.22%)
Mutual labels:  wavenet
Wavenet Enhancement
Speech Enhancement using Bayesian WaveNet
Stars: ✭ 86 (-94.96%)
Mutual labels:  wavenet
ttslearn
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)
Stars: ✭ 158 (-90.73%)
Mutual labels:  wavenet
Pytorch Uniwavenet
Stars: ✭ 30 (-98.24%)
Mutual labels:  wavenet
Pytorch Gan Timeseries
GANs for time series generation in pytorch
Stars: ✭ 109 (-93.61%)
Mutual labels:  wavenet
Nsynth wavenet
parallel wavenet based on nsynth
Stars: ✭ 100 (-94.13%)
Mutual labels:  wavenet
Wavenet
WaveNet implementation with chainer
Stars: ✭ 53 (-96.89%)
Mutual labels:  wavenet

Fast Wavenet: An efficient Wavenet generation implementation

Our implementation speeds up Wavenet generation by eliminating redundant convolution operations. A naive implementation of Wavenet generation costs O(2^L) per output sample, while ours costs O(L), where L is the number of layers.

While other Wavenet repos focus on training on a large corpus of data and generating samples, ours mostly describes an efficient generation algorithm (which is super simple), and provides an example implementation.

We note that, while this is not explicitly stated in the Wavenet paper, we have talked to the authors and they are doing something similar.

For more about Wavenet, see DeepMind's blog post.

If you find this work useful and use it in your own research, please cite our paper.

@article{paine2016fast,
  title={Fast Wavenet Generation Algorithm},
  author={Paine, Tom Le and Khorrami, Pooya and Chang, Shiyu and Zhang, Yang and Ramachandran, Prajit and Hasegawa-Johnson, Mark A and Huang, Thomas S},
  journal={arXiv preprint arXiv:1611.09482},
  year={2016}
}

Timing experiments

We generated from a model with 2 blocks of L layers each, using a naive implementation and ours. Results are averaged over 100 repeats. When L is small, the naive implementation performs better than expected thanks to GPU parallelization of the convolution operations; when L is large, the efficient implementation really shines.

General purpose

While this algorithm works well for auto-regressive models like Wavenet, it would be useful anytime you need to run a causal dilated convolutional neural network fast, for instance, classification or regression on streaming data.

Authors (helped via pair coding)

Acknowledgements (helped via conversation)


Algorithm motivation

Here we focus on generation for dilated causal convolution networks (e.g., Wavenet) with filter size 2, though these ideas generalize to larger filter sizes.

During generation, the computational graph used to compute a single output value can be seen as a binary tree.

The input nodes (blue) are the leaves of the tree, and the output is the root. The intermediate computations are the red nodes. The edges of the graph correspond to matrix multiplies. Since the computation is a binary tree, the overall computation time for the graph is O(2^L). When L is large, this is horrible.
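
To make the blow-up concrete, here is a minimal sketch (not the repo's code) of naive generation with filter size 2, scalar channels, and tanh units, all of which are simplifying assumptions; each call recurses twice, so computing one output touches O(2^L) nodes:

```python
import numpy as np

def naive_activation(x, weights, layer, t):
    """Activation of `layer` at time `t` (layer 0 is the raw input).

    x       : 1-D input signal
    weights : list of (w_old, w_new) scalar filter taps, one pair per layer
    Every call spawns two recursive calls, so the cost of one output
    sample grows as O(2^L) in the number of layers L.
    """
    if layer == 0:
        return x[t] if t >= 0 else 0.0           # zero padding before the signal
    dilation = 2 ** (layer - 1)                  # 1, 2, 4, ... doubling per layer
    w_old, w_new = weights[layer - 1]
    return np.tanh(w_old * naive_activation(x, weights, layer - 1, t - dilation)
                   + w_new * naive_activation(x, weights, layer - 1, t))

# One output sample from a 4-layer stack touches ~2^4 input positions.
rng = np.random.default_rng(0)
L = 4
weights = [tuple(rng.normal(size=2)) for _ in range(L)]
x = rng.normal(size=16)
y = naive_activation(x, weights, L, len(x) - 1)
```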

However, since this model is being applied repeatedly over time, there is a lot of redundant computation, which we can cache to increase the speed of generating a single sample.

The key insight is this: given certain nodes in the graph, we have all the information we need to compute the current output. We call these nodes the recurrent states, in analogy to RNNs. These nodes have already been computed; all we need to do is cache them.

Note that at the next time step we will need a different subset of recurrent states. As a result, we need to cache several recurrent states per layer; the number we need to keep equals the dilation of that layer.

Algorithm components

The algorithm has two components:

  • Generation Model
  • Convolution Queues

The generation model can be viewed as one step of a recurrent neural network. It takes as input the current observation and several recurrent states, and computes the output prediction and new recurrent states.

The convolution queues store the new recurrent states which have been computed by the layer below.
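
A minimal sketch of such a queue, assuming scalar recurrent states and filter size 2 (the class name and interface are illustrative, not the repo's API):

```python
from collections import deque

class ConvQueue:
    """Caches the recurrent states one layer needs from the layer below.

    The queue holds exactly `dilation` states: the state popped at the
    current step is the one produced `dilation` time steps ago.
    """
    def __init__(self, dilation):
        self.dilation = dilation
        self.states = deque([0.0] * dilation)   # start filled with zero states

    def pop(self):
        return self.states.popleft()

    def push(self, state):
        self.states.append(state)
```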

Algorithm

Initialize the generation model with the weights from the pre-trained convolutional network. For each layer, initialize its convolution queue: the maximum length of the queue equals the dilation of that layer, and it is filled with zero recurrent states (see above).
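
Using the hypothetical ConvQueue above, initialization for a single block of L layers with doubling dilations might look like this sketch:

```python
L = 4                                        # layers in the block (assumed)
dilations = [2 ** i for i in range(L)]       # 1, 2, 4, 8
queues = [ConvQueue(d) for d in dilations]   # each pre-filled with zero states
# `weights` would be copied from the pre-trained network; here we reuse the
# random (w_old, w_new) pairs from the naive sketch above.
```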

Repeat:

  • Pop phase
  • Push phase

Pop phase: for each convolution queue, pop the recurrent state and feed it to the corresponding input of the generation model; then compute the new hidden states and the output.

Push phase: push each new hidden state into the convolution queue of the layer above.
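
Putting the two phases together, one generation step might look like the following sketch, which continues from the snippets above; the scalar states, tanh units, and greedy feedback of the prediction as the next input are simplifying assumptions, not the repo's implementation:

```python
def generate_step(x_t, weights, queues):
    """One pop/push step: consume the input x_t, return the top-layer output."""
    # Pop phase: walk up the stack, combining each cached state with the
    # state arriving from the layer below.
    state = x_t
    new_states = [state]                      # states to cache for later steps
    for (w_old, w_new), q in zip(weights, queues):
        old = q.pop()                         # state from `dilation` steps ago
        state = np.tanh(w_old * old + w_new * state)
        new_states.append(state)

    # Push phase: the raw input and each new hidden state are pushed into the
    # queue of the layer above; the top-layer state is the prediction.
    for q, s in zip(queues, new_states[:-1]):
        q.push(s)
    return state

# Toy autoregressive loop: feed each prediction back in as the next input.
samples, current = [], 0.0
for _ in range(32):
    current = generate_step(current, weights, queues)
    samples.append(current)
```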
