
DongyaoZhu / Vq Vae Wavenet

TensorFlow implementation of VQ-VAE with WaveNet decoder, based on https://arxiv.org/abs/1711.00937 and https://arxiv.org/abs/1901.08810

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Vq Vae Wavenet

wavenet-like-vocoder
Basic wavenet and fftnet vocoder model.
Stars: ✭ 20 (-50%)
Mutual labels:  wavenet
ttslearn
ttslearn: library accompanying the book "Pythonで学ぶ音声合成" (Text-to-Speech with Python)
Stars: ✭ 158 (+295%)
Mutual labels:  wavenet
Flowavenet
A Pytorch implementation of "FloWaveNet: A Generative Flow for Raw Audio"
Stars: ✭ 471 (+1077.5%)
Mutual labels:  wavenet
wavenet
Audio source separation (mixture to vocal) using the Wavenet
Stars: ✭ 20 (-50%)
Mutual labels:  wavenet
chainer-ClariNet
A Chainer implementation of ClariNet.
Stars: ✭ 45 (+12.5%)
Mutual labels:  wavenet
Pytorchwavenetvocoder
WaveNet-Vocoder implementation with pytorch.
Stars: ✭ 269 (+572.5%)
Mutual labels:  wavenet
Seriesnet
Time series prediction using dilated causal convolutional neural nets (temporal CNN)
Stars: ✭ 185 (+362.5%)
Mutual labels:  wavenet
Wavenet Stt
An end-to-end speech recognition system with Wavenet. Built using C++ and python.
Stars: ✭ 18 (-55%)
Mutual labels:  wavenet
chainer-Fast-WaveNet
A Chainer implementation of Fast WaveNet(mel-spectrogram vocoder).
Stars: ✭ 33 (-17.5%)
Mutual labels:  wavenet
Pycadl
Python package with source code from the course "Creative Applications of Deep Learning w/ TensorFlow"
Stars: ✭ 356 (+790%)
Mutual labels:  wavenet
Music-Style-Transfer
Source code for "Transferring the Style of Homophonic Music Using Recurrent Neural Networks and Autoregressive Model"
Stars: ✭ 16 (-60%)
Mutual labels:  wavenet
constant-memory-waveglow
PyTorch implementation of NVIDIA WaveGlow with constant memory cost.
Stars: ✭ 36 (-10%)
Mutual labels:  wavenet
Clarinet
A Pytorch Implementation of ClariNet
Stars: ✭ 273 (+582.5%)
Mutual labels:  wavenet
birdsong-generation-project
Generating birdsong with WaveNet
Stars: ✭ 26 (-35%)
Mutual labels:  wavenet
Speech Denoising Wavenet
A neural network for end-to-end speech denoising
Stars: ✭ 516 (+1190%)
Mutual labels:  wavenet
Vq Vae Speech
PyTorch implementation of VQ-VAE + WaveNet by [Chorowski et al., 2019] and VQ-VAE on speech signals by [van den Oord et al., 2017]
Stars: ✭ 187 (+367.5%)
Mutual labels:  wavenet
hifigan-denoiser
HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Stars: ✭ 88 (+120%)
Mutual labels:  wavenet
Pytorch Uniwavenet
Stars: ✭ 30 (-25%)
Mutual labels:  wavenet
Parallelwavegan
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN) with Pytorch
Stars: ✭ 682 (+1605%)
Mutual labels:  wavenet
Time Series Prediction
A collection of time series prediction methods: rnn, seq2seq, cnn, wavenet, transformer, unet, n-beats, gan, kalman-filter
Stars: ✭ 351 (+777.5%)
Mutual labels:  wavenet

VQ-VAE-WaveNet

This is a TensorFlow implementation of VQ-VAE with a WaveNet decoder, based on https://arxiv.org/abs/1711.00937 and https://arxiv.org/abs/1901.08810.

Dependencies:

TensorFlow r1.12 / r1.14, numpy, librosa, scipy, tqdm

Results

The folder results contains some reconstructed audio. Speaker conversion works well, but the encoder (local condition) needs more tuning.

Model

Encoder

There are 3 encoders implemented:

  • 6 strided convolution layers with 64 channels, as described in the original paper (default)
  • the Magenta encoder from nsynth-magenta (WaveNet-like)
  • the 2019 encoder described in https://arxiv.org/abs/1901.08810

Parameters can be found in Encoder/encoder.py and model_parameters.json.
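
For orientation, here is a minimal TF r1.x sketch of the default strided-convolution encoder; the layer count and channel width follow the description above, while latent_dim and the kernel sizes are placeholder assumptions (the real values live in Encoder/encoder.py and model_parameters.json):

import tensorflow as tf

def encoder(x, channels=64, num_layers=6, latent_dim=128):
    # x: [batch, time, 1] raw waveform
    h = x
    for i in range(num_layers):
        # each strided convolution halves the time resolution
        h = tf.layers.conv1d(h, filters=channels, kernel_size=4,
                             strides=2, padding='same',
                             activation=tf.nn.relu, name='enc_%d' % i)
    # 1x1 convolution projects to the pre-quantisation space z_e
    return tf.layers.conv1d(h, filters=latent_dim, kernel_size=1, name='z_e')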

VQ

There are 2 ways to train the embedding:

  • train $z_e$ and $e_k$ separately with tf.stop_gradient, as described in the original paper (default)
  • train them jointly, without tf.stop_gradient

Initialising the embedding:

  • uniform scaling (default)
  • random normal initialisation

Quantisation can also be turned off entirely, in which case a plain autoencoder (AE) is trained.

Parameters can be found in model_parameters.json.
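
As a reference point, a minimal sketch of the default quantisation path, including the tf.stop_gradient trick and the codebook/commitment losses from the original paper (K, D and beta below are placeholders; the real values come from model_parameters.json):

import tensorflow as tf

def vector_quantise(z_e, K=512, D=128, beta=0.25):
    # z_e: [batch, time, D] encoder output; e: [K, D] codebook
    e = tf.get_variable('embedding', [K, D],
                        initializer=tf.uniform_unit_scaling_initializer())
    # squared distance from every frame to every codebook vector
    d = (tf.reduce_sum(z_e ** 2, axis=-1, keepdims=True)
         - 2 * tf.einsum('btd,kd->btk', z_e, e)
         + tf.reduce_sum(e ** 2, axis=-1))
    z_q = tf.gather(e, tf.argmin(d, axis=-1))    # nearest neighbours

    # codebook loss pulls e_k towards z_e; commitment loss does the reverse
    vq_loss = tf.reduce_mean((tf.stop_gradient(z_e) - z_q) ** 2)
    commit_loss = beta * tf.reduce_mean((z_e - tf.stop_gradient(z_q)) ** 2)

    # straight-through estimator: decoder gradients flow from z_q to z_e
    z_q = z_e + tf.stop_gradient(z_q - z_e)
    return z_q, vq_loss + commit_loss

Dropping the tf.stop_gradient calls (and folding the two losses into one) corresponds to the 'train them jointly' option above.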

Decoder

The decoder is a WaveNet, conditioned locally on the encoding and globally on speaker identity.

Parameters can be found in wavenet_parameters.json.
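
The core building block of such a decoder is a gated, dilated causal convolution with residual and skip connections; a hedged sketch follows (channel sizes are placeholders, the real hyperparameters are in wavenet_parameters.json):

import tensorflow as tf

def wavenet_block(x, cond, dilation, channels=128):
    # x: [batch, time, channels] residual input
    # cond: [batch, time, c] local + global conditioning, aligned in time
    conv = tf.keras.layers.Conv1D(2 * channels, 2, dilation_rate=dilation,
                                  padding='causal')(x)
    conv += tf.keras.layers.Conv1D(2 * channels, 1)(cond)
    filt, gate = tf.split(conv, 2, axis=-1)
    out = tf.tanh(filt) * tf.sigmoid(gate)        # gated activation unit
    skip = tf.keras.layers.Conv1D(channels, 1)(out)
    return x + skip, skip                         # residual out, skip out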

Training

Dataset

Supports VCTK (default) and LibriSpeech. Download the data and put the unzipped folder 'VCTK-Corpus' or 'LibriSpeech' in the folder data. To train on a custom dataset, refer to dataset.py for making iterators.
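
For a custom dataset, an iterator along these lines should work with the TF r1.x tf.data API (purely illustrative; the function name, 16 kHz sample rate, and padding are assumptions, not the actual dataset.py interface):

import librosa
import numpy as np
import tensorflow as tf

def make_iterator(wav_paths, speaker_ids, length=6656, batch=8):
    # wav_paths: list of audio file paths; speaker_ids: one int per file
    def _read(path):
        audio, _ = librosa.load(path.decode(), sr=16000)
        audio = np.pad(audio, (0, max(0, length - len(audio))), 'constant')
        return audio[:length].astype(np.float32)

    def _load(path, speaker):
        audio = tf.py_func(_read, [path], tf.float32)
        audio.set_shape([length])
        return audio, speaker

    data = (tf.data.Dataset.from_tensor_slices((wav_paths, speaker_ids))
            .shuffle(1000).map(_load).batch(batch).repeat())
    return data.make_one_shot_iterator()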

example usage:

python3 train.py -dataset VCTK -length 6656 -batch 8 -step 100000 -save saved_model/weights

  • -dataset VCTK or LibriSpeech
  • -length length of each training segment; must be a multiple of the largest dilation rate (320 ms recommended)
  • -batch batch size
  • -step number of steps to train
  • -save where to save checkpoints (e.g. saved_model/weights)
  • -restore resume from a pretrained model (e.g. saved_model/weights-110640)
  • -interval number of steps between each log written to disk

Generation

Implements fast generation (reusing cached convolution outputs rather than recomputing the full receptive field at each step); generation starts from zeros.

example usage:

python3 generate.py -restore saved_model/weights-110640 -audio data/VCTK-Corpus/wav48/p225/p225_001.wav -speakers p225 p226 p227 p228 -mode sample

  • -restore checkpoint to restore; the embedding & generated audio are saved alongside it
  • -audio audio file to use as local condition
  • -speakers speaker(s) to use as global condition; must be consistent with the training data
  • -mode how to sample from the predicted quantised distribution (sample or greedy)
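
The two modes amount to sampling from, versus taking the argmax of, the softmax distribution WaveNet predicts for each output sample (a rough sketch; the 256 mu-law bins are the usual WaveNet choice, assumed here):

import numpy as np

def pick_sample(logits, mode='sample'):
    # logits: unnormalised scores over e.g. 256 mu-law bins for one timestep
    if mode == 'greedy':
        return int(np.argmax(logits))
    probs = np.exp(logits - logits.max())          # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))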

Visualisation

For now it saves the trained VQ embedding space and visualises it through http://projector.tensorflow.org.

example usage:

python3 visualise.py -embedding embedding_110640.npy -speaker speaker_embedding_110640.npy -save embeddings

then upload the tsv files in the folder embeddings to the website.
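
If you ever need the projector-ready files by hand, converting a saved .npy embedding to TSV is a one-liner with numpy (a hypothetical snippet, not part of visualise.py):

import numpy as np

emb = np.load('embedding_110640.npy')            # [K, D] trained codebook
np.savetxt('embeddings/embedding.tsv', emb, delimiter='\t')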

Note that the speaker embedding separates gender almost perfectly (upload the vec and meta files to http://projector.tensorflow.org, then search for #f# or #m#). Also, q(z|x) did slowly converge to the assumed uniform prior distribution.

Miscellaneous

Stuff I've tried:

  • At each frame of the encoder output, instead of predicting a vector, finding its nearest neighbour, and using the index as a one-hot categorical distribution, I make the last encoder channel have size k and apply a softmax so it represents a k-way categorical distribution, whose KL divergence from a uniform prior amounts to a cross-entropy loss. I add this loss to the original 3 losses (see the sketch after this list).

  • First train without the decoder, then freeze the embedding & encoder and train the decoder. This made the VQ embedding space more diverse than training the whole model end-to-end.
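
A sketch of the extra loss from the first experiment above, assuming the last encoder channel has size k (a hypothetical helper, not a function in this repo):

import tensorflow as tf

def uniform_prior_loss(logits):
    # logits: [batch, time, k] interpreted as a k-way categorical distribution
    q = tf.nn.softmax(logits)
    log_q = tf.nn.log_softmax(logits)
    k = tf.cast(tf.shape(logits)[-1], tf.float32)
    # KL(q || uniform) = sum_i q_i log q_i + log k
    return tf.reduce_mean(tf.reduce_sum(q * log_q, axis=-1) + tf.log(k))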

TODO

  • [ ] Train a prior over the VQ codes

Alternative Implementation

The folder Magenta contains an implementation assembled from the 'official' Magenta code; it is highly coupled. My own implementation draws insights from it. Training and generation work much the same way there.
