rampage644 / Wavenet

Licence: apache-2.0
WaveNet implementation with Chainer

Description

WaveNet replication study. Before moving on to the WaveNet implementation, PixelCNN was implemented first, since WaveNet is based on its architecture.

This repository contains two models: Gated PixelCNN and WaveNet. See the class definitions in wavenet/models.py.

For a detailed explanation of how these models work, see my blog post.

Gated PixelCNN

$ python3 train.py --help
usage: train.py [-h] [--batchsize BATCHSIZE] [--epoch EPOCH] [--gpu GPU]
                [--resume RESUME] [--out OUT] [--hidden_dim HIDDEN_DIM]
                [--out_hidden_dim OUT_HIDDEN_DIM] [--blocks_num BLOCKS_NUM]
                [--gradclip GRADCLIP] [--learning_rate LEARNING_RATE]
                [--levels LEVELS] [--dataset DATASET] [--stats STATS]

PixelCNN

optional arguments:
  -h, --help            show this help message and exit
  --batchsize BATCHSIZE, -b BATCHSIZE
                        Number of images in each mini-batch
  --epoch EPOCH, -e EPOCH
                        Number of sweeps over the dataset to train
  --gpu GPU, -g GPU     GPU ID (negative value indicates CPU)
  --resume RESUME, -r RESUME
                        Resume the training from snapshot
  --out OUT, -o OUT     Output directory
  --hidden_dim HIDDEN_DIM, -d HIDDEN_DIM
                        Number of hidden dimensions
  --out_hidden_dim OUT_HIDDEN_DIM
                        Number of hidden dimensions
  --blocks_num BLOCKS_NUM, -n BLOCKS_NUM
                        Number of layers
  --gradclip GRADCLIP   Bound for gradient hard clipping
  --learning_rate LEARNING_RATE
                        Bound for gradient hard clipping
  --levels LEVELS       Level number to quantisize pixel values
  --dataset DATASET     Dataset for training. Either mnist or cifar.
  --stats STATS         Collect layerwise statistics

To train the model on a GPU with the MNIST dataset (it will be downloaded automatically):

python train.py -g0 --levels 256 --out data/

To train on the CIFAR-10 dataset, use the --dataset switch:

python train.py -g0 --levels 256 --out data/ --dataset cifar

To reduce training time, simplify the architecture:

  • Reduce the number of blocks (--blocks_num 4)
  • Reduce the hidden dimensionality (--hidden_dim 32)
  • Reduce the output softmax cardinality (--levels 16)
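
As an illustration of what a smaller --levels value means, here is a minimal sketch of uniform pixel quantization. It assumes 8-bit input pixels and equal-width bins; the function names `quantize` and `dequantize` are hypothetical, and the repository's own quantization code may differ.

```python
import numpy as np


def quantize(pixels, levels):
    """Map 8-bit pixel values (0..255) to integer bins 0..levels-1.

    Assumption: equal-width bins; not taken from the repository.
    """
    pixels = np.asarray(pixels, dtype=np.int64)
    return pixels * levels // 256


def dequantize(bins, levels):
    """Map bin indices back to the 8-bit value at the centre of each bin."""
    bins = np.asarray(bins, dtype=np.int64)
    return ((2 * bins + 1) * 256) // (2 * levels)


if __name__ == "__main__":
    sample = [0, 15, 16, 255]
    print(quantize(sample, 16))                  # bin index per pixel
    print(dequantize(quantize(sample, 16), 16))  # coarse reconstruction
```

With 16 levels the softmax has 16 outputs per pixel instead of 256, which is why training gets cheaper at the cost of coarser images.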

Once the model is trained, you can generate samples.

$ python3 infer.py --help
usage: infer.py [-h] [--gpu GPU] [--model MODEL] [--hidden_dim HIDDEN_DIM]
                [--out_hidden_dim OUT_HIDDEN_DIM] [--blocks_num BLOCKS_NUM]
                [--levels LEVELS] [--output OUTPUT] [--label LABEL]
                [--count COUNT] [--height HEIGHT] [--width WIDTH]

PixelCNN

optional arguments:
  -h, --help            show this help message and exit
  --gpu GPU, -g GPU     GPU ID (negative value indicates CPU)
  --model MODEL, -m MODEL
                        Path to model for generation
  --hidden_dim HIDDEN_DIM, -d HIDDEN_DIM
                        Number of hidden dimensions
  --out_hidden_dim OUT_HIDDEN_DIM
                        Number of hidden dimensions
  --blocks_num BLOCKS_NUM, -n BLOCKS_NUM
                        Number of layers
  --levels LEVELS       Level number to quantisize pixel values
  --output OUTPUT, -o OUTPUT
                        Output filename
  --label LABEL, -l LABEL
                        Class label to generate
  --count COUNT, -c COUNT
                        Number of images to generate (woulld be squared: so
                        for 10 it would generate 100)
  --height HEIGHT       Output image height
  --width WIDTH         Output image width

To generate samples, specify exactly the same architecture that was used for training, otherwise the results will be garbage:

python infer.py -g0 --levels 256 -m data/pixecnn_XXXXX --output samples.jpg

WaveNet

The WaveNet model is still a work in progress, so minor changes may happen. It has not yet been trained end-to-end on any full dataset (only on very small ones).

WaveNet expects input data to be preprocessed with preprocess.py.

usage: preprocess.py [-h] [--data DATA] [--output OUTPUT] [--workers WORKERS]
                     [--rate RATE] [--stacks_num STACKS_NUM]
                     [--layers_num LAYERS_NUM] [--target_length TARGET_LENGTH]
                     [--flush_every FLUSH_EVERY]

optional arguments:
  -h, --help            show this help message and exit
  --data DATA
  --output OUTPUT
  --workers WORKERS
  --rate RATE
  --stacks_num STACKS_NUM
  --layers_num LAYERS_NUM
  --target_length TARGET_LENGTH
  --flush_every FLUSH_EVERY

Specify the path to your wav files; the script searches it recursively, subsamples each file and splits it into chunks. Note that you need to specify the number of stacks and the number of layers per stack so that the receptive field size can be calculated.
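
The relationship between stacks, layers and receptive field can be sketched as follows. This assumes the standard WaveNet layout (kernel size 2, dilation doubling from 1 within each stack); the function name `receptive_field` is hypothetical and the repository's own calculation may differ.

```python
def receptive_field(stacks_num, layers_num, kernel_size=2):
    """Receptive field in samples of stacked dilated causal convolutions.

    Assumption: each stack uses dilations 1, 2, 4, ..., 2**(layers_num - 1).
    """
    dilations = [2 ** i for i in range(layers_num)] * stacks_num
    # Each layer widens the field by (kernel_size - 1) * dilation samples.
    return (kernel_size - 1) * sum(dilations) + 1


if __name__ == "__main__":
    size = receptive_field(stacks_num=4, layers_num=10)
    print(size)            # 4093 samples
    print(size / 16000.0)  # roughly a quarter of a second at 16 kHz
```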

Example of data preprocessing step:

python preprocess.py --data vctk/wav/p225 --rate 16000 --stacks_num 4 --layers_num 10

It generates several files named vctk_* (the names are hard-coded) that the WaveNet data loader expects.

$ python3 train_wavenet.py --help
usage: train_wavenet.py [-h] [--batchsize BATCHSIZE] [--epoch EPOCH]
                        [--gpu GPU] [--resume RESUME] [--out OUT]
                        [--data DATA] [--hidden_dim HIDDEN_DIM]
                        [--out_hidden_dim OUT_HIDDEN_DIM]
                        [--stacks_num STACKS_NUM] [--layers_num LAYERS_NUM]
                        [--learning_rate LEARNING_RATE] [--clip CLIP]
                        [--weight_decay WEIGHT_DECAY] [--levels LEVELS]
                        [--stats]

PixelCNN

optional arguments:
  -h, --help            show this help message and exit
  --batchsize BATCHSIZE, -b BATCHSIZE
                        Number of images in each mini-batch
  --epoch EPOCH, -e EPOCH
                        Number of sweeps over the dataset to train
  --gpu GPU, -g GPU     GPU ID (negative value indicates CPU)
  --resume RESUME, -r RESUME
                        Resume the training from snapshot
  --out OUT, -o OUT     Output directory
  --data DATA, -d DATA  Input data directory
  --hidden_dim HIDDEN_DIM
                        Number of hidden dimensions
  --out_hidden_dim OUT_HIDDEN_DIM
                        Number of hidden dimensions
  --stacks_num STACKS_NUM, -s STACKS_NUM
                        Number of stacks
  --layers_num LAYERS_NUM, -l LAYERS_NUM
                        Number of layers per stack
  --learning_rate LEARNING_RATE
                        Learning rate
  --clip CLIP           L2 norm gradient clipping
  --weight_decay WEIGHT_DECAY
                        Weight decay rate (L2 regularization)
  --levels LEVELS       Level number to quantisize values
  --stats               Collect layerwise statistics

Command for model training:

python train_wavenet.py -g0 --out data/ --stacks_num 4 --layers_num 10

$ python3 infer_wavenet.py --help
usage: infer_wavenet.py [-h] [--gpu GPU] [--model MODEL]
                        [--hidden_dim HIDDEN_DIM]
                        [--out_hidden_dim OUT_HIDDEN_DIM]
                        [--stacks_num STACKS_NUM] [--layers_num LAYERS_NUM]
                        [--levels LEVELS] [--output OUTPUT] [--label LABEL]
                        [--count COUNT] [--rate RATE] [--length LENGTH]

PixelCNN

optional arguments:
  -h, --help            show this help message and exit
  --gpu GPU, -g GPU     GPU ID (negative value indicates CPU)
  --model MODEL, -m MODEL
                        Path to model for generation
  --hidden_dim HIDDEN_DIM
                        Number of hidden dimensions
  --out_hidden_dim OUT_HIDDEN_DIM
                        Number of hidden dimensions
  --stacks_num STACKS_NUM, -s STACKS_NUM
                        Number of stacks
  --layers_num LAYERS_NUM, -l LAYERS_NUM
                        Number of layers per stack
  --levels LEVELS       Level number to quantisize pixel values
  --output OUTPUT, -o OUTPUT
                        Output sample directory
  --label LABEL         Class label to generate
  --count COUNT, -c COUNT
                        Number of samples to generate
  --rate RATE           Samples rate
  --length LENGTH       Output sample length

After the model has been trained, samples can be generated with:

python infer_wavenet.py -g0 --stacks_num 4 --layers_num 10 -m data/wavenet_XXXX --output samples/

To speed up training and generation, simplify the architecture:

  • Reduce the number of stacks (reduces receptive field size)
  • Reduce the number of layers per stack (also reduces receptive field size)
  • Reduce the sampling rate (e.g. set it to 4000 or 8000)
  • Reduce the hidden layer size
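
To get a feel for these trade-offs, the sketch below prints how much audio context a few settings cover. It assumes kernel size 2 and dilations that double within each stack, which is the usual WaveNet layout and not confirmed from this repository; `receptive_field_ms` and the listed settings are illustrative only.

```python
def receptive_field_ms(stacks_num, layers_num, rate, kernel_size=2):
    """Return (samples, milliseconds) covered by the receptive field.

    Assumption: dilations 1, 2, ..., 2**(layers_num - 1) in every stack.
    """
    samples = stacks_num * (kernel_size - 1) * (2 ** layers_num - 1) + 1
    return samples, 1000.0 * samples / rate


if __name__ == "__main__":
    # (stacks, layers per stack, sampling rate in Hz); example values only.
    settings = [(4, 10, 16000), (2, 10, 16000), (2, 8, 8000), (2, 8, 4000)]
    for stacks, layers, rate in settings:
        samples, ms = receptive_field_ms(stacks, layers, rate)
        print(f"stacks={stacks} layers={layers} rate={rate}: "
              f"{samples} samples, {ms:.1f} ms")
```

Under these assumptions, lowering the sampling rate lets the same number of samples span a longer stretch of audio, so a smaller network can still see a comparable time window.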

Some results

Neither model was trained long enough to produce results that look or sound good to a human. However, here are results for simplified settings.

PixelCNN 8-way, MNIST

[Image: 8-way MNIST]

PixelCNN 2-way, MNIST

[Image: 2-way MNIST]

PixelCNN, CIFAR

[Image: CIFAR]

Gated PixelCNN, 4-way, 5 blocks, label 1

[Image: label 1]

Gated PixelCNN, 4-way, 5 blocks, label 7

[Image: label 7]

Gated PixelCNN, 256-way, 8 blocks, label 8, 100k iterations

[Image: label 8, 100k]

Gated PixelCNN, 256-way, 8 blocks, label 8, 500k iterations

[Image: label 8, 500k]

WaveNet, overfit on 500 Hz tone

[Audio: download link]

WaveNet, overfit on VCTK speaker id 225, 4 stacks, 24 hours of training

[Audio: download link]

Links

  1. Website
  2. WaveNet
  3. PixelRNN
  4. Conditional PixelCNN
  5. PixelCNN++ repo
  6. PixelCNN++ paper

Other implementations

  1. tensorflow
  2. chainer
  3. keras #1
  4. keras #2

Other resources

  1. Fast wavenet