
bfs18 / Nsynth_wavenet

parallel wavenet based on nsynth

Programming Languages

python

Projects that are alternatives to or similar to Nsynth_wavenet

chainer-ClariNet
A Chainer implementation of ClariNet.
Stars: ✭ 45 (-55%)
Mutual labels:  wavenet
Flowavenet
A Pytorch implementation of "FloWaveNet: A Generative Flow for Raw Audio"
Stars: ✭ 471 (+371%)
Mutual labels:  wavenet
Tacotron2
pytorch tacotron2 https://arxiv.org/pdf/1712.05884.pdf
Stars: ✭ 46 (-54%)
Mutual labels:  wavenet
ttslearn
ttslearn: Library for the book "Text-to-speech with Python" (Pythonで学ぶ音声合成)
Stars: ✭ 158 (+58%)
Mutual labels:  wavenet
Time Series Prediction
A collection of time series prediction methods: rnn, seq2seq, cnn, wavenet, transformer, unet, n-beats, gan, kalman-filter
Stars: ✭ 351 (+251%)
Mutual labels:  wavenet
Parallelwavegan
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN) with Pytorch
Stars: ✭ 682 (+582%)
Mutual labels:  wavenet
QPPWG
Quasi-Periodic Parallel WaveGAN Pytorch implementation
Stars: ✭ 41 (-59%)
Mutual labels:  wavenet
Chainer Vq Vae
A Chainer implementation of VQ-VAE.
Stars: ✭ 77 (-23%)
Mutual labels:  wavenet
Pycadl
Python package with source code from the course "Creative Applications of Deep Learning w/ TensorFlow"
Stars: ✭ 356 (+256%)
Mutual labels:  wavenet
Vq Vae Wavenet
TensorFlow implementation of VQ-VAE with WaveNet decoder, based on https://arxiv.org/abs/1711.00937 and https://arxiv.org/abs/1901.08810
Stars: ✭ 40 (-60%)
Mutual labels:  wavenet
hifigan-denoiser
HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Stars: ✭ 88 (-12%)
Mutual labels:  wavenet
Clarinet
A Pytorch Implementation of ClariNet
Stars: ✭ 273 (+173%)
Mutual labels:  wavenet
Wavenet Stt
An end-to-end speech recognition system with Wavenet. Built using C++ and python.
Stars: ✭ 18 (-82%)
Mutual labels:  wavenet
chainer-Fast-WaveNet
A Chainer implementation of Fast WaveNet(mel-spectrogram vocoder).
Stars: ✭ 33 (-67%)
Mutual labels:  wavenet
Wavenet
WaveNet implementation with chainer
Stars: ✭ 53 (-47%)
Mutual labels:  wavenet
constant-memory-waveglow
PyTorch implementation of NVIDIA WaveGlow with constant memory cost.
Stars: ✭ 36 (-64%)
Mutual labels:  wavenet
Speech Denoising Wavenet
A neural network for end-to-end speech denoising
Stars: ✭ 516 (+416%)
Mutual labels:  wavenet
Wavenet Enhancement
Speech Enhancement using Bayesian WaveNet
Stars: ✭ 86 (-14%)
Mutual labels:  wavenet
Tf Wavenet vocoder
Wavenet and its applications with Tensorflow
Stars: ✭ 58 (-42%)
Mutual labels:  wavenet
Pytorch Uniwavenet
Stars: ✭ 30 (-70%)
Mutual labels:  wavenet

Implement parallel wavenet based on nsynth.

To keep the code and configuration as simple as possible, most extensible properties are not extended and are left at their default values.


How to use the code:

Suppose a directory named WAVE_DIR contains all the wave files that are used to train a wavenet model.

  1. Downsample the wave files if the sampling rate is not 16 kHz (only 16 kHz wave files are supported for the time being).
    Librosa's downsampling result may not lie in [-1, 1), so use tool/sox_downsample.py to downsample all waves first. Its arguments are self-explanatory.
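The internals of tool/sox_downsample.py are not shown here; a minimal sketch of the same idea, driving the sox CLI from Python (function names are hypothetical, and sox must be installed):

```python
import subprocess
from pathlib import Path

def sox_command(src: str, dst: str, rate: int = 16000) -> list:
    # Build the sox invocation: resample src to `rate` Hz and write dst.
    # sox keeps sample values in [-1, 1), unlike some librosa resampling paths.
    return ["sox", src, "-r", str(rate), dst]

def downsample_dir(src_dir: str, dst_dir: str) -> None:
    """Resample every .wav under src_dir to 16 kHz, writing into dst_dir."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for wav in sorted(Path(src_dir).glob("*.wav")):
        subprocess.run(sox_command(str(wav), str(out / wav.name)), check=True)
```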

  2. Build the tf_record data.

    python3 build_dataset.py --wave_dir WAVE_DIR --save_path TFR_PATH
    
  3. Train a teacher wavenet. config_jsons/wavenet_mol.json is a proper configuration file for a teacher wavenet.

    python3 train_wavenet.py --config config_jsons/wavenet_mol.json --train_path TFR_PATH \
        --log_root WN_LOG_ROOT --total_batch_size 28 --gpu_id 0,1,2,3
    

    The training script supports multiple GPUs; just list the GPU ids with --gpu_id.
    Either --logdir or --log_root can be used to specify the directory that stores the training logs and model files. If --log_root is given, a subdirectory named after an abbreviation of the running configuration is created inside LOG_ROOT. --logdir points to an existing log directory, so an interrupted run can be resumed from the saved models.

  4. Generate waves from a trained wavenet. Suppose a trained wavenet is saved in WN_LOGDIR.

    python3 eval_wavenet.py --ckpt_dir WN_LOGDIR --source_path tests/test_data \
        --save_path tests/pred_data --gpu_id 0
    
  5. Train a parallel wavenet.

    python3 train_parallel_wavenet.py --config config_jsons/parallel_wavenet.json --train_path TFR_PATH \
        --teacher_dir WN_LOGDIR --log_root PWN_LOG_ROOT --total_batch_size 28 \
        --gpu_id 0,1,2,3
    
  6. Generate waves from a trained parallel wavenet. Suppose a trained parallel wavenet is saved in PWN_LOGDIR.

    python3 eval_parallel_wavenet.py --ckpt_dir PWN_LOGDIR --source_path tests/test_data \
        --save_path tests/pred_data --gpu_id 0
    
  7. If multiple experiments are run on multiple servers, you may want to gather all the experiment logs and generated waves from each host; the run_all_eval.py script does this. A configuration file specifies the hosts, users, passwords, exp_dirs, and eval_scripts. For example:

    all_eval.json
    {
        "hosts": ["", "127.0.0.233"],
        "users": ["", "asdf"],
        "passwords": ["", "xxxx"],
        "exp_dirs": ["~/exp/logdir1", "/data/logdir2"],
        "eval_scripts": ["eval_parallel_wavenet.py", "eval_wavenet.py"]
    }
    

    If a host is the local machine, set its hosts, users, and passwords entries to empty strings.

    python3 run_all_eval.py -c all_eval.json -w tests/test_data -t ~/all_test_log
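The internals of run_all_eval.py are not shown here; a rough sketch of how such a gatherer could iterate the config (function name hypothetical; this uses key/agent auth for scp rather than the passwords field):

```python
import json
import shutil
import subprocess
from pathlib import Path

def gather(config_path: str, local_dest: str) -> None:
    """Pull each experiment dir listed in the config into local_dest.

    An empty host entry means the directory lives on the local machine.
    """
    cfg = json.loads(Path(config_path).read_text())
    for host, user, exp_dir in zip(cfg["hosts"], cfg["users"], cfg["exp_dirs"]):
        dest = Path(local_dest) / Path(exp_dir).name
        if host:
            # Remote host: fetch the whole directory over scp.
            subprocess.run(
                ["scp", "-r", f"{user}@{host}:{exp_dir}", str(dest)],
                check=True,
            )
        else:
            # Local host: a plain recursive copy is enough.
            shutil.copytree(exp_dir, dest, dirs_exist_ok=True)
```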
    

Pre-trained models:

wavenet model: ns_wn-eval.tar.gz (set DOUBLE_GATE_WIDTH=True in wavenet/wavenet.py when using ns_wn-eval)
ClariNet teacher model: ns_wn-gauss-eval.tar.gz
parallel wavenet model: ns_pwn-eval.tar.gz
parallel wavenet model with contrastive loss: ns_pwn-eval-2.tar.gz
ClariNet vocoder model: ns_pwn-gauss-eval.tar.gz
The pre-trained models are trained on the LJSpeech dataset. Each package contains the checkpoint and the config json file.


Code status:

  • [OK] wavenet
  • [OK] fastgen for wavenet
  • [OK] parallel wavenet
  • [OK] gen for parallel wavenet

It seems that using mu-law encoding makes training easier, so experiment with it first.
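The mu-law companding curve itself is standard; a minimal NumPy sketch of the continuous form (without the subsequent quantization to 256 levels) looks like:

```python
import numpy as np

def mu_law_encode(x: np.ndarray, mu: int = 255) -> np.ndarray:
    """Compand x in [-1, 1] to [-1, 1] with the standard mu-law curve."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_decode(y: np.ndarray, mu: int = 255) -> np.ndarray:
    """Invert mu_law_encode exactly (in the continuous, unquantized case)."""
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu
```

Companding spends more resolution on small amplitudes, which is where most speech samples concentrate; that is the usual intuition for why mu-law targets ease training.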
The following examples are more functional tests than attempts to obtain good waves; the networks may not be trained enough.

The definition of the power loss is important. Failed case 1 used pow(abs(stft(y))) as the mean squared error input, and failed case 2 used log(abs(stft(y))). Both are noisy, but the noise is of different types. A better case uses abs(stft(y)); it is much clearer than the previous two, so this is probably the right choice.
I plotted the refinement of a spectrum generated by the abs(stft(y)) configuration during training.
(figure: spec refine — spectrum refinement during training)
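The exact loss code is in the repo; a sketch of the three variants compared above, using a plain framed-FFT magnitude spectrogram (function names and the eps constant are assumptions):

```python
import numpy as np

def stft_mag(y: np.ndarray, n_fft: int = 512, hop: int = 128) -> np.ndarray:
    """Magnitude spectrogram |STFT(y)| via a Hann-windowed framed FFT."""
    window = np.hanning(n_fft)
    frames = [y[i:i + n_fft] * window
              for i in range(0, len(y) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def power_loss(y_hat: np.ndarray, y: np.ndarray, variant: str = "abs") -> float:
    """MSE between spectral features of prediction and target.

    variant: "abs" -> |STFT|       (the configuration that worked here)
             "pow" -> |STFT|**2    (failed case 1)
             "log" -> log(|STFT|)  (failed case 2)
    """
    a, b = stft_mag(y_hat), stft_mag(y)
    if variant == "pow":
        a, b = a ** 2, b ** 2
    elif variant == "log":
        a, b = np.log(a + 1e-6), np.log(b + 1e-6)
    return float(np.mean((a - b) ** 2))
```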

Proper initial mean_tot and scale_tot values have a positive impact on model convergence and numerical stability. According to the LJSpeech data distribution, suitable initial values for mean_tot and scale_tot are 0.0 and 0.05. I modified the initializer accordingly.
(figure: data dist — LJSpeech sample-value distribution)
The figure is plotted by this script.
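The initializer change itself lives in the repo; a sketch of how such initial values can be read off the training data (helper name hypothetical):

```python
import numpy as np

def suggest_init(waves: list) -> tuple:
    """Estimate initial mean_tot / scale_tot from a sample of training waves.

    For LJSpeech-like data this comes out near (0.0, 0.05), matching the
    values used above.
    """
    samples = np.concatenate([np.asarray(w, dtype=np.float64) for w in waves])
    return float(samples.mean()), float(samples.std())
```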

A decreasing loss does not mean everything is going well. A straightforward way to check whether a parallel wavenet is training properly is to compare the values of new_x, new_x_std, new_x_abs, and new_x_abs_std shown in TensorBoard with the statistics of real data. If they do not differ by many orders of magnitude, training is moving in the right direction.
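Computing the real-data side of that comparison is just four moments; a sketch (function name hypothetical, keyed to match the TensorBoard summaries named above):

```python
import numpy as np

def reference_stats(x: np.ndarray) -> dict:
    """Real-data statistics to hold against the new_x* TensorBoard summaries."""
    return {
        "new_x": float(x.mean()),            # mean of samples
        "new_x_std": float(x.std()),         # std of samples
        "new_x_abs": float(np.abs(x).mean()),  # mean of |samples|
        "new_x_abs_std": float(np.abs(x).std()),
    }
```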

E.g., the first TensorBoard figure comes from a parallel wavenet trained without the power loss. Its new_x and new_x_abs values are far too large compared to real data, so I could not get meaningful waves from this model. The second comes from a model trained with the power loss; its values are much closer to the real data, and it generates very noisy but to some extent meaningful waves.

(figures: x-x_abs1, x-x_abs2, x-x_abs-dist — TensorBoard new_x/new_x_abs summaries vs. the real-data distribution)
