auspicious3000 / Wavenet Enhancement

Speech Enhancement using Bayesian WaveNet


WaveNet-Enhancement

bawn.py contains most of the functions, including the definition of the WaveNet structure.

bawn_pr_multi_gpu_train.py trains the prior model.

bawn_ll_multi_gpu_train.py trains the likelihood model with a fixed prior model.

generator_ll.py builds a fast sample-by-sample generator for the prior or the likelihood model. It generates pseudo-clean predictions using the current model in the iterative training process described in our paper.
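The sample-by-sample generation loop can be sketched as follows. This is a minimal illustration of autoregressive generation, not the actual API of generator_ll.py; the `predict_next` callable is a hypothetical stand-in for the trained WaveNet.

```python
import numpy as np

def generate(predict_next, receptive_field, num_samples):
    """Autoregressively generate `num_samples` quantized samples.

    predict_next: maps a window of past bin indices to the next bin
    index (a stand-in for the trained WaveNet, not this repo's code).
    """
    # Start from a buffer of silence (mu-law bin 128 is zero amplitude).
    buffer = [128] * receptive_field
    out = []
    for _ in range(num_samples):
        window = np.array(buffer[-receptive_field:], dtype=np.int32)
        nxt = int(predict_next(window))
        out.append(nxt)
        buffer.append(nxt)  # each prediction becomes future context
    return np.array(out, dtype=np.int32)

# Toy stand-in model: predicts the mean bin of the window (not a WaveNet).
toy = lambda window: int(window.mean())
samples = generate(toy, receptive_field=4093, num_samples=16)
# With an all-silence seed, the toy model keeps emitting bin 128.
```

Each generated sample is appended to the context buffer before the next prediction, which is what makes generation inherently sequential (and why the fast generator in this repo matters).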

Train Prior Model

The prior model is a 40-layer WaveNet consisting of 4 blocks of 10 layers each.

Therefore, the input shape is 20477 by data size, and the output shape is 16384 by data size.
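The length difference, 20477 − 16384 = 4093, is the past context consumed by one WaveNet stack: the dilations 1, 2, …, 512 in each 10-layer block sum to 1023, four blocks give 4092, and one extra sample comes from an initial causal convolution (the initial-convolution detail is an assumption about the standard WaveNet layout, not stated in this README). As a quick check:

```python
blocks, layers_per_block = 4, 10
# Dilations double each layer within a block: 1, 2, 4, ..., 512.
dilation_sum = sum(2 ** i for i in range(layers_per_block))  # 1023
context = blocks * dilation_sum + 1                          # 4093 samples

output_len = 16384
input_len = output_len + context  # 20477, the input length stated above
```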

Input data are the bin indices of clean speech under 256-bin mu-law quantization. Output data are the corresponding bin indices shifted by one sample to the right, so that the model always predicts the next sample from the past samples.
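The 256-bin encoding and the one-sample target shift can be sketched as below. This uses the standard mu-law companding formula, not code from this repo:

```python
import numpy as np

def mu_law_encode(x, mu=255):
    """Map a waveform in [-1, 1] to bin indices 0..mu (256 bins)."""
    x = np.clip(x, -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((compressed + 1) / 2 * mu + 0.5).astype(np.int32)

audio = np.sin(np.linspace(0, 8 * np.pi, 1000))
bins = mu_law_encode(audio)

# Targets are the inputs shifted one sample: given samples [0..t],
# the model must predict sample t + 1.
inputs, targets = bins[:-1], bins[1:]
```

Zero amplitude maps to bin 128, which is why silence sits in the middle of the 256 bins.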

Input data: train_pr.mat (DataType: int32)

Output data: target_pr.mat (DataType: uint8)

Usage

python bawn_pr_multi_gpu_train.py /logdir NUM_GPUS

Train Likelihood Model

In addition to the pre-trained prior model, the likelihood model contains 2 more copies of the WaveNet used in the prior model.

Therefore, the clean input shape is 20477 by data size, the noisy input shape is 24570 by data size, and the output shape is 16384 by data size.
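The same context arithmetic explains the noisy input length: the noisy signal passes through 2 stacked WaveNet copies, so it needs twice the per-stack context of the clean path. The per-stack figure of 4093 samples follows from the 4 × 10-layer architecture (its exact breakdown is an assumption about the standard WaveNet layout):

```python
# 4 blocks x (dilations 1 + 2 + ... + 512) + 1 initial causal convolution.
context = 4 * sum(2 ** i for i in range(10)) + 1  # 4093 samples per stack
output_len = 16384

clean_input_len = output_len + context       # 20477: one stack of context
noisy_input_len = output_len + 2 * context   # 24570: two stacked copies
```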

Noisy input data are raw noisy audio samples. Clean input data are the bin indices of the clean counterpart. Output data are the expected predictions, the same as in training the prior model.

Clean input data: clean_train.mat (DataType: int32)

Noisy input data: noisy_train.mat (DataType: float32)

Output data: target_train.mat (DataType: uint8)

Usage

python bawn_ll_multi_gpu_train.py /logdir /path_to_prior_model NUM_GPUS

Note

Training the likelihood model is very time-consuming due to the iterative training process. It has recently been greatly accelerated using approximations, but these have not yet been included in this branch.
