auspicious3000 / Wavenet Enhancement

Speech Enhancement using Bayesian WaveNet


WaveNet-Enhancement

bawn.py contains most of the functions, including the definition of the WaveNet structure.

bawn_pr_multi_gpu_train.py trains the prior model.

bawn_ll_multi_gpu_train.py trains the likelihood model with a fixed prior model.

generator_ll.py builds a fast sample-by-sample generator for the prior or the likelihood model. It generates pseudo-clean predictions using the current model in the iterative training process described in our paper.
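The sample-by-sample generation loop can be sketched as follows. This is a minimal illustration of autoregressive generation, not the actual API of generator_ll.py; the `predict_next` callable is a hypothetical stand-in for the trained WaveNet.

```python
import numpy as np

def generate(predict_next, receptive_field, num_samples):
    """Autoregressively generate `num_samples` quantized samples.

    predict_next: maps a window of past bin indices to the next bin
    index (a stand-in for the trained WaveNet, not this repo's code).
    """
    # Start from a buffer of silence (mu-law bin 128 is zero amplitude).
    buffer = [128] * receptive_field
    out = []
    for _ in range(num_samples):
        window = np.array(buffer[-receptive_field:], dtype=np.int32)
        nxt = int(predict_next(window))
        out.append(nxt)
        buffer.append(nxt)  # each prediction becomes future context
    return np.array(out, dtype=np.int32)

# Toy stand-in model: predicts the mean bin of the window (not a WaveNet).
toy = lambda window: int(window.mean())
samples = generate(toy, receptive_field=4093, num_samples=16)
# With an all-silence seed, the toy model keeps emitting bin 128.
```

Each generated sample is appended to the context buffer before the next prediction, which is what makes generation inherently sequential (and why the fast generator in this repo matters).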

Train Prior Model

The prior model is a 40-layer WaveNet consisting of 4 blocks of 10 layers each.

Therefore, the input shape is 20477 by data size, and the output shape is 16384 by data size.
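The length difference, 20477 − 16384 = 4093, is the past context consumed by one WaveNet stack: the dilations 1, 2, …, 512 in each 10-layer block sum to 1023, four blocks give 4092, and one extra sample comes from an initial causal convolution (the initial-convolution detail is an assumption about the standard WaveNet layout, not stated in this README). As a quick check:

```python
blocks, layers_per_block = 4, 10
# Dilations double each layer within a block: 1, 2, 4, ..., 512.
dilation_sum = sum(2 ** i for i in range(layers_per_block))  # 1023
context = blocks * dilation_sum + 1                          # 4093 samples

output_len = 16384
input_len = output_len + context  # 20477, the input length stated above
```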

Input data are the bin indices of clean speech under 256-bin mu-law quantization. Output data are the corresponding bin indices shifted by one sample to the right, so that the model always predicts the next sample from the past samples.
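The 256-bin encoding and the one-sample target shift can be sketched as below. This uses the standard mu-law companding formula, not code from this repo:

```python
import numpy as np

def mu_law_encode(x, mu=255):
    """Map a waveform in [-1, 1] to bin indices 0..mu (256 bins)."""
    x = np.clip(x, -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((compressed + 1) / 2 * mu + 0.5).astype(np.int32)

audio = np.sin(np.linspace(0, 8 * np.pi, 1000))
bins = mu_law_encode(audio)

# Targets are the inputs shifted one sample: given samples [0..t],
# the model must predict sample t + 1.
inputs, targets = bins[:-1], bins[1:]
```

Zero amplitude maps to bin 128, which is why silence sits in the middle of the 256 bins.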

Input data: train_pr.mat (DataType: int32)

Output data: target_pr.mat (DataType: uint8)

Usage

python bawn_pr_multi_gpu_train.py /logdir NUM_GPUS

Train Likelihood Model

In addition to the pre-trained prior model, the likelihood model contains 2 more copies of the WaveNet used in the prior model.

Therefore, the clean input shape is 20477 by data size, the noisy input shape is 24570 by data size, and the output shape is 16384 by data size.
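The same context arithmetic explains the noisy input length: the noisy signal passes through 2 stacked WaveNet copies, so it needs twice the per-stack context of the clean path. The per-stack figure of 4093 samples follows from the 4 × 10-layer architecture (its exact breakdown is an assumption about the standard WaveNet layout):

```python
# 4 blocks x (dilations 1 + 2 + ... + 512) + 1 initial causal convolution.
context = 4 * sum(2 ** i for i in range(10)) + 1  # 4093 samples per stack
output_len = 16384

clean_input_len = output_len + context       # 20477: one stack of context
noisy_input_len = output_len + 2 * context   # 24570: two stacked copies
```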

Noisy input data are raw noisy audio samples. Clean input data are the bin indices of the clean counterpart. Output data are the expected predictions, the same as in training the prior model.

Clean input data: clean_train.mat (DataType: int32)

Noisy input data: noisy_train.mat (DataType: float32)

Output data: target_train.mat (DataType: uint8)

Usage

python bawn_ll_multi_gpu_train.py /logdir /path_to_prior_model NUM_GPUS

Note

Training the likelihood model is very time-consuming due to the iterative training process. It has recently been greatly accelerated using approximations, but these have not yet been included in this branch.
