
twidddj / tf-wavenet_vocoder

License: MIT
WaveNet and its applications with TensorFlow


Wavenet

The WaveNet neural network architecture directly generates a raw audio waveform, showing excellent results in text-to-speech and general audio generation.

Moreover, it can be applied to almost any sequence-generation task, including text and images.

This repository provides some works related to WaveNet.

Features

  • [x] Local conditioning
  • [x] Generalized fast generation algorithm
  • [x] Mixture of discretized logistics loss
  • [ ] Parallel Wavenet
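The mixture of discretized logistics loss models each audio sample with a small mixture of logistic distributions whose probability mass is integrated over one quantization bin. The following is a minimal NumPy sketch of the negative log-likelihood, assuming PixelCNN++-style parameterization with targets scaled to [-1, 1]; the function and argument names are illustrative, not this repository's API:

```python
import numpy as np

def discretized_mix_logistic_nll(y, means, log_scales, logit_probs,
                                 num_classes=256):
    """Negative log-likelihood of samples y under a mixture of
    discretized logistic distributions.

    y:           targets scaled to [-1, 1], shape (T,)
    means:       mixture component means, shape (T, K)
    log_scales:  log of component scales, shape (T, K)
    logit_probs: unnormalized mixture weights, shape (T, K)
    """
    y = y[:, None]                      # broadcast against K components
    centered = y - means
    inv_s = np.exp(-log_scales)
    half_bin = 1.0 / (num_classes - 1)  # half the quantization bin width

    # logistic CDF evaluated at the bin edges around y
    cdf_plus = 1.0 / (1.0 + np.exp(-inv_s * (centered + half_bin)))
    cdf_min = 1.0 / (1.0 + np.exp(-inv_s * (centered - half_bin)))

    # probability mass of the bin; the edge bins absorb the tails
    probs = np.where(y < -0.999, cdf_plus,
                     np.where(y > 0.999, 1.0 - cdf_min, cdf_plus - cdf_min))
    log_probs = np.log(np.maximum(probs, 1e-12))

    # numerically stable log-softmax over the mixture weights
    z = logit_probs - logit_probs.max(axis=1, keepdims=True)
    log_pi = z - np.log(np.exp(z).sum(axis=1, keepdims=True))

    # log-sum-exp over components, averaged over time
    return -np.mean(np.log(np.exp(log_probs + log_pi).sum(axis=1)))
```

Compared with a 256-way softmax over mu-law classes, this parameterization needs far fewer output channels and keeps the ordering of amplitude levels.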

Generalized fast generation algorithm

We generalized Fast WaveNet to filter widths > 1 by using a multi-queue structured matrix of size (dilation x (filter_width - 1) x batch_size x channel_size).

When generating a sample, you must pass the number of samples generated so far to the function, because the queue needs this index to select the correct slot before the enqueue operation.

You can easily find the modified points and details of the algorithm here.

Check the usage of the incremental generator here.
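The idea can be illustrated with a single dilated causal convolution layer: the naive approach recomputes the whole receptive field for every new sample, while the queue keeps exactly the dilation x (filter_width - 1) past inputs that the filter taps need. A minimal NumPy sketch, with names that are illustrative rather than the repository's API:

```python
import numpy as np
from collections import deque

def naive_dilated_conv(x, w, dilation):
    """Causal dilated convolution over a full sequence
    (inputs before t=0 are treated as zeros)."""
    k = len(w)
    y = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            tap = t - (k - 1 - i) * dilation
            if tap >= 0:
                y[t] += w[i] * x[tap]
    return y

class IncrementalDilatedConv:
    """Queue-based incremental evaluation: one output per new sample.

    The queue holds the last dilation * (filter_width - 1) inputs;
    the taps for filter position i sit at fixed offset i * dilation.
    """
    def __init__(self, w, dilation):
        self.w, self.d, self.k = w, dilation, len(w)
        self.queue = deque([0.0] * (self.d * (self.k - 1)))

    def step(self, x_t):
        taps = [self.queue[i * self.d] for i in range(self.k - 1)] + [x_t]
        y = sum(wi * xi for wi, xi in zip(self.w, taps))
        self.queue.popleft()   # drop the oldest sample...
        self.queue.append(x_t) # ...and enqueue the newest
        return y
```

Driving the layer one sample at a time with `step()` reproduces the full-sequence convolution while doing only filter_width multiplies per step; a full WaveNet stacks one such queue per dilated layer.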

Applications

Vocoder

A neural vocoder can generate high-quality raw speech samples conditioned on linguistic or acoustic features.
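Conditioning on acoustic features typically means upsampling frame-rate features (e.g. mel-spectrogram frames) to the audio sample rate and injecting them into each layer's gated activation, z = tanh(W_f x + V_f h) * sigmoid(W_g x + V_g h). A hedged NumPy sketch of these two pieces follows; the shapes and names are assumptions for illustration, not this repository's exact interface:

```python
import numpy as np

def upsample_conditioning(frames, hop_length):
    """Repeat each feature frame hop_length times so one conditioning
    vector lines up with every audio sample (nearest-neighbor upsampling)."""
    return np.repeat(frames, hop_length, axis=0)

def gated_activation(x, h, w_f, v_f, w_g, v_g):
    """WaveNet gated unit with local conditioning:
    z = tanh(W_f x + V_f h) * sigmoid(W_g x + V_g h)."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    return np.tanh(x @ w_f + h @ v_f) * sigmoid(x @ w_g + h @ v_g)
```

Nearest-neighbor repetition is the simplest choice; learned transposed convolutions are a common alternative for smoother conditioning.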

We tested our model following @r9y9's work.

Audio samples are available at https://twidddj.github.io/docs/vocoder. See the issue here for the results.

Pre-trained models

Model URL | Data     | Steps
link      | LJSpeech | 680k steps

Getting Started

0. Download dataset

  • the voice conversion dataset (multi-speaker, 16 kHz): cmu_arctic
  • the single-speaker dataset (22.05 kHz): LJSpeech-1.0

1. Preprocess data

python -m apps.vocoder.preprocess --num_workers 4 --name ljspeech --in_dir /your_path/LJSpeech-1.0 --out_dir /your_outpath/

2. Train model

python -m apps.vocoder.train --metadata_path {~/yourpath/train.txt} --data_path {~/yourpath/npy} --log_dir {~/log_dir_path}

3. Test model

You can find the code for testing a trained model here.

Requirements

The code is tested on TensorFlow 1.4 with Python 3.6.

References

Related Papers
