lmnt-com / Wavegrad

License: Apache-2.0
A fast, high-quality neural vocoder.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Wavegrad

Diffwave
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Stars: ✭ 139 (+0.72%)
Mutual labels:  paper, speech, pretrained-models, speech-synthesis, text-to-speech
AdaSpeech
AdaSpeech: Adaptive Text to Speech for Custom Voice
Stars: ✭ 108 (-21.74%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Zero-Shot-TTS
Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Stars: ✭ 33 (-76.09%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Wsay
Windows "say"
Stars: ✭ 36 (-73.91%)
Mutual labels:  speech, speech-synthesis, text-to-speech
Tacotron 2
TensorFlow implementation of Google's Tacotron 2
Stars: ✭ 1,968 (+1326.09%)
Mutual labels:  paper, speech-synthesis, text-to-speech
StyleSpeech
Official implementation of Meta-StyleSpeech and StyleSpeech
Stars: ✭ 161 (+16.67%)
Mutual labels:  text-to-speech, speech, speech-synthesis
ttslearn
ttslearn: Library for the book "Pythonで学ぶ音声合成" (Text-to-Speech with Python)
Stars: ✭ 158 (+14.49%)
Mutual labels:  text-to-speech, speech, speech-synthesis
IMS-Toucan
Text-to-Speech toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Its development goals are simplicity, modularity, controllability, and multilinguality.
Stars: ✭ 295 (+113.77%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Fre-GAN-pytorch
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
Stars: ✭ 73 (-47.1%)
Mutual labels:  text-to-speech, speech, speech-synthesis
editts
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech
Stars: ✭ 74 (-46.38%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Durian
Implementation of "Duration Informed Attention Network for Multimodal Synthesis" (https://arxiv.org/pdf/1909.01700.pdf) paper.
Stars: ✭ 111 (-19.57%)
Mutual labels:  speech, speech-synthesis, text-to-speech
Wavegrad
Implementation of Google Brain's WaveGrad high-fidelity vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.
Stars: ✭ 245 (+77.54%)
Mutual labels:  speech, speech-synthesis, text-to-speech
melgan
MelGAN implementation with multi-band and full-band support...
Stars: ✭ 54 (-60.87%)
Mutual labels:  text-to-speech, speech, speech-synthesis
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-62.32%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Voice Builder
An open-source text-to-speech (TTS) voice-building tool
Stars: ✭ 362 (+162.32%)
Mutual labels:  speech, speech-synthesis, text-to-speech
Lightspeech
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Stars: ✭ 31 (-77.54%)
Mutual labels:  speech, speech-synthesis, text-to-speech
Watbot
An Android ChatBot powered by IBM Watson Services (Assistant V1, Text-to-Speech, and Speech-to-Text with Speaker Recognition) on IBM Cloud.
Stars: ✭ 64 (-53.62%)
Mutual labels:  speech, text-to-speech
Cs224n Gpu That Talks
Attention, I'm Trying to Speak: End-to-end speech synthesis (CS224n '18)
Stars: ✭ 52 (-62.32%)
Mutual labels:  speech-synthesis, text-to-speech
Nlp Paper
Papers in the dialogue and speech areas of natural language processing, organized with reading notes, plus model reproductions and data processing (code provided in both TensorFlow and PyTorch versions)
Stars: ✭ 67 (-51.45%)
Mutual labels:  paper, speech
Gtts
Python library and CLI tool to interface with Google Translate's text-to-speech API
Stars: ✭ 1,303 (+844.2%)
Mutual labels:  speech, text-to-speech

WaveGrad

WaveGrad is a fast, high-quality neural vocoder designed by the folks at Google Brain. The architecture is described in WaveGrad: Estimating Gradients for Waveform Generation. In short, this model takes a log-scaled Mel spectrogram and converts it to a waveform via iterative refinement.
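
To make the iterative-refinement idea concrete, here is a minimal conceptual sketch of a DDPM-style sampling loop in the spirit of the paper. This is not the package's actual internals: the model's call signature, the noise-level conditioning, and the default hop size are illustrative assumptions.

import numpy as np
import torch

def refine(model, spectrogram, beta, hop_samples=300):
  """Start from white noise and iteratively denoise it,
  conditioned on the mel spectrogram (illustrative only)."""
  alpha = 1 - beta
  alpha_cum = np.cumprod(alpha)
  # One waveform segment of `hop_samples` per spectrogram frame.
  audio = torch.randn(spectrogram.shape[0],
                      hop_samples * spectrogram.shape[-1],
                      device=spectrogram.device)
  for n in range(len(beta) - 1, -1, -1):
    c1 = 1 / alpha[n]**0.5
    c2 = beta[n] / (1 - alpha_cum[n])**0.5
    # The network predicts the noise present at this noise level.
    noise_level = torch.tensor([alpha_cum[n]**0.5], device=spectrogram.device)
    audio = c1 * (audio - c2 * model(audio, spectrogram, noise_level))
    if n > 0:
      sigma = ((1 - alpha_cum[n-1]) / (1 - alpha_cum[n]) * beta[n])**0.5
      audio += sigma * torch.randn_like(audio)
  return audio.clamp(-1.0, 1.0)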

Status (2020-10-15)

  • [x] stable training (22 kHz, 24 kHz)
  • [x] high-quality synthesis
  • [x] mixed-precision training
  • [x] multi-GPU training
  • [x] custom noise schedule (faster inference)
  • [x] command-line inference
  • [x] programmatic inference API
  • [x] PyPI package
  • [x] audio samples
  • [x] pretrained models
  • [ ] precomputed noise schedule

Audio samples

24 kHz audio samples

Pretrained models

24 kHz pretrained model (183 MB, SHA256: 65e9366da318d58d60d2c78416559351ad16971de906e53b415836c068e335f3)
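
If you want to verify the download against the published SHA256, a quick standard-library sketch (the local filename is a placeholder):

import hashlib

EXPECTED = '65e9366da318d58d60d2c78416559351ad16971de906e53b415836c068e335f3'

sha = hashlib.sha256()
with open('wavegrad-24khz.pt', 'rb') as f:  # placeholder filename
  for chunk in iter(lambda: f.read(1 << 20), b''):
    sha.update(chunk)
assert sha.hexdigest() == EXPECTED, 'checksum mismatch'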

Install

Install using pip:

pip install wavegrad

or from GitHub:

git clone https://github.com/lmnt-com/wavegrad.git
cd wavegrad
pip install .

Training

Before you start training, you'll need to prepare a training dataset. The dataset can have any directory structure as long as the contained .wav files are 16-bit mono (e.g. LJSpeech, VCTK). By default, this implementation assumes a sample rate of 22 kHz. If you need to change this value, edit params.py.
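
If you're unsure whether your files meet these requirements, a quick check with the standard library can help (the directory path is a placeholder, and the 22050 Hz rate assumes the default params.py):

import wave
from pathlib import Path

for path in Path('/path/to/dir/containing/wavs').rglob('*.wav'):
  with wave.open(str(path), 'rb') as w:
    # Flag anything that isn't 16-bit mono at the expected rate.
    if (w.getnchannels() != 1 or w.getsampwidth() != 2
        or w.getframerate() != 22050):
      print(f'{path}: channels={w.getnchannels()}, '
            f'bits={8 * w.getsampwidth()}, rate={w.getframerate()}')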

python -m wavegrad.preprocess /path/to/dir/containing/wavs
python -m wavegrad /path/to/model/dir /path/to/dir/containing/wavs

# in another shell to monitor training progress:
tensorboard --logdir /path/to/model/dir --bind_all

You should expect to hear intelligible speech by ~20k steps (~1.5h on a 2080 Ti).

Inference API

Basic usage:

from wavegrad.inference import predict as wavegrad_predict

model_dir = '/path/to/model/dir'
spectrogram = ...  # get your hands on a spectrogram in [N,C,W] format
audio, sample_rate = wavegrad_predict(spectrogram, model_dir)

# `audio` is a GPU tensor in [N,T] format.
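
To write the result to disk, move it off the GPU first; a small sketch assuming torchaudio is available:

import torchaudio

# With a batch of one, the [N,T] output already matches the
# [channels, samples] layout that torchaudio.save expects.
torchaudio.save('output.wav', audio.cpu(), sample_rate)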

If you have a custom noise schedule (see below):

import numpy as np
from wavegrad.inference import predict as wavegrad_predict

params = { 'noise_schedule': np.load('/path/to/noise_schedule.npy') }
model_dir = '/path/to/model/dir'
spectrogram = ...  # get your hands on a spectrogram in [N,C,W] format
audio, sample_rate = wavegrad_predict(spectrogram, model_dir, params=params)

# `audio` is a GPU tensor in [N,T] format.

Inference CLI

python -m wavegrad.inference /path/to/model /path/to/spectrogram -o output.wav

Noise schedule

The default implementation uses 1000 iterations to refine the waveform, which runs slower than real time. WaveGrad can achieve high-quality, faster-than-real-time synthesis with as few as 6 iterations, without retraining the model with new hyperparameters.

To achieve this speed-up, you will need to search for a noise schedule that works well for your dataset. This implementation provides a script to perform the search for you:

python -m wavegrad.noise_schedule /path/to/trained/model /path/to/preprocessed/validation/dataset
python -m wavegrad.inference /path/to/trained/model /path/to/spectrogram -n noise_schedule.npy -o output.wav

The default settings should give good results without spending too much time on the search. If you'd like to find a better noise schedule or use a different number of inference iterations, run the noise_schedule script with --help to see additional configuration options.
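
The search writes a NumPy array of noise levels whose length sets the number of refinement iterations at inference time. A quick sanity check of the result:

import numpy as np

schedule = np.load('noise_schedule.npy')
print(f'{len(schedule)} inference iterations')

You can then pass it to the programmatic API via params={'noise_schedule': schedule}, as shown in the Inference API section above.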
