
lmnt-com / Diffwave

License: Apache-2.0
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.

Programming Languages

Python

Projects that are alternatives to or similar to Diffwave

Wavegrad
A fast, high-quality neural vocoder.
Stars: ✭ 138 (-0.72%)
Mutual labels:  paper, speech, pretrained-models, speech-synthesis, text-to-speech
StyleSpeech
Official implementation of Meta-StyleSpeech and StyleSpeech
Stars: ✭ 161 (+15.83%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Wsay
Windows "say"
Stars: ✭ 36 (-74.1%)
Mutual labels:  speech, speech-synthesis, text-to-speech
Lightspeech
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Stars: ✭ 31 (-77.7%)
Mutual labels:  speech, speech-synthesis, text-to-speech
Wavegrad
Implementation of Google Brain's WaveGrad high-fidelity vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.
Stars: ✭ 245 (+76.26%)
Mutual labels:  speech, speech-synthesis, text-to-speech
IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
Stars: ✭ 295 (+112.23%)
Mutual labels:  text-to-speech, speech, speech-synthesis
AdaSpeech
AdaSpeech: Adaptive Text to Speech for Custom Voice
Stars: ✭ 108 (-22.3%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Tacotron 2
TensorFlow implementation of Google's Tacotron 2
Stars: ✭ 1,968 (+1315.83%)
Mutual labels:  paper, speech-synthesis, text-to-speech
ttslearn
ttslearn: Library for the book "Pythonで学ぶ音声合成" (Text-to-Speech with Python)
Stars: ✭ 158 (+13.67%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Durian
Implementation of "Duration Informed Attention Network for Multimodal Synthesis" (https://arxiv.org/pdf/1909.01700.pdf) paper.
Stars: ✭ 111 (-20.14%)
Mutual labels:  speech, speech-synthesis, text-to-speech
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-62.59%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Zero-Shot-TTS
Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Stars: ✭ 33 (-76.26%)
Mutual labels:  text-to-speech, speech, speech-synthesis
editts
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech
Stars: ✭ 74 (-46.76%)
Mutual labels:  text-to-speech, speech, speech-synthesis
melgan
MelGAN implementation with multi-band and full-band support
Stars: ✭ 54 (-61.15%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Fre-GAN-pytorch
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
Stars: ✭ 73 (-47.48%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Voice Builder
An open-source text-to-speech (TTS) voice-building tool
Stars: ✭ 362 (+160.43%)
Mutual labels:  speech, speech-synthesis, text-to-speech
Cs224n Gpu That Talks
Attention, I'm Trying to Speak: End-to-end speech synthesis (CS224n '18)
Stars: ✭ 52 (-62.59%)
Mutual labels:  speech-synthesis, text-to-speech
Tacotron2
A PyTorch implementation of Tacotron 2, an end-to-end text-to-speech (TTS) system described in "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions".
Stars: ✭ 43 (-69.06%)
Mutual labels:  speech-synthesis, text-to-speech
Watbot
An Android ChatBot powered by IBM Watson Services (Assistant V1, Text-to-Speech, and Speech-to-Text with Speaker Recognition) on IBM Cloud.
Stars: ✭ 64 (-53.96%)
Mutual labels:  speech, text-to-speech
Merlin
This is now the official location of the Merlin project.
Stars: ✭ 1,168 (+740.29%)
Mutual labels:  speech-synthesis, text-to-speech

DiffWave

PyPI Release License

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer. It starts with Gaussian noise and converts it into speech via iterative refinement. The speech can be controlled by providing a conditioning signal (e.g. log-scaled Mel spectrogram). The model and architecture details are described in DiffWave: A Versatile Diffusion Model for Audio Synthesis.
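
For intuition, the refinement loop looks roughly like the following: a minimal DDPM-style sampling sketch, not DiffWave's actual implementation. Here model stands in for a hypothetical network that predicts the noise in the current waveform estimate, conditioned on the spectrogram.

import torch

def refine(model, spectrogram, beta, length):
    # beta: 1-D tensor of per-step noise variances (the diffusion schedule).
    alpha = 1.0 - beta
    alpha_bar = torch.cumprod(alpha, dim=0)
    audio = torch.randn(1, length)  # start from pure Gaussian noise
    for t in range(len(beta) - 1, -1, -1):
        # The network predicts the noise component of the current estimate.
        eps = model(audio, torch.tensor([t]), spectrogram)
        audio = (audio - beta[t] / torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alpha[t])
        if t > 0:  # re-inject noise at every step except the last
            audio += torch.sqrt(beta[t]) * torch.randn_like(audio)
    return torch.clamp(audio, -1.0, 1.0)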

What's new (2020-10-14)

  • new pretrained model trained for 1M steps
  • updated audio samples with output from new model

Status (2020-10-14)

  • [x] stable training
  • [x] high-quality synthesis
  • [x] mixed-precision training
  • [x] multi-GPU training
  • [x] command-line inference
  • [x] programmatic inference API
  • [x] PyPI package
  • [x] audio samples
  • [x] pretrained models
  • [ ] unconditional waveform synthesis

Big thanks to Zhifeng Kong (lead author of DiffWave) for pointers and bug fixes.

Audio samples

22.05 kHz audio samples

Pretrained models

22.05 kHz pretrained model (31 MB, SHA256: d415d2117bb0bba3999afabdd67ed11d9e43400af26193a451d112e2560821a8)

This pretrained model synthesizes speech with a real-time factor of 0.87 (smaller is faster): generating one second of audio takes about 0.87 seconds of wall-clock time.
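
Real-time factor is wall-clock synthesis time divided by the duration of the audio produced, so values below 1.0 are faster than real time. You can measure it yourself with the inference API described below (the spectrogram path here is illustrative):

import time
import torch
from diffwave.inference import predict as diffwave_predict

spectrogram = torch.load('/path/to/spectrogram.pt')  # [N, C, W] conditioner
start = time.time()
audio, sample_rate = diffwave_predict(spectrogram, '/path/to/model/dir')
elapsed = time.time() - start
print('RTF:', elapsed / (audio.shape[-1] / sample_rate))  # smaller is faster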

Pre-trained model details

  • trained on 4x 1080Ti
  • default parameters
  • single precision floating point (FP32)
  • trained on LJSpeech dataset excluding LJ001* and LJ002*
  • trained for 1,000,578 steps (1,273 epochs)

Install

Install using pip:

pip install diffwave

or from GitHub:

git clone https://github.com/lmnt-com/diffwave.git
cd diffwave
pip install .

Training

Before you start training, you'll need to prepare a training dataset. The dataset can have any directory structure as long as the contained .wav files are 16-bit mono (e.g. LJSpeech, VCTK). By default, this implementation assumes a sample rate of 22.05 kHz. If you need to change this value, edit params.py.
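
A quick sanity check for one file, using only the Python standard library (the path is illustrative):

import wave

with wave.open('/path/to/dir/containing/wavs/example.wav', 'rb') as f:
    assert f.getnchannels() == 1, 'expected mono audio'
    assert f.getsampwidth() == 2, 'expected 16-bit samples'
    assert f.getframerate() == 22050, 'expected 22.05 kHz (or the rate set in params.py)'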

python -m diffwave.preprocess /path/to/dir/containing/wavs
python -m diffwave /path/to/model/dir /path/to/dir/containing/wavs

# in another shell to monitor training progress:
tensorboard --logdir /path/to/model/dir --bind_all

You should expect to hear intelligible (but noisy) speech by ~8k steps (~1.5h on a 2080 Ti).

Multi-GPU training

By default, this implementation uses as many GPUs in parallel as returned by torch.cuda.device_count(). You can specify which GPUs to use by setting the CUDA_VISIBLE_DEVICES environment variable before running the training module.
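
For example, to train on GPUs 0 and 2 only:

CUDA_VISIBLE_DEVICES=0,2 python -m diffwave /path/to/model/dir /path/to/dir/containing/wavs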

Inference API

Basic usage:

import torch
from diffwave.inference import predict as diffwave_predict

model_dir = '/path/to/model/dir'
# Load a conditioning spectrogram in [N, C, W] format
# (e.g. one previously saved with torch.save; path is illustrative).
spectrogram = torch.load('/path/to/spectrogram.pt')
audio, sample_rate = diffwave_predict(spectrogram, model_dir)

# audio is a GPU tensor in [N, T] format.
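
To write the result to disk, one option is torchaudio (an assumption here, not part of DiffWave's API):

import torchaudio

# torchaudio.save expects a [channels, time] tensor on the CPU.
torchaudio.save('output.wav', audio[0:1].cpu(), sample_rate)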

Inference CLI

python -m diffwave.inference /path/to/model /path/to/spectrogram -o output.wav
