😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)

Stars: ✭ 2,382 (+5078.26%)

Mutual labels: vocoder

Tts

🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

Stars: ✭ 5,427 (+11697.83%)

Mutual labels: vocoder

Fre-GAN-pytorch

Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

Stars: ✭ 73 (+58.7%)

Mutual labels: vocoder

codec2 talkie

Turn your Android phone into Codec2 Walkie-Talkie (Bluetooth/USB/TCPIP KISS modem client for DV digital voice communication)

Stars: ✭ 65 (+41.3%)

Mutual labels: vocoder

LVCNet

LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation

Stars: ✭ 67 (+45.65%)

Mutual labels: vocoder

Wavenet vocoder

WaveNet vocoder

Stars: ✭ 1,926 (+4086.96%)

Mutual labels: neural-vocoder

Wavernn

WaveRNN Vocoder + TTS

Stars: ✭ 1,636 (+3456.52%)

Mutual labels: neural-vocoder

INTERSPEECH19 TUTORIAL

Interspeech 2019 tutorial materials

Stars: ✭ 46 (+0%)

Mutual labels: neural-vocoder

Universal Vocoder

This is a restructured and rewritten version of bshall/UniversalVocoding. The main difference here is that the model is turned into a TorchScript module during training and can be loaded for inferencing anywhere without Python dependencies.

Generate waveforms using pretrained models

Since the pretrained models were turned to TorchScript, you can load a trained model anywhere. Also you can generate multiple waveforms parallelly, e.g.

import torch

vocoder = torch.jit.load("vocoder.pt")

mels = [
    torch.randn(100, 80),
    torch.randn(200, 80),
    torch.randn(300, 80),
] # (length, mel_dim)

with torch.no_grad():
    wavs = vocoder.generate(mels)

Emperically, if you're using the default architecture, you can generate 30 samples at the same time on an GTX 1080 Ti.

Train from scratch

Multiple directories containing audio files can be processed at the same time, e.g.

python preprocess.py \
    VCTK-Corpus \
    LibriTTS/train-clean-100 \
    preprocessed # the output directory of preprocessed data

And train the model with the preprocessed data, e.g.

python train.py preprocessed

With the default settings, it would take around 12 hr to train to 100K steps on an RTX 2080 Ti.

References

Towards achieving robust universal neural vocoding

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

yistLin / universal-vocoder

Programming Languages

Labels

Projects that are alternatives of or similar to universal-vocoder

Universal Vocoder

Generate waveforms using pretrained models

Train from scratch

References