
LEEYOONHYUNG / BVAE-TTS

License: MIT
Official implementation of BVAE-TTS

Programming Languages

Python

Projects that are alternatives to or similar to BVAE-TTS

Parallelwavegan
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN) with Pytorch
Stars: ✭ 682 (+702.35%)
Mutual labels:  text-to-speech
Asrgen
Attacking Speaker Recognition with Deep Generative Models
Stars: ✭ 31 (-63.53%)
Mutual labels:  text-to-speech
Cs224n Gpu That Talks
Attention, I'm Trying to Speak: End-to-end speech synthesis (CS224n '18)
Stars: ✭ 52 (-38.82%)
Mutual labels:  text-to-speech
Zhrtvc
Chinese real-time voice cloning (VC) and Chinese text-to-speech (TTS). An easy-to-use Chinese voice cloning and speech synthesis system comprising a speech encoder, synthesizer, vocoder, and visualization modules.
Stars: ✭ 771 (+807.06%)
Mutual labels:  text-to-speech
Jsut Lab
HTS-style full-context labels for JSUT v1.1
Stars: ✭ 28 (-67.06%)
Mutual labels:  text-to-speech
Friend.ly
A social media platform with a friend recommendation engine based on personality trait extraction
Stars: ✭ 41 (-51.76%)
Mutual labels:  text-to-speech
Transformertts
🤖💬 Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.
Stars: ✭ 617 (+625.88%)
Mutual labels:  text-to-speech
Watbot
An Android ChatBot powered by IBM Watson Services (Assistant V1, Text-to-Speech, and Speech-to-Text with Speaker Recognition) on IBM Cloud.
Stars: ✭ 64 (-24.71%)
Mutual labels:  text-to-speech
Lightspeech
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Stars: ✭ 31 (-63.53%)
Mutual labels:  text-to-speech
Textnormalizationcoveringgrammars
Covering grammars for English and Russian text normalization
Stars: ✭ 46 (-45.88%)
Mutual labels:  text-to-speech
Espeak Ng
eSpeak NG is an open-source speech synthesizer that supports more than a hundred languages and accents.
Stars: ✭ 799 (+840%)
Mutual labels:  text-to-speech
Botium Speech Processing
Botium Speech Processing
Stars: ✭ 908 (+968.24%)
Mutual labels:  text-to-speech
Tacotron2
A PyTorch implementation of Tacotron 2, an end-to-end text-to-speech (TTS) system described in "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions".
Stars: ✭ 43 (-49.41%)
Mutual labels:  text-to-speech
Rhvoice
A free and open-source speech synthesizer for Russian and other languages
Stars: ✭ 750 (+782.35%)
Mutual labels:  text-to-speech
Voicenet
Speech synthesis platform based on TensorFlow and Sonnet
Stars: ✭ 60 (-29.41%)
Mutual labels:  text-to-speech
Pyttsx3
Offline text-to-speech synthesis for Python
Stars: ✭ 637 (+649.41%)
Mutual labels:  text-to-speech
Wsay
Windows "say"
Stars: ✭ 36 (-57.65%)
Mutual labels:  text-to-speech
Merlin
This is now the official location of the Merlin project.
Stars: ✭ 1,168 (+1274.12%)
Mutual labels:  text-to-speech
Dragonfire
The open-source virtual assistant for Ubuntu-based Linux distributions
Stars: ✭ 1,120 (+1217.65%)
Mutual labels:  text-to-speech
Tacotron2
A PyTorch implementation of Tacotron 2 (https://arxiv.org/pdf/1712.05884.pdf)
Stars: ✭ 46 (-45.88%)
Mutual labels:  text-to-speech

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (BVAE-TTS)

Yoonhyung Lee, Joongbo Shin, Kyomin Jung

Abstract: Although early text-to-speech (TTS) models such as Tacotron 2 have succeeded in generating human-like speech, their autoregressive architectures have several limitations: (1) they require a lot of time to generate a mel-spectrogram consisting of hundreds of steps, and (2) autoregressive speech generation lacks robustness due to error propagation. In this paper, we propose a novel non-autoregressive TTS model called BVAE-TTS, which eliminates these architectural limitations and generates a mel-spectrogram in parallel. BVAE-TTS adopts a bidirectional-inference variational autoencoder (BVAE) that learns hierarchical latent representations using both bottom-up and top-down paths to increase its expressiveness. To apply BVAE to TTS, we design our model to utilize text information via an attention mechanism. Using the attention maps that BVAE-TTS generates, we train a duration predictor so that the model can use the predicted duration of each phoneme at inference. In experiments conducted on the LJSpeech dataset, we show that our model generates a mel-spectrogram 27 times faster than Tacotron 2 with similar speech quality. Furthermore, BVAE-TTS outperforms Glow-TTS, one of the state-of-the-art non-autoregressive TTS models, in both speech quality and inference speed while having 58% fewer parameters. One-sentence Summary: In this paper, a novel non-autoregressive text-to-speech model based on a bidirectional-inference variational autoencoder, called BVAE-TTS, is proposed.
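To make the bidirectional-inference idea concrete, here is a minimal, illustrative sketch of precision-weighted merging of a bottom-up posterior estimate with a top-down prior estimate, the mechanism commonly used in ladder-style/bidirectional-inference VAEs. The function name and scalar inputs are hypothetical; the actual BVAE-TTS layers operate on per-dimension tensors.

```python
def merge_gaussians(mu_bu, var_bu, mu_td, var_td):
    """Precision-weighted merge of a bottom-up Gaussian estimate
    (mu_bu, var_bu) and a top-down estimate (mu_td, var_td), as in
    ladder/bidirectional-inference VAEs. Scalars here for clarity;
    in practice these are per-dimension tensors."""
    prec_bu = 1.0 / var_bu          # precision of the bottom-up path
    prec_td = 1.0 / var_td          # precision of the top-down path
    var = 1.0 / (prec_bu + prec_td)  # merged variance
    mu = var * (mu_bu * prec_bu + mu_td * prec_td)  # precision-weighted mean
    return mu, var

# With equal variances, the merged mean is the simple average
# and the merged variance is halved.
mu, var = merge_gaussians(0.0, 1.0, 2.0, 1.0)
```

The merged distribution is sharper than either input, which is why combining the two paths increases the expressiveness of the hierarchical posterior.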

Training

  1. Download and extract the LJ Speech dataset.
  2. Create a preprocessed folder in the LJSpeech directory and preprocess the data using prepare_data.ipynb.
  3. Set data_path in hparams.py to the preprocessed folder.
  4. Train your own BVAE-TTS model:
python train.py --gpu=0 --logdir=baseline
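As a reference for step 2, LJSpeech's metadata.csv is pipe-separated (file id | raw transcript | normalized transcript). The small parser below is only a sketch of that first step; the helper name is hypothetical, and the mel-spectrogram extraction itself is done in prepare_data.ipynb.

```python
def parse_metadata_lines(lines):
    """Parse lines of LJSpeech's metadata.csv into
    (wav filename, normalized transcript) pairs."""
    pairs = []
    for line in lines:
        # Each line: <file id>|<raw transcript>|<normalized transcript>
        file_id, _raw, normalized = line.strip().split("|")
        pairs.append((file_id + ".wav", normalized))
    return pairs

example = ["LJ001-0001|Printing, in the only sense|Printing, in the only sense"]
pairs = parse_metadata_lines(example)
```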

Pre-trained models

We provide a pre-trained BVAE-TTS model, which is the model you would obtain with the current settings (e.g. hyperparameters, dataset split). We also provide a pre-trained WaveGlow model that is used to obtain the audio samples. After downloading the models, you can generate audio samples using inference.ipynb.
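At inference time, the model expands each phoneme representation by its predicted duration so all mel frames can be generated in parallel. A minimal sketch of that length-regulation step is shown below; the function name and the string "states" are illustrative stand-ins for the real per-phoneme hidden vectors.

```python
def length_regulate(phoneme_states, durations):
    """Repeat each phoneme representation dur times (dur = predicted
    duration in mel frames), producing a frame-level sequence the
    decoder can consume in parallel."""
    expanded = []
    for state, dur in zip(phoneme_states, durations):
        expanded.extend([state] * dur)  # dur copies of this phoneme's state
    return expanded

# Hypothetical phonemes with predicted durations 2, 3, and 1 frames.
frames = length_regulate(["HH", "AH", "L"], [2, 3, 1])
```

During training the durations come from the attention maps; at inference they come from the trained duration predictor, which is what removes the autoregressive frame-by-frame loop.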

Audio Samples

You can listen to the audio samples here.

Reference

1. NVIDIA/tacotron2: https://github.com/NVIDIA/tacotron2
2. NVIDIA/waveglow: https://github.com/NVIDIA/waveglow
3. pclucas14/iaf-vae: https://github.com/pclucas14/iaf-vae

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].