keonlee9420 / Expressive-FastSpeech2

Licence: other
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.

Projects that are alternatives of or similar to Expressive-FastSpeech2

Cross-Speaker-Emotion-Transfer
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Stars: ✭ 107 (-23.02%)
Mutual labels:  text-to-speech, tts, speech-synthesis, non-autoregressive
WaveGrad2
PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Stars: ✭ 55 (-60.43%)
Mutual labels:  text-to-speech, tts, speech-synthesis, non-autoregressive
VAENAR-TTS
PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.
Stars: ✭ 66 (-52.52%)
Mutual labels:  text-to-speech, tts, speech-synthesis, non-autoregressive
STYLER
Official repository of STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech, INTERSPEECH 2021
Stars: ✭ 105 (-24.46%)
Mutual labels:  text-to-speech, tts, expressive-speech-synthesis, expressive-tts
Daft-Exprt
PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis
Stars: ✭ 41 (-70.5%)
Mutual labels:  text-to-speech, tts, speech-synthesis, non-autoregressive
Parallel-Tacotron2
PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
Stars: ✭ 149 (+7.19%)
Mutual labels:  text-to-speech, tts, speech-synthesis, non-autoregressive
Tensorflowtts
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
Stars: ✭ 2,382 (+1613.67%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Jsut Lab
HTS-style full-context labels for JSUT v1.1
Stars: ✭ 28 (-79.86%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Wsay
Windows "say"
Stars: ✭ 36 (-74.1%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Wavernn
WaveRNN Vocoder + TTS
Stars: ✭ 1,636 (+1076.98%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Hifi Gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Stars: ✭ 325 (+133.81%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Cs224n Gpu That Talks
Attention, I'm Trying to Speak: End-to-end speech synthesis (CS224n '18)
Stars: ✭ 52 (-62.59%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Crystal
Crystal - C++ implementation of a unified framework for multilingual TTS synthesis engine with SSML specification as interface.
Stars: ✭ 108 (-22.3%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Parallelwavegan
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN) implementation with PyTorch
Stars: ✭ 682 (+390.65%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Voice Builder
An opensource text-to-speech (TTS) voice building tool
Stars: ✭ 362 (+160.43%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Lightspeech
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Stars: ✭ 31 (-77.7%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Multilingual text to speech
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
Stars: ✭ 324 (+133.09%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Spokestack Python
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.
Stars: ✭ 103 (-25.9%)
Mutual labels:  text-to-speech, tts, speech-synthesis
IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
Stars: ✭ 295 (+112.23%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Pytorch Dc Tts
Text to Speech with PyTorch (English and Mongolian)
Stars: ✭ 122 (-12.23%)
Mutual labels:  text-to-speech, tts, speech-synthesis

Expressive-FastSpeech2 - PyTorch Implementation

Contributions

  1. Non-autoregressive Expressive TTS: This project aims to provide a cornerstone for future research and applications in non-autoregressive expressive TTS, including emotional TTS and conversational TTS. For datasets, the AIHub Multimodal Video AI dataset and the IEMOCAP database are used for Korean and English, respectively.

    Note: If you are interested in expressive, stylistic TTS models in the vein of GST-Tacotron or VAE-Tacotron, but with non-autoregressive decoding, you may also be interested in STYLER [demo, code].

  2. Annotated Data Processing: This project sheds light on how to handle a new annotated dataset, even in a different language, for the successful training of non-autoregressive emotional TTS.

  3. English and Korean TTS: In addition to English, this project gives a broad view of handling Korean for non-autoregressive TTS, where additional data processing must account for language-specific features (e.g., training the Montreal Forced Aligner on your own language and dataset). Please look closely into text/.

  4. Adapting Your Own Language: For those interested in adapting this project to other languages, please refer to the "Training with your own dataset (own language)" section of the categorical branch.
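Annotated data processing (contribution 2 above) typically starts from a per-utterance filelist that pairs audio with its transcript and emotion label. The sketch below is a minimal illustration, not this repository's actual format: the pipe-delimited layout, field names, and label set are all assumptions.

```python
# Minimal sketch of parsing an emotion-annotated filelist.
# The pipe-delimited format, field names, and label set below are
# illustrative assumptions, not this repository's actual layout.

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # hypothetical label set
EMOTION_TO_ID = {e: i for i, e in enumerate(EMOTIONS)}

def parse_filelist_line(line):
    """Split 'audio_path|text|speaker|emotion' into a training record."""
    audio_path, text, speaker, emotion = line.strip().split("|")
    return {
        "audio_path": audio_path,
        "text": text,
        "speaker": speaker,
        # categorical emotion descriptor mapped to an integer id
        "emotion_id": EMOTION_TO_ID[emotion],
    }

record = parse_filelist_line("wavs/0001.wav|Hello there.|spk1|happy")
print(record["emotion_id"])  # 1
```

Keeping the label-to-id mapping in one place makes it straightforward to swap in a different language's dataset: only the filelist and the label set change, not the training loop.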

Repository Structure

In this project, FastSpeech2 is adopted as the base non-autoregressive multi-speaker TTS framework, so it would be helpful to read the paper and code first (also see the FastSpeech2 branch).
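The key idea that makes FastSpeech2 non-autoregressive is its length regulator: phoneme-level encoder states are expanded by predicted durations so all mel frames can be generated in parallel rather than one step at a time. A framework-agnostic numpy sketch of that expansion (shapes and names are illustrative):

```python
import numpy as np

def length_regulate(hidden, durations):
    """FastSpeech2-style length regulation sketch.

    hidden:    (num_phonemes, dim) phoneme-level encoder states
    durations: integer frame count predicted for each phoneme
    Returns frame-level states, ready for parallel mel decoding.
    """
    # Repeat each phoneme's hidden state 'duration' times along the time axis.
    return np.repeat(hidden, durations, axis=0)

hidden = np.arange(6, dtype=float).reshape(3, 2)  # 3 phonemes, dim 2
durations = np.array([2, 1, 3])                   # predicted frames per phoneme
frames = length_regulate(hidden, durations)
print(frames.shape)  # (6, 2): total frames = sum of durations
```

Because the full frame-level sequence exists up front, the decoder never conditions on its own previous outputs, which is what removes the autoregressive bottleneck.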

  1. Emotional TTS: The following branches contain implementations of the basic paradigm introduced by Emotional End-to-End Neural Speech Synthesizer.

    • categorical branch: only conditioning categorical emotional descriptors (such as happy, sad, etc.)
    • continuous branch: conditioning continuous emotional descriptors (such as arousal, valence, etc.) in addition to categorical emotional descriptors
  2. Conversational TTS: The following branch contains an implementation of Conversational End-to-End TTS for Voice Agent.
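Conditioning on a categorical emotion descriptor, as in the branches above, usually amounts to looking up a learned emotion embedding and adding it to every phoneme-level encoder state. In PyTorch this would be an `nn.Embedding` plus a broadcast add; the numpy sketch below is a framework-agnostic illustration whose names and shapes are assumptions, not this repository's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration: 4 emotion categories, hidden dim 8.
num_emotions, dim = 4, 8
emotion_table = rng.normal(size=(num_emotions, dim))  # learned embedding table

def condition_on_emotion(encoder_out, emotion_id):
    """Add the emotion embedding to every phoneme-level encoder state.

    encoder_out: (num_phonemes, dim) -> same shape, emotion-conditioned.
    """
    return encoder_out + emotion_table[emotion_id]  # broadcast over phonemes

encoder_out = rng.normal(size=(5, dim))  # 5 phonemes
conditioned = condition_on_emotion(encoder_out, emotion_id=2)
print(conditioned.shape)  # (5, 8)
```

The continuous branch extends the same idea: continuous descriptors such as arousal and valence are projected to the hidden dimension and added alongside the categorical embedding.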

Citation

If you would like to use or refer to this implementation, please cite the repo.

@misc{lee2021expressive_fastspeech2,
  author = {Lee, Keon},
  title = {Expressive-FastSpeech2},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/keonlee9420/Expressive-FastSpeech2}}
}
