keonlee9420 / Expressive-FastSpeech2

Licence: other
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.

Projects that are alternatives of or similar to Expressive-FastSpeech2

Cross-Speaker-Emotion-Transfer
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Stars: ✭ 107 (-23.02%)
Mutual labels:  text-to-speech, tts, speech-synthesis, non-autoregressive
WaveGrad2
PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Stars: ✭ 55 (-60.43%)
Mutual labels:  text-to-speech, tts, speech-synthesis, non-autoregressive
VAENAR-TTS
PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.
Stars: ✭ 66 (-52.52%)
Mutual labels:  text-to-speech, tts, speech-synthesis, non-autoregressive
STYLER
Official repository of STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech, INTERSPEECH 2021
Stars: ✭ 105 (-24.46%)
Mutual labels:  text-to-speech, tts, expressive-speech-synthesis, expressive-tts
Daft-Exprt
PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis
Stars: ✭ 41 (-70.5%)
Mutual labels:  text-to-speech, tts, speech-synthesis, non-autoregressive
Parallel-Tacotron2
PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
Stars: ✭ 149 (+7.19%)
Mutual labels:  text-to-speech, tts, speech-synthesis, non-autoregressive
Tensorflowtts
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
Stars: ✭ 2,382 (+1613.67%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Jsut Lab
HTS-style full-context labels for JSUT v1.1
Stars: ✭ 28 (-79.86%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Wsay
Windows "say"
Stars: ✭ 36 (-74.1%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Wavernn
WaveRNN Vocoder + TTS
Stars: ✭ 1,636 (+1076.98%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Hifi Gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Stars: ✭ 325 (+133.81%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Cs224n Gpu That Talks
Attention, I'm Trying to Speak: End-to-end speech synthesis (CS224n '18)
Stars: ✭ 52 (-62.59%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Crystal
Crystal - C++ implementation of a unified framework for multilingual TTS synthesis engine with SSML specification as interface.
Stars: ✭ 108 (-22.3%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Parallelwavegan
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN) implementation with PyTorch
Stars: ✭ 682 (+390.65%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Voice Builder
An opensource text-to-speech (TTS) voice building tool
Stars: ✭ 362 (+160.43%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Lightspeech
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Stars: ✭ 31 (-77.7%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Multilingual text to speech
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
Stars: ✭ 324 (+133.09%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Spokestack Python
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.
Stars: ✭ 103 (-25.9%)
Mutual labels:  text-to-speech, tts, speech-synthesis
IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
Stars: ✭ 295 (+112.23%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Pytorch Dc Tts
Text to Speech with PyTorch (English and Mongolian)
Stars: ✭ 122 (-12.23%)
Mutual labels:  text-to-speech, tts, speech-synthesis

Expressive-FastSpeech2 - PyTorch Implementation

Contributions

  1. Non-autoregressive Expressive TTS: This project aims to provide a cornerstone for future research and applications in non-autoregressive expressive TTS, including emotional TTS and conversational TTS. For datasets, the AIHub Multimodal Video AI dataset and the IEMOCAP database are used for Korean and English, respectively.

    Note: If you are interested in expressive, stylistic TTS models in the vein of GST-Tacotron or VAE-Tacotron, but with non-autoregressive decoding, you may also be interested in STYLER [demo, code].

  2. Annotated Data Processing: This project sheds light on how to handle a new annotated dataset, even in a different language, for the successful training of non-autoregressive emotional TTS.

  3. English and Korean TTS: In addition to English, this project gives a broad view of handling Korean for non-autoregressive TTS, where additional data processing must account for language-specific features (e.g., training the Montreal Forced Aligner on your own language and dataset). Please look closely into text/.

  4. Adapting Your Own Language: For those interested in adapting this project to other languages, please refer to the "Training with your own dataset (own language)" section of the categorical branch.
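Annotated data processing (contribution 2 above) typically starts from a per-utterance filelist that pairs audio with its transcript and emotion label. The sketch below is a minimal illustration, not this repository's actual format: the pipe-delimited layout, field names, and label set are all assumptions.

```python
# Minimal sketch of parsing an emotion-annotated filelist.
# The pipe-delimited format, field names, and label set below are
# illustrative assumptions, not this repository's actual layout.

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # hypothetical label set
EMOTION_TO_ID = {e: i for i, e in enumerate(EMOTIONS)}

def parse_filelist_line(line):
    """Split 'audio_path|text|speaker|emotion' into a training record."""
    audio_path, text, speaker, emotion = line.strip().split("|")
    return {
        "audio_path": audio_path,
        "text": text,
        "speaker": speaker,
        # categorical emotion descriptor mapped to an integer id
        "emotion_id": EMOTION_TO_ID[emotion],
    }

record = parse_filelist_line("wavs/0001.wav|Hello there.|spk1|happy")
print(record["emotion_id"])  # 1
```

Keeping the label-to-id mapping in one place makes it straightforward to swap in a different language's dataset: only the filelist and the label set change, not the training loop.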

Repository Structure

In this project, FastSpeech2 is adopted as the base non-autoregressive multi-speaker TTS framework, so it would be helpful to read the paper and code first (also see the FastSpeech2 branch).
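The key idea that makes FastSpeech2 non-autoregressive is its length regulator: phoneme-level encoder states are expanded by predicted durations so all mel frames can be generated in parallel rather than one step at a time. A framework-agnostic numpy sketch of that expansion (shapes and names are illustrative):

```python
import numpy as np

def length_regulate(hidden, durations):
    """FastSpeech2-style length regulation sketch.

    hidden:    (num_phonemes, dim) phoneme-level encoder states
    durations: integer frame count predicted for each phoneme
    Returns frame-level states, ready for parallel mel decoding.
    """
    # Repeat each phoneme's hidden state 'duration' times along the time axis.
    return np.repeat(hidden, durations, axis=0)

hidden = np.arange(6, dtype=float).reshape(3, 2)  # 3 phonemes, dim 2
durations = np.array([2, 1, 3])                   # predicted frames per phoneme
frames = length_regulate(hidden, durations)
print(frames.shape)  # (6, 2): total frames = sum of durations
```

Because the full frame-level sequence exists up front, the decoder never conditions on its own previous outputs, which is what removes the autoregressive bottleneck.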

  1. Emotional TTS: The following branches contain implementations of the basic paradigm introduced by Emotional End-to-End Neural Speech Synthesizer.

    • categorical branch: only conditioning categorical emotional descriptors (such as happy, sad, etc.)
    • continuous branch: conditioning continuous emotional descriptors (such as arousal, valence, etc.) in addition to categorical emotional descriptors
  2. Conversational TTS: The following branch contains an implementation of Conversational End-to-End TTS for Voice Agent.
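Conditioning on a categorical emotion descriptor, as in the branches above, usually amounts to looking up a learned emotion embedding and adding it to every phoneme-level encoder state. In PyTorch this would be an `nn.Embedding` plus a broadcast add; the numpy sketch below is a framework-agnostic illustration whose names and shapes are assumptions, not this repository's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration: 4 emotion categories, hidden dim 8.
num_emotions, dim = 4, 8
emotion_table = rng.normal(size=(num_emotions, dim))  # learned embedding table

def condition_on_emotion(encoder_out, emotion_id):
    """Add the emotion embedding to every phoneme-level encoder state.

    encoder_out: (num_phonemes, dim) -> same shape, emotion-conditioned.
    """
    return encoder_out + emotion_table[emotion_id]  # broadcast over phonemes

encoder_out = rng.normal(size=(5, dim))  # 5 phonemes
conditioned = condition_on_emotion(encoder_out, emotion_id=2)
print(conditioned.shape)  # (5, 8)
```

The continuous branch extends the same idea: continuous descriptors such as arousal and valence are projected to the hidden dimension and added alongside the categorical embedding.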

Citation

If you would like to use or refer to this implementation, please cite the repo.

@misc{lee2021expressive_fastspeech2,
  author = {Lee, Keon},
  title = {Expressive-FastSpeech2},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/keonlee9420/Expressive-FastSpeech2}}
}
