Edresson / YourTTS

Licence: other

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

Programming Languages

Jupyter Notebook

11667 projects

Projects that are alternatives of or similar to YourTTS

MediumVC

Any-to-any voice conversion using synthetic specific-speaker speeches as intermedium features

Stars: ✭ 46 (-78.8%)

Mutual labels: speech-synthesis, voice-conversion

open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

Stars: ✭ 841 (+287.56%)

Mutual labels: tts, speech-synthesis

VAENAR-TTS

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

Stars: ✭ 66 (-69.59%)

Mutual labels: tts, speech-synthesis

Zero-Shot-TTS

Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

Stars: ✭ 33 (-84.79%)

Mutual labels: tts, speech-synthesis

spoken-word

Spoken Word

Stars: ✭ 46 (-78.8%)

Mutual labels: tts, speech-synthesis

Cross-Speaker-Emotion-Transfer

PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

Stars: ✭ 107 (-50.69%)

Mutual labels: tts, speech-synthesis

deep-learning-german-tts

Thorsten-Voice: A free to use, offline working, high quality german TTS voice should be available for every project without any license struggling.

Stars: ✭ 268 (+23.5%)

Mutual labels: tts, speech-synthesis

Expressive-FastSpeech2

PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.

Stars: ✭ 139 (-35.94%)

Mutual labels: tts, speech-synthesis

ppg-vc

PPG-Based Voice Conversion

Stars: ✭ 154 (-29.03%)

Mutual labels: speech-synthesis, voice-conversion

WaveGrad2

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

Stars: ✭ 55 (-74.65%)

Mutual labels: tts, speech-synthesis

TensorVox

Desktop application for neural speech synthesis written in C++

Stars: ✭ 140 (-35.48%)

Mutual labels: tts, speech-synthesis

talkie

Text-to-speech browser extension button. Select text on any web page, and have the computer read it out loud for you by simply clicking the Talkie button.

Stars: ✭ 43 (-80.18%)

Mutual labels: tts, speech-synthesis

StyleSpeech

Official implementation of Meta-StyleSpeech and StyleSpeech

Stars: ✭ 161 (-25.81%)

Mutual labels: tts, speech-synthesis

Parallel-Tacotron2

PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

Stars: ✭ 149 (-31.34%)

Mutual labels: tts, speech-synthesis

SingleVC

Any-to-one voice conversion using the data augment strategy: pitch shifted and duration remained.

Stars: ✭ 25 (-88.48%)

Mutual labels: speech-synthesis, voice-conversion

AdaSpeech

AdaSpeech: Adaptive Text to Speech for Custom Voice

Stars: ✭ 108 (-50.23%)

Mutual labels: tts, speech-synthesis

TFGAN

TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis

Stars: ✭ 65 (-70.05%)

Mutual labels: tts, speech-synthesis

vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Stars: ✭ 1,604 (+639.17%)

Mutual labels: tts, speech-synthesis

Daft-Exprt

PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis

Stars: ✭ 41 (-81.11%)

Mutual labels: tts, speech-synthesis

LVCNet

LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation

Stars: ✭ 67 (-69.12%)

Mutual labels: tts, speech-synthesis

View All Similar Projects ➔

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

In our recent paper we propose the YourTTS model. YourTTS brings the power of a multilingual approach to the task of zero-shot multi-speaker TTS. Our method builds upon the VITS model and adds several novel modifications for zero-shot multi-speaker and multilingual training. We achieved state-of-the-art (SOTA) results in zero-shot multi-speaker TTS and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. Additionally, our approach achieves promising results in a target language with a single-speaker dataset, opening possibilities for zero-shot multi-speaker TTS and zero-shot voice conversion systems in low-resource languages. Finally, it is possible to fine-tune the YourTTS model with less than 1 minute of speech and achieve state-of-the-art results in voice similarity and with reasonable quality. This is important to allow synthesis for speakers with a very different voice or recording characteristics from those seen during training.

Audios samples

Visit our website for audio samples.

Implementation

All of our experiments were implemented on the Coqui TTS repo.

Colab Demos

Demo	URL
Zero-Shot TTS	link
Zero-Shot VC	link

Checkpoints

All the released checkpoints are licensed under CC BY-NC-ND 4.0

Model	URL
Speaker Encoder	link
Exp 1. YourTTS-EN(VCTK)	link
Exp 1. YourTTS-EN(VCTK) + SCL	link
Exp 2. YourTTS-EN(VCTK)-PT	link
Exp 2. YourTTS-EN(VCTK)-PT + SCL	link
Exp 3. YourTTS-EN(VCTK)-PT-FR	link
Exp 3. YourTTS-EN(VCTK)-PT-FR SCL	link
Exp 4. YourTTS-EN(VCTK+LibriTTS)-PT-FR SCL	link

Results replicability

To insure replicability, we make the audios used to generate the MOS available here. In addition, we provide the MOS for each audio here.

To re-generate our MOS results, follow the instructions here. To predict the test sentences and generate the SECS, please use the Jupyter Notebooks available here.

Test Speakers:

LibriTTS (test clean): 1188, 1995, 260, 1284, 2300, 237, 908, 1580, 121 and 1089

VCTK: p261, p225, p294, p347, p238, p234, p248, p335, p245, p326 and p302

MLS Portuguese: 12710, 5677, 12249, 12287, 9351, 11995, 7925, 3050, 4367 and 13069

Citation


@ARTICLE{2021arXiv211202418C,
  author = {{Casanova}, Edresson and {Weber}, Julian and {Shulby}, Christopher and {Junior}, Arnaldo Candido and {G{\"o}lge}, Eren and {Antonelli Ponti}, Moacir},
  title = "{YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone}",
  journal = {arXiv e-prints},
  keywords = {Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing},
  year = 2021,
  month = dec,
  eid = {arXiv:2112.02418},
  pages = {arXiv:2112.02418},
  archivePrefix = {arXiv},
  eprint = {2112.02418},
  primaryClass = {cs.SD},
  adsurl = {https://ui.adsabs.harvard.edu/abs/2021arXiv211202418C},
  adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

Edresson / YourTTS

Programming Languages

Labels

Projects that are alternatives of or similar to YourTTS

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

Audios samples

Implementation

Colab Demos

Checkpoints

Results replicability

Test Speakers:

Citation