
Tomiinek / Multilingual_text_to_speech

License: MIT

An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.

Programming Languages

python


Labels

text-to-speech tts speech-synthesis multilingual

Projects that are alternatives of or similar to Multilingual text to speech

spokestack-android

Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!

Stars: ✭ 52 (-83.95%)

Mutual labels: text-to-speech, tts, speech-synthesis

editts

Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech

Stars: ✭ 74 (-77.16%)

Mutual labels: text-to-speech, tts, speech-synthesis

WaveGrad2

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

Stars: ✭ 55 (-83.02%)

Mutual labels: text-to-speech, tts, speech-synthesis

open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

Stars: ✭ 841 (+159.57%)

Mutual labels: text-to-speech, tts, speech-synthesis

Cognitive Speech Tts

Microsoft Text-to-Speech API sample code in several languages, part of Cognitive Services.

Stars: ✭ 312 (-3.7%)

Mutual labels: speech-synthesis, text-to-speech, tts

Daft-Exprt

PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis

Stars: ✭ 41 (-87.35%)

Mutual labels: text-to-speech, tts, speech-synthesis

Fre-GAN-pytorch

Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

Stars: ✭ 73 (-77.47%)

Mutual labels: text-to-speech, tts, speech-synthesis

Cross-Speaker-Emotion-Transfer

PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

Stars: ✭ 107 (-66.98%)

Mutual labels: text-to-speech, tts, speech-synthesis

esp32-flite

Speech synthesis running on ESP32 based on Flite engine.

Stars: ✭ 28 (-91.36%)

Mutual labels: text-to-speech, tts, speech-synthesis

ttslearn

ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)

Stars: ✭ 158 (-51.23%)

Mutual labels: text-to-speech, tts, speech-synthesis

Parakeet

PAddle PARAllel text-to-speech toolKIT (supporting WaveFlow, WaveNet, Transformer TTS and Tacotron2)

Stars: ✭ 279 (-13.89%)

Mutual labels: speech-synthesis, text-to-speech, tts

Glow Tts

A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Stars: ✭ 284 (-12.35%)

Mutual labels: speech-synthesis, text-to-speech, tts

AdaSpeech

AdaSpeech: Adaptive Text to Speech for Custom Voice

Stars: ✭ 108 (-66.67%)

Mutual labels: text-to-speech, tts, speech-synthesis

talkie

Text-to-speech browser extension button. Select text on any web page, and have the computer read it out loud for you by simply clicking the Talkie button.

Stars: ✭ 43 (-86.73%)

Mutual labels: text-to-speech, tts, speech-synthesis

VAENAR-TTS

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

Stars: ✭ 66 (-79.63%)

Mutual labels: text-to-speech, tts, speech-synthesis

Parallel-Tacotron2

PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

Stars: ✭ 149 (-54.01%)

Mutual labels: text-to-speech, tts, speech-synthesis

TensorVox

Desktop application for neural speech synthesis written in C++

Stars: ✭ 140 (-56.79%)

Mutual labels: text-to-speech, tts, speech-synthesis

Zero-Shot-TTS

Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

Stars: ✭ 33 (-89.81%)

Mutual labels: text-to-speech, tts, speech-synthesis

LVCNet

LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation

Stars: ✭ 67 (-79.32%)

Mutual labels: text-to-speech, tts, speech-synthesis

Comprehensive-Tacotron2

PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supports both single-, multi-speaker TTS and several techniques to enforce the robustness and efficiency of the model.

Stars: ✭ 22 (-93.21%)

Mutual labels: text-to-speech, tts, speech-synthesis


Multilingual Speech Synthesis

Interactive synthesis demo
Website with samples
Paper & Description

This repository provides synthesized samples, training and evaluation data, source code, and parameters for the paper One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech.

It contains an implementation of Tacotron 2 that supports multilingual experiments and that implements different approaches to encoder parameter sharing. It presents a model combining ideas from Learning to speak fluently in a foreign language: Multilingual speech synthesis and cross-language voice cloning, End-to-End Code-Switched TTS with Mix of Monolingual Recordings, and Contextual Parameter Generation for Universal Neural Machine Translation.

We provide data for comparison of three multilingual text-to-speech models. The first shares the whole encoder and uses an adversarial classifier to remove speaker-dependent information from the encoder. The second has separate encoders for each language. Finally, the third is our attempt to combine the best of both previous approaches, i.e., effective parameter sharing of the first method and flexibility of the second. It has a fully convolutional encoder with language-specific parameters generated by a parameter generator. It also makes use of an adversarial speaker classifier which follows principles of domain adversarial training. See the illustration above.
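The idea of generating language-specific encoder parameters can be illustrated with a toy sketch. It assumes the simplest possible generator, a single linear projection from a language embedding to the flattened weights of one 1-D convolution kernel. All names, shapes, and embedding values are hypothetical and chosen for illustration; this is not the repository's implementation.

```python
import random


def init_generator(embedding_dim, out_channels, in_channels, kernel_size, seed=0):
    """Create the generator's own weights: one row per generated kernel entry."""
    rng = random.Random(seed)
    num_entries = out_channels * in_channels * kernel_size
    return [[rng.gauss(0.0, 0.1) for _ in range(embedding_dim)]
            for _ in range(num_entries)]


def generate_kernel(generator, language_embedding, out_channels, in_channels, kernel_size):
    """Project a language embedding to a kernel indexed as [out][in][position]."""
    flat = [sum(w * e for w, e in zip(row, language_embedding)) for row in generator]
    entries = iter(flat)
    return [[[next(entries) for _ in range(kernel_size)]
             for _ in range(in_channels)]
            for _ in range(out_channels)]


# The generator weights are shared by all languages; only the embedding differs.
generator = init_generator(embedding_dim=3, out_channels=2, in_channels=2, kernel_size=3)
german_kernel = generate_kernel(generator, [1.0, 0.0, 0.5], 2, 2, 3)
french_kernel = generate_kernel(generator, [0.0, 1.0, 0.5], 2, 2, 3)
```

Because every language passes through the same generator weights, knowledge is shared across languages, while each language still ends up with its own convolution kernel.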

Interactive demos introducing code-switching abilities and joint multilingual training of the generated model (trained on an enhanced CSS10 dataset) are available here and here, respectively.

Many samples synthesized using the three compared models are available at this website. It also contains a few samples synthesized by a monolingual vanilla Tacotron trained on LJ Speech with the Griffin-Lim vocoder (a sanity check of our implementation).

Our best model supporting code-switching or voice-cloning can be downloaded here and the best model trained on the whole CSS10 dataset without the ambition to do voice-cloning is available here.

Running

This section shows how to train our multilingual Tacotron. Our vocoder is based on the WaveRNN model; see this repository for more details, or use our pre-trained model.

Clone repository

git clone https://github.com/Tomiinek/Multilingual_Text_to_Speech.git
cd Multilingual_Text_to_Speech

👀 Install python requirements

pip3 install -r requirements.txt

⌛️ Download datasets

Download the CSS10 dataset (Apache License 2.0) and our cleaned Common Voice data (Creative Commons CC0).

cd /project_root/data/css10

Visit the CSS10 repository and download data for all languages. Extract the downloaded archives. For example, in the case of French, you should see the following folder structure:

data/css10/french/lesmis/
data/css10/french/lupincontresholme/
data/css10/french/transcript.txt
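Before moving on, it may help to check that each extracted language folder matches this layout. The following is a minimal sketch, assuming only what the listing above shows: a transcript.txt file plus one or more book folders per language. The function name is hypothetical and not part of the repository.

```python
from pathlib import Path


def check_css10_language(root, language):
    """Return a list of problems found in root/language (empty if none)."""
    lang_dir = Path(root) / language
    if not lang_dir.is_dir():
        return [f"missing directory: {lang_dir}"]
    problems = []
    transcript = lang_dir / "transcript.txt"
    if not transcript.is_file():
        problems.append(f"missing transcript: {transcript}")
    # The remaining entries are expected to be per-book folders of recordings.
    if not any(p.is_dir() for p in lang_dir.iterdir()):
        problems.append(f"no book folders in: {lang_dir}")
    return problems


# Example: report problems for French, relative to the project root.
for problem in check_css10_language("data/css10", "french"):
    print(problem)
```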

Next, download our cleaned Common Voice dataset:

cd /project_root/data/comvoi_clean

wget https://github.com/Tomiinek/Multilingual_Text_to_Speech/releases/download/v1.0/comvoi.zip
unzip -q comvoi.zip -d clean_comvoi
rm comvoi.zip

📜 Prepare spectrograms

This repository provides cleaned transcripts and meta-files, and you have already downloaded the corresponding .wav files. Precomputing spectrograms speeds up training, so you can run an ad-hoc script that creates mel and linear spectrograms for you:

cd /project_root/data/
python3 prepare_css_spectrograms.py

You can create the meta-file, spectrograms, and phonemicized transcripts for other datasets by applying the TextToSpeechDataset.create_meta_file method to the original downloaded and extracted data (such as LJ Speech or M-AILABS; see dataset/loaders.py for supported datasets). Note that you then need to split the meta-file into train.txt and val.txt files.
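The split into train.txt and val.txt can be done with a few lines of Python. This is a minimal sketch, assuming the meta-file holds one utterance per line; the function names, the 10% validation share, and the fixed seed are illustrative choices, not something the repository prescribes.

```python
import random
from pathlib import Path


def split_meta_file(meta_path, val_fraction=0.1, seed=42):
    """Shuffle the meta-file lines and return (train_lines, val_lines)."""
    text = Path(meta_path).read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]
    random.Random(seed).shuffle(lines)  # fixed seed keeps the split reproducible
    num_val = int(len(lines) * val_fraction)
    if len(lines) > 1:
        num_val = max(1, num_val)  # keep at least one validation line
    return lines[num_val:], lines[:num_val]


def write_split(meta_path, out_dir, val_fraction=0.1, seed=42):
    """Write train.txt and val.txt into out_dir and return their line counts."""
    train, val = split_meta_file(meta_path, val_fraction, seed)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "train.txt").write_text("".join(l + "\n" for l in train), encoding="utf-8")
    (out / "val.txt").write_text("".join(l + "\n" for l in val), encoding="utf-8")
    return len(train), len(val)
```

A random split is the simplest option; for multi-speaker or multilingual data, you may prefer to split per speaker or per language so that the validation set covers all of them.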

🚅 Train

Now, we can run training. See the params/params.py file with an exhaustive description of parameters. The params folder also contains prepared parameter configurations (such as generated_switching.json) for multilingual training on the whole CSS10 dataset and for training of code-switching models on the dataset that consists of Cleaned Common Voice and five languages of CSS10.

Train with predefined configurations (recommended for quick start), for example:

PYTHONIOENCODING=utf-8 python3 train.py --hyper_parameters generated_switching

Please note the missing extension (.json).

Or with default parameters (default dataset is LJ Speech):

PYTHONIOENCODING=utf-8 python3 train.py

By default, training logs are saved into the logs directory. Use Tensorboard to monitor training:

tensorboard --logdir logs --port 6666 &

🏁 Checkpointing

Checkpoints are saved into the checkpoints directory by default. They contain model weights, parameters, the optimizer state, and the state of the scheduler. To restore training from a checkpoint, let's say named checkpoints/CHECKPOINT-1, run:

PYTHONIOENCODING=utf-8 python3 train.py --checkpoint CHECKPOINT-1
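If you do not remember which checkpoint was written last, a small helper can look it up. This is a sketch, assuming that checkpoints are stored as individual files directly inside the checkpoints directory and that the value passed to --checkpoint is the file name, as in the command above. The helper itself is hypothetical and not part of the repository.

```python
from pathlib import Path


def latest_checkpoint_name(checkpoint_dir="checkpoints"):
    """Return the name of the most recently modified file, or None if there is none."""
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return None
    files = [p for p in directory.iterdir() if p.is_file()]
    if not files:
        return None
    return max(files, key=lambda p: p.stat().st_mtime).name


name = latest_checkpoint_name()
if name is not None:
    print(f"python3 train.py --checkpoint {name}")
```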

Inference

For generating spectrograms, see synthesize.py or the interactive Colab notebooks (here and here). An example call that uses the checkpoint checkpoints/CHECKPOINT-1 and saves both the synthesized spectrogram and the corresponding waveform vocoded using the Griffin-Lim algorithm:

echo "01|Dies ist ein Beispieltext.|00-fr|de" | python3 synthesize.py --checkpoint checkpoints/CHECKPOINT-1 --save_spec
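When synthesizing many sentences, the input lines can be generated programmatically. The sketch below assumes that the four pipe-separated fields in the example are an utterance id, the text, a speaker id, and a language code. This reading is inferred from the example rather than stated in this document, so check synthesize.py before relying on it. The function name is hypothetical.

```python
def make_input_line(utterance_id, text, speaker, language):
    """Join the four fields with '|' after checking they cannot break the format."""
    fields = (utterance_id, text, speaker, language)
    for field in fields:
        if "|" in field or "\n" in field:
            raise ValueError(f"field must not contain '|' or a newline: {field!r}")
    return "|".join(fields)


sentences = ["Dies ist ein Beispieltext.", "Noch ein Satz."]
for index, sentence in enumerate(sentences, start=1):
    print(make_input_line(f"{index:02d}", sentence, "00-fr", "de"))
```

Saved as a script, the printed lines can be piped into synthesize.py in place of the echo command above.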

Vocoding

We used the WaveRNN model for vocoding. You can download WaveRNN weights pre-trained on the whole CSS10 dataset. For examples of usage, visit our interactive demos (here and here) or this repository.

Code Structure

Please see this file for more details about the source code and its structure.

🎓 Citation

@inproceedings{Nekvinda2020,
  author={Tomáš Nekvinda and Ondřej Dušek},
  title={{One Model, Many Languages: Meta-Learning for Multilingual Text-to-Speech}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={2972--2976},
  doi={10.21437/Interspeech.2020-2679},
  url={http://dx.doi.org/10.21437/Interspeech.2020-2679}
}


Stars: ✭ 324
