NTT123 / vietTTS

Licence: MIT
Vietnamese Text to Speech library

Programming languages: Python, Jupyter Notebook, shell

A Vietnamese TTS

A duration model, an acoustic model, and a HiFiGAN vocoder for Vietnamese text-to-speech.

Online demo at https://huggingface.co/spaces/ntt123/vietTTS.

A synthesized audio clip: clip.wav. A Colab notebook: notebook.

🔔Check out the experimental multi-speaker branch (git checkout multi-speaker) for multi-speaker support.🔔

Install

git clone https://github.com/NTT123/vietTTS.git
cd vietTTS 
pip3 install -e .

Quick start using pretrained models

bash ./scripts/quick_start.sh

Download InfoRe dataset

python ./scripts/download_aligned_infore_dataset.py

Note: this is a denoised and aligned version of the original dataset, which was donated by InfoRe Technology (see here). The original dataset (InfoRe Technology 1) can be downloaded here.

See notebooks/denoise_infore_dataset.ipynb for instructions on how to denoise the dataset. We use the Montreal Forced Aligner (MFA) to align the transcripts with the speech, producing TextGrid files. See notebooks/align_text_audio_infore_mfa.ipynb for instructions on how to create the TextGrid files.

Train duration model

python -m vietTTS.nat.duration_trainer

Train acoustic model

python -m vietTTS.nat.acoustic_trainer

Train HiFiGAN vocoder

We use the original implementation from the HiFiGAN authors at https://github.com/jik876/hifi-gan. Use the config file at assets/hifigan/config.json to train your model.

git clone https://github.com/jik876/hifi-gan.git

# create dataset in hifi-gan format
ln -sf "$(pwd)/train_data" hifi-gan/data
cd hifi-gan/data
ls -1 *.TextGrid | sed -e 's/\.TextGrid$//' > files.txt
cd ..
head -n 100 data/files.txt > val_files.txt
tail -n +101 data/files.txt > train_files.txt
rm data/files.txt

# training
python train.py \
  --config ../assets/hifigan/config.json \
  --input_wavs_dir=data \
  --input_training_file=train_files.txt \
  --input_validation_file=val_files.txt
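
The file-list step above (first 100 entries for validation, the rest for training) can also be sketched in Python. This is only an illustration of the same split, not part of vietTTS or hifi-gan. The helper names `split_file_list` and `write_file_lists` and the `n_val` parameter are hypothetical, and Python's sort order may differ from `ls` under some locales.

```python
from pathlib import Path


def split_file_list(data_dir, n_val=100):
    """Return (val, train) lists of file stems, mirroring the head/tail split.

    Hypothetical helper: stems are taken from the *.TextGrid files in
    data_dir and sorted by name before splitting.
    """
    stems = sorted(p.stem for p in Path(data_dir).glob("*.TextGrid"))
    return stems[:n_val], stems[n_val:]


def write_file_lists(data_dir, out_dir, n_val=100):
    """Write val_files.txt and train_files.txt (one stem per line) into out_dir."""
    val, train = split_file_list(data_dir, n_val)
    out = Path(out_dir)
    (out / "val_files.txt").write_text("".join(s + "\n" for s in val))
    (out / "train_files.txt").write_text("".join(s + "\n" for s in train))
    return len(val), len(train)
```

Calling `write_file_lists("hifi-gan/data", "hifi-gan")` would produce the same two files as the shell commands, assuming the directory layout shown above.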

Fine-tune on ground-truth aligned (GTA) mel-spectrograms:

cd /path/to/vietTTS # go to the vietTTS directory
python -m vietTTS.nat.zero_silence_segments -o train_data # zero all [sil, sp, spn] segments
python -m vietTTS.nat.gta -o /path/to/hifi-gan/ft_dataset  # write GTA mel-spectrograms to the hifi-gan/ft_dataset directory

# enable fine-tuning
cd /path/to/hifi-gan
python train.py \
  --fine_tuning True \
  --config ../assets/hifigan/config.json \
  --input_wavs_dir=data \
  --input_training_file=train_files.txt \
  --input_validation_file=val_files.txt
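
The `zero_silence_segments` step is described above only as zeroing all [sil, sp, spn] segments. A minimal sketch of that idea follows, assuming the alignment is available as (start_seconds, end_seconds, label) tuples. The function name, the tuple layout, and the `sample_rate` argument are assumptions made for illustration; this is not the vietTTS implementation. It is written against plain Python lists so it has no dependencies.

```python
SILENCE_LABELS = {"sil", "sp", "spn"}  # the labels named in the README comment


def zero_silences(samples, intervals, sample_rate):
    """Return a copy of samples with every silence interval set to zero.

    samples: list of audio samples.
    intervals: iterable of (start_seconds, end_seconds, label) tuples
               (a hypothetical layout, e.g. read from a TextGrid file).
    """
    out = list(samples)
    n = len(out)
    for start, end, label in intervals:
        if label not in SILENCE_LABELS:
            continue
        # Convert seconds to sample indices and clamp them to the signal length.
        lo = max(0, min(n, int(round(start * sample_rate))))
        hi = max(lo, min(n, int(round(end * sample_rate))))
        out[lo:hi] = [0] * (hi - lo)
    return out
```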

Use the following command to convert the trained PyTorch model to Haiku format:

cd ..
python -m vietTTS.hifigan.convert_torch_model_to_haiku \
  --config-file=assets/hifigan/config.json \
  --checkpoint-file=hifi-gan/cp_hifigan/g_[latest_checkpoint]
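
The `g_[latest_checkpoint]` placeholder above stands for the newest generator checkpoint. A small sketch for locating it follows, assuming the checkpoint files are named `g_` followed by a numeric step count; check this against the contents of your own cp_hifigan directory. The helper name `latest_generator_checkpoint` is hypothetical.

```python
import re
from pathlib import Path


def latest_generator_checkpoint(checkpoint_dir):
    """Return the path of the g_<steps> file with the highest step count.

    Returns None if the directory holds no file matching that pattern.
    """
    best_path, best_step = None, -1
    for path in Path(checkpoint_dir).iterdir():
        match = re.fullmatch(r"g_(\d+)", path.name)
        if match and int(match.group(1)) > best_step:
            best_path, best_step = path, int(match.group(1))
    return best_path
```

The returned path can then be passed as the value of `--checkpoint-file`.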

Synthesize speech

python -m vietTTS.synthesizer \
  --lexicon-file=train_data/lexicon.txt \
  --text="hôm qua em tới trường" \
  --output=clip.wav
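
To synthesize several sentences, the command above can be driven from Python. The sketch below uses only the three flags shown in this README; the helper names `build_synthesis_command` and `synthesize_all` and the clip_000.wav naming scheme are hypothetical.

```python
import subprocess
import sys


def build_synthesis_command(text, output, lexicon_file="train_data/lexicon.txt"):
    """Build the argument list for one `python -m vietTTS.synthesizer` call."""
    return [
        sys.executable, "-m", "vietTTS.synthesizer",
        f"--lexicon-file={lexicon_file}",
        f"--text={text}",
        f"--output={output}",
    ]


def synthesize_all(sentences, prefix="clip"):
    """Run the synthesizer once per sentence; return the output file names."""
    outputs = []
    for index, text in enumerate(sentences):
        output = f"{prefix}_{index:03d}.wav"
        # check=True raises CalledProcessError if the synthesizer fails.
        subprocess.run(build_synthesis_command(text, output), check=True)
        outputs.append(output)
    return outputs
```

Passing the arguments as a list avoids shell quoting problems with spaces and Vietnamese diacritics in the text.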