
chaiyujin / dctts-pytorch

Licence: MIT license
The pytorch implementation of DC-TTS

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to dctts-pytorch

tts dataset maker
A gui to help make a text to speech dataset.
Stars: ✭ 20 (-72.6%)
Mutual labels:  text-to-speech, tts
voices
macOS CLI for changing the default TTS (text-to-speech) voice and printing information about and speaking text with multiple voices.
Stars: ✭ 53 (-27.4%)
Mutual labels:  text-to-speech, tts
AdaSpeech
AdaSpeech: Adaptive Text to Speech for Custom Voice
Stars: ✭ 108 (+47.95%)
Mutual labels:  text-to-speech, tts
TTS tf
WIP Tensorflow implementation of https://github.com/mozilla/TTS
Stars: ✭ 14 (-80.82%)
Mutual labels:  text-to-speech, tts
SpeakIt Vietnamese TTS
Vietnamese Text-to-Speech on Windows Project (zalo-speech)
Stars: ✭ 81 (+10.96%)
Mutual labels:  text-to-speech, tts
EMPHASIS-pytorch
EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System
Stars: ✭ 15 (-79.45%)
Mutual labels:  text-to-speech, tts
laravel-text-to-speech
💬 A wrapper for popular TTS services to create a more simple & uniform API. Currently, only AWS Polly is supported.
Stars: ✭ 26 (-64.38%)
Mutual labels:  text-to-speech, tts
FastSpeech2
Multi-Speaker Pytorch FastSpeech2: Fast and High-Quality End-to-End Text to Speech ✊
Stars: ✭ 64 (-12.33%)
Mutual labels:  text-to-speech, tts
WaveGrad2
PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Stars: ✭ 55 (-24.66%)
Mutual labels:  text-to-speech, tts
Daft-Exprt
PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis
Stars: ✭ 41 (-43.84%)
Mutual labels:  text-to-speech, tts
Tacotron2-PyTorch
Yet another PyTorch implementation of Tacotron 2 with reduction factor and faster training speed.
Stars: ✭ 118 (+61.64%)
Mutual labels:  text-to-speech, tts
FastSpeech2
PyTorch Implementation of FastSpeech 2 : Fast and High-Quality End-to-End Text to Speech
Stars: ✭ 163 (+123.29%)
Mutual labels:  text-to-speech, tts
JSpeak
A Text to Speech Reader Front-end that Reads from the Clipboard and with Exceptionable Features
Stars: ✭ 16 (-78.08%)
Mutual labels:  text-to-speech, tts
VAENAR-TTS
PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.
Stars: ✭ 66 (-9.59%)
Mutual labels:  text-to-speech, tts
ukrainian-tts
Ukrainian TTS (text-to-speech) using Coqui TTS
Stars: ✭ 74 (+1.37%)
Mutual labels:  text-to-speech, tts
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+1052.05%)
Mutual labels:  text-to-speech, tts
Zero-Shot-TTS
Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Stars: ✭ 33 (-54.79%)
Mutual labels:  text-to-speech, tts
Cross-Speaker-Emotion-Transfer
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Stars: ✭ 107 (+46.58%)
Mutual labels:  text-to-speech, tts
golang-tts
Text-to-Speech Golang package based on the Amazon Polly service
Stars: ✭ 19 (-73.97%)
Mutual labels:  text-to-speech, tts
STYLER
Official repository of STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech, INTERSPEECH 2021
Stars: ✭ 105 (+43.84%)
Mutual labels:  text-to-speech, tts

DC-TTS

The PyTorch implementation of the paper Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention.
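
The paper pairs a convolutional Text2Mel network with a spectrogram super-resolution network (SSRN) and keeps the attention roughly diagonal with a "guided attention" loss. As a rough illustration only (the helper names below are mine, not the repository's API), the penalty described in the paper can be computed like this:

    import torch

    def guided_attention_penalty(N, T, g=0.2):
        """W[n, t] = 1 - exp(-((n/N - t/T)^2) / (2 * g^2)).

        Positions far from the text/frame diagonal receive a penalty close to 1,
        which nudges the learned alignment towards being monotonic.
        """
        n = torch.arange(N, dtype=torch.float32).unsqueeze(1) / N  # shape (N, 1)
        t = torch.arange(T, dtype=torch.float32).unsqueeze(0) / T  # shape (1, T)
        return 1.0 - torch.exp(-((n - t) ** 2) / (2.0 * g ** 2))   # shape (N, T)

    def guided_attention_loss(A, g=0.2):
        """A: attention matrix of shape (N text symbols, T mel frames)."""
        W = guided_attention_penalty(*A.shape, g=g).to(A.device)
        return (A * W).mean()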

Thanks to Kyubyong/dc_tts, which helped me a lot in overcoming some difficulties.

Dataset

  • The LJ Speech Dataset. A public domain speech dataset consisting of 13,100 short audio clips of a single female speaker.
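
For reference, LJ Speech ships a pipe-separated metadata.csv and a wavs/ directory of 22,050 Hz clips. A minimal sketch of reading it (the path is a placeholder; the actual preprocessing is handled by main.py as described below):

    from pathlib import Path

    LJ_ROOT = Path("/path/to/LJSpeech-1.1")  # placeholder: wherever you extracted the dataset

    # Each metadata.csv line looks like: "<clip id>|<raw text>|<normalized text>"
    with open(LJ_ROOT / "metadata.csv", encoding="utf-8") as f:
        rows = [line.rstrip("\n").split("|") for line in f]

    clip_id, raw_text, norm_text = rows[0]
    wav_path = LJ_ROOT / "wavs" / (clip_id + ".wav")  # 22,050 Hz mono audio
    print(clip_id, norm_text, wav_path.exists())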

Train

I have tuned the hyperparameters and trained a model on The LJ Speech Dataset. The hyperparameters may not be the best and are slightly different from those used in the original paper.

To train a model yourself with The LJ Speech Dataset:

  1. Download the dataset and extract it into a directory, then set that directory in pkg/hyper.py (an illustrative settings sketch follows the commands below)
  2. Run preprocessing
    python3 main.py --action preprocess
    
  3. Train the Text2Mel network; the device used to train Text2Mel can be changed in pkg/hyper.py
    python3 main.py --action train --module Text2Mel
    
  4. Train the SSRN (SuperRes) network; again, the training device can be changed
    python3 main.py --action train --module SuperRes
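
For orientation, the settings referred to in steps 1 and 3 look roughly like the following. The names here are illustrative only; the real option names live in pkg/hyper.py and may differ:

    # Illustrative only: check pkg/hyper.py for the actual option names.
    import torch

    data_dir = "/path/to/LJSpeech-1.1"  # where the dataset was extracted (step 1)
    device = "cuda:0" if torch.cuda.is_available() else "cpu"  # training device (steps 3 and 4)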
    

Samples

Some synthesized samples are contained in the synthesis directory. The corresponding sentences are listed in sentences.txt. The pre-trained models for Text2Mel and SuperRes (auto-saved to logdir/text2mel/pkg/trained.pkg and logdir/superres/pkg/trained.pkg during training) are loaded when synthesizing.

You can synthesize the samples listed in sentences.txt with

python3 main.py --action synthesis
  • Attention Matrix for the sentence: "Which came first... the chicken or the egg? Did the universe have a beginning... and if so, what happened before then? Where did the universe come from... and where is it going?"
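
If you want to plot such an attention matrix yourself, here is a minimal sketch with matplotlib, assuming you have the matrix as a NumPy array of shape (text positions, mel frames):

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_attention(A, path="attention.png"):
        """A: NumPy array of shape (text positions, mel frames)."""
        plt.figure(figsize=(8, 4))
        plt.imshow(A, aspect="auto", origin="lower", interpolation="none")
        plt.xlabel("mel frame")
        plt.ylabel("text position")
        plt.colorbar()
        plt.tight_layout()
        plt.savefig(path)
        plt.close()

    # Example call with a dummy matrix:
    # plot_attention(np.random.rand(80, 200))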

Pre-trained model

The samples in the synthesis directory were generated with a Text2Mel model trained for 410k batches and a SuperRes model trained for 190k batches.

The current result is not very satisfying; specifically, some vowels are skipped. I hope someone can find better hyperparameters and train better models. Please tell me if you are able to get a better model.

You can download the current pre-trained model from my Dropbox.

Dependencies

  • scipy, librosa, num2words
  • pytorch >= 0.4.0
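
num2words is presumably used to spell out digits during text normalization; for example:

    from num2words import num2words

    print(num2words(1987))             # "one thousand, nine hundred and eighty-seven"
    print(num2words(1987, to="year"))  # "nineteen eighty-seven"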

Related

TensorFlow implementation: Kyubyong/dc_tts

Please email me or open an issue if you have any questions or suggestions.
