huiw39 / ExtensibleTTS-PyTorch

Licence: other
An extensible speech synthesis system built with PyTorch. The original code is from r9y9's https://github.com/r9y9/nnmnkwii_gallery

Programming Languages

python

ExtensibleTTS-PyTorch

An extensible speech synthesis system built with PyTorch. The original code is from r9y9's https://github.com/r9y9/nnmnkwii_gallery . It makes it easy to train an acoustic model using popular architectures such as Tacotron's encoder, Deep Voice's encoder, a Transformer encoder, or any other encoder you create.

Quick Start

Dependencies

Prepare Dataset

Note: the repo requires wav files with aligned HTS-style full-context label files.
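
For readers unfamiliar with the label format, here is a minimal sketch of reading one line of an aligned HTS-style label file. It assumes the common layout `start end full-context-label`, with times in HTS units of 100 ns and the current phone sitting between `-` and `+`. The sample line, `Segment`, and `parse_label_line` are made up for illustration and are not taken from this repo.

```python
import re
from typing import NamedTuple

HTS_TIME_UNIT = 1e-7  # HTS label times are expressed in units of 100 ns


class Segment(NamedTuple):
    start_sec: float
    end_sec: float
    phone: str


# Assumption: the current phone is the token between the first "-" and "+".
_PHONE_RE = re.compile(r"-([^+]+)\+")


def parse_label_line(line: str) -> Segment:
    """Split one aligned label line into start/end times (seconds) and phone."""
    start, end, context = line.split(maxsplit=2)
    match = _PHONE_RE.search(context)
    if match is None:
        raise ValueError(f"no current phone found in: {context!r}")
    return Segment(int(start) * HTS_TIME_UNIT,
                   int(end) * HTS_TIME_UNIT,
                   match.group(1))


# Hypothetical label line, shortened for readability.
example = "0 2050000 x^x-sil+hh=ax@x_x/A:0_0_0"
seg = parse_label_line(example)
print(seg.phone, seg.start_sec, seg.end_sec)
```

State-aligned labels usually carry an extra state index at the end of the context string; this sketch ignores it.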

  1. Download a dataset

    cmu_slt_arctic

  2. Unpack the dataset into ~/ExtensibleTTS-PyTorch/datasets

    After unpacking, your tree should look like this for cmu_slt_arctic:

    ExtensibleTTS-PyTorch   
      |- datasets    
          |- slt_arctic_full_data
              |- label_phone_align
              |- label_state_align
              |- wav
              |- file_id_list_full.scp
              |- questions-radio_dnn_416.hed
    

Training

  1. Preprocess the data to extract linguistic, duration, and acoustic features

    python preprocess.py --label state_align

    • Use --label phone_align to work from the phone-level alignments (label_phone_align) instead

  2. Compute the min/max/mean/variance/scale statistics of the data, used to normalize the input and output features

    python norm_params.py

  3. Train a model

    python train_dnn.py --train_model duration

    • Use --train_model acoustic to train an acoustic model

  4. Synthesize a speech waveform from labels, using a duration checkpoint and an acoustic checkpoint

    python synthesis.py --label state_align --duration_checkpint * --acoustic_checkpint *

  5. Restore from a checkpoint

    python train.py --restore_step *
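
To illustrate what the normalization statistics are for, here is a dependency-free sketch of computing per-dimension min/max/mean/variance/scale and applying them. It is not the code in norm_params.py. The function names, the use of min-max scaling for inputs and mean-variance scaling for outputs, and the 0.01 to 0.99 target range are all assumptions made for this example.

```python
import statistics
from typing import Dict, List, Sequence

# One row per frame, one column per feature dimension.
Frames = Sequence[Sequence[float]]


def column_stats(frames: Frames) -> Dict[str, List[float]]:
    """Per-dimension min, max, mean, variance and scale (standard deviation)."""
    columns = list(zip(*frames))
    var = [statistics.pvariance(col) for col in columns]
    return {
        "min": [min(col) for col in columns],
        "max": [max(col) for col in columns],
        "mean": [statistics.fmean(col) for col in columns],
        "var": var,
        "scale": [v ** 0.5 for v in var],
    }


def minmax_normalize(frame: Sequence[float], stats: Dict[str, List[float]],
                     lo: float = 0.01, hi: float = 0.99) -> List[float]:
    """Map each dimension into [lo, hi]; constant dimensions map to lo."""
    out = []
    for x, mn, mx in zip(frame, stats["min"], stats["max"]):
        span = mx - mn
        out.append(lo if span == 0 else lo + (x - mn) * (hi - lo) / span)
    return out


def meanvar_normalize(frame: Sequence[float],
                      stats: Dict[str, List[float]]) -> List[float]:
    """Zero-mean, unit-variance scaling; constant dimensions map to 0."""
    return [(x - m) / s if s > 0 else 0.0
            for x, m, s in zip(frame, stats["mean"], stats["scale"])]


# Toy data: three frames, two dimensions (the second one is constant).
frames = [[0.0, 10.0], [2.0, 10.0], [4.0, 10.0]]
stats = column_stats(frames)
print(minmax_normalize(frames[1], stats))
print(meanvar_normalize(frames[2], stats))
```

In a real pipeline the same statistics would be computed once over the training set and reused at synthesis time, so that predicted outputs can be mapped back to the original feature scale.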

WIP

  • combined with MTTS, the Mandarin frontend
  • batch inference for synthesis speedup
  • scheduled sampling
  • model pruning

Reference
