xcmyz / FastSpeech

License: MIT
An implementation of FastSpeech based on PyTorch.


FastSpeech-Pytorch

An implementation of FastSpeech based on PyTorch.

Update (2020/07/20)

  1. Optimized the training process.
  2. Optimized the implementation of the length regulator.
  3. Used the same hyperparameters as FastSpeech 2.
  4. Changes 1–3 make the training process about three times faster than before.
  5. Better speech quality.
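The length regulator mentioned in item 2 is the core of FastSpeech: it expands phoneme-level hidden states to frame level by repeating each one according to its predicted duration. A minimal, framework-free sketch of the idea (the function name is hypothetical; this is not the repository's actual code):

```python
def length_regulator(phoneme_hiddens, durations):
    """Expand phoneme-level hidden states to frame level, as in FastSpeech.

    Each hidden state is repeated according to its predicted duration,
    i.e. the number of mel-spectrogram frames aligned to that phoneme.
    """
    expanded = []
    for hidden, duration in zip(phoneme_hiddens, durations):
        expanded.extend([hidden] * duration)
    return expanded

# Two phonemes with durations 2 and 3 expand to 5 frame-level states.
frames = length_regulator(["h0", "h1"], [2, 3])
# frames == ["h0", "h0", "h1", "h1", "h1"]
```

In the real model this expansion is done on batched tensors (e.g. with `torch.repeat_interleave`), but the per-phoneme repetition logic is the same.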

Model

My Blog

Prepare Dataset

  1. Download and extract the LJSpeech dataset.
  2. Put the LJSpeech dataset in data.
  3. Unzip alignments.zip.
  4. Put the NVIDIA pretrained WaveGlow model in waveglow/pretrained_model and rename it waveglow_256channels.pt.
  5. Run python3 preprocess.py.
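The steps above might look like the following shell session. The download URL is the standard LJSpeech mirror and the local checkpoint filename is a hypothetical placeholder; only the target paths come from the list above:

```shell
# 1-2. Download LJSpeech and place it under data/ (URL is an assumption).
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
mkdir -p data
tar -xjf LJSpeech-1.1.tar.bz2 -C data

# 3. Unzip the precomputed alignments shipped with the repository.
unzip alignments.zip

# 4. Place the pretrained WaveGlow checkpoint (downloaded separately from
#    NVIDIA) and rename it as the code expects. The source filename here
#    is a placeholder for whatever the download is called.
mkdir -p waveglow/pretrained_model
mv waveglow_checkpoint.pt waveglow/pretrained_model/waveglow_256channels.pt

# 5. Run preprocessing.
python3 preprocess.py
```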

Training

Run python3 train.py.

Evaluation

Run python3 eval.py.

Notes

  • In the FastSpeech paper, the authors use a pre-trained Transformer-TTS model to provide the alignment targets. I didn't have a well-trained Transformer-TTS model, so I used Tacotron2 instead.
  • I use the same hyperparameters as FastSpeech 2.
  • Audio examples are in sample.
  • pretrained model.
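Extracting duration targets from a Tacotron2 attention alignment, as described in the first note, is commonly done by assigning each mel frame to the phoneme it attends to most strongly and counting frames per phoneme. A minimal sketch of that idea (function name and representation are illustrative, not the repository's actual code):

```python
def durations_from_attention(attention, num_phonemes):
    """Derive per-phoneme duration targets from an attention alignment.

    attention: one row per mel frame; each row holds the attention
    weights over the phoneme positions for that frame.
    Returns the number of frames assigned to each phoneme.
    """
    durations = [0] * num_phonemes
    for frame_weights in attention:
        # Assign the frame to the phoneme with the highest attention weight.
        best = max(range(num_phonemes), key=lambda i: frame_weights[i])
        durations[best] += 1
    return durations

# Three frames: the first two attend mostly to phoneme 0, the last to phoneme 1.
attn = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
# durations_from_attention(attn, 2) == [2, 1]
```

The resulting durations sum to the number of mel frames, which is exactly the invariant the length regulator needs at training time.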

Reference

Repository

Paper
