xcmyz / Fastspeech
License: MIT
An implementation of FastSpeech based on PyTorch.
Stars: ✭ 600
Programming Languages
python
Projects that are alternatives to, or similar to, Fastspeech
Parakeet
PAddle PARAllel text-to-speech toolKIT (supporting WaveFlow, WaveNet, Transformer TTS and Tacotron2)
Stars: ✭ 279 (-53.5%)
Mutual labels: speech-synthesis
Libfaceid
libfaceid is a research framework for prototyping of face recognition solutions. It seamlessly integrates multiple detection, recognition and liveness models w/ speech synthesis and speech recognition.
Stars: ✭ 354 (-41%)
Mutual labels: speech-synthesis
Java Speech Api
The J.A.R.V.I.S. Speech API is designed to be simple and efficient, using the speech engines created by Google to provide functionality for parts of the API. Essentially, it is an API written in Java, including a recognizer, synthesizer, and a microphone capture utility. The project uses Google services for the synthesizer and recognizer. While this requires an Internet connection, it provides a complete, modern, and fully functional speech API in Java.
Stars: ✭ 490 (-18.33%)
Mutual labels: speech-synthesis
Pysptk
A python wrapper for Speech Signal Processing Toolkit (SPTK).
Stars: ✭ 297 (-50.5%)
Mutual labels: speech-synthesis
Multilingual text to speech
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
Stars: ✭ 324 (-46%)
Mutual labels: speech-synthesis
Comprehensive-Tacotron2
PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supports both single-, multi-speaker TTS and several techniques to enforce the robustness and efficiency of the model.
Stars: ✭ 22 (-96.33%)
Mutual labels: speech-synthesis
Flowtron
Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
Stars: ✭ 546 (-9%)
Mutual labels: speech-synthesis
Espeak
eSpeak NG is an open source speech synthesizer that supports 101 languages and accents.
Stars: ✭ 339 (-43.5%)
Mutual labels: speech-synthesis
Autovc
AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
Stars: ✭ 485 (-19.17%)
Mutual labels: speech-synthesis
Nnmnkwii
Library to build speech synthesis systems designed for easy and fast prototyping.
Stars: ✭ 308 (-48.67%)
Mutual labels: speech-synthesis
Hifi Gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Stars: ✭ 325 (-45.83%)
Mutual labels: speech-synthesis
Glow Tts
A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Stars: ✭ 284 (-52.67%)
Mutual labels: speech-synthesis
Termit
Translations with speech synthesis in your terminal as a ruby gem
Stars: ✭ 505 (-15.83%)
Mutual labels: speech-synthesis
Pytorchwavenetvocoder
WaveNet-Vocoder implementation with pytorch.
Stars: ✭ 269 (-55.17%)
Mutual labels: speech-synthesis
Voice Builder
An opensource text-to-speech (TTS) voice building tool
Stars: ✭ 362 (-39.67%)
Mutual labels: speech-synthesis
Melgan Neurips
GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis
Stars: ✭ 592 (-1.33%)
Mutual labels: speech-synthesis
Athena
an open-source implementation of sequence-to-sequence based speech processing engine
Stars: ✭ 542 (-9.67%)
Mutual labels: speech-synthesis
Gantts
PyTorch implementation of GAN-based text-to-speech synthesis and voice conversion (VC)
Stars: ✭ 460 (-23.33%)
Mutual labels: speech-synthesis
FastSpeech-Pytorch
An implementation of FastSpeech based on PyTorch.
Update (2020/07/20)
- Optimize the training process.
- Optimize the implementation of length regulator.
- Use the same hyperparameters as FastSpeech2.
- Together, the three changes above make the training process 3 times faster than before.
- Better speech quality.
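For readers unfamiliar with the length regulator mentioned above, here is a minimal sketch of the idea in plain Python. It is not the code used in this repository: the function name `length_regulate` and the `alpha` speed factor are illustrative assumptions, and a real implementation would operate on batched tensors. Each phoneme's hidden vector is repeated as many times as its duration, so the output has one vector per mel frame.

```python
from typing import List, Sequence


def length_regulate(hidden: Sequence[Sequence[float]],
                    durations: Sequence[int],
                    alpha: float = 1.0) -> List[List[float]]:
    """Expand phoneme-level vectors to frame level (illustrative sketch).

    hidden    -- one feature vector per phoneme
    durations -- number of mel frames assigned to each phoneme
    alpha     -- hypothetical speed factor; values above 1.0 lengthen the output
    """
    if len(hidden) != len(durations):
        raise ValueError("hidden and durations must have the same length")
    expanded: List[List[float]] = []
    for vector, duration in zip(hidden, durations):
        # Scale the duration, round to a whole number of frames, never go negative.
        frames = max(0, int(round(duration * alpha)))
        # Copy the vector once per frame so rows stay independent.
        expanded.extend(list(vector) for _ in range(frames))
    return expanded


phoneme_states = [[1.0], [2.0], [3.0]]
frame_states = length_regulate(phoneme_states, [2, 0, 3])
```

With durations `[2, 0, 3]`, the first vector appears twice, the second is dropped, and the third appears three times, giving five frame-level vectors.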
Model
My Blog
Prepare Dataset
- Download and extract LJSpeech dataset.
- Put LJSpeech dataset in `data`.
- Unzip `alignments.zip`.
- Put Nvidia pretrained waveglow model in the `waveglow/pretrained_model` and rename it as `waveglow_256channels.pt`.
- Run `python3 preprocess.py`.
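Before running the preprocessing step, it can help to confirm that the inputs listed above are in place. The sketch below is an optional helper, not part of the repository. The function name `missing_inputs` is hypothetical, and it assumes that `data`, `alignments.zip`, and `waveglow/pretrained_model` all sit directly under the repository root, which the steps above do not state explicitly.

```python
from pathlib import Path
from typing import List


def missing_inputs(root: Path) -> List[str]:
    """Return a description of each expected input that is absent under root."""
    missing: List[str] = []

    # The LJSpeech dataset should have been placed somewhere inside data/.
    data_dir = root / "data"
    if not data_dir.is_dir() or not any(data_dir.iterdir()):
        missing.append("data (LJSpeech dataset)")

    # Assumed location of the alignment archive: the repository root.
    if not (root / "alignments.zip").is_file():
        missing.append("alignments.zip")

    # The WaveGlow checkpoint, renamed as described in the steps above.
    checkpoint = root / "waveglow" / "pretrained_model" / "waveglow_256channels.pt"
    if not checkpoint.is_file():
        missing.append(str(checkpoint.relative_to(root)))

    return missing


if __name__ == "__main__":
    problems = missing_inputs(Path("."))
    if problems:
        print("Missing before preprocessing:", ", ".join(problems))
    else:
        print("All expected inputs found.")
```

Run it from the repository root. An empty result only means the files exist; it does not validate their contents.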
Training
Run `python3 train.py`.
Evaluation
Run `python3 eval.py`.
Notes
- In the FastSpeech paper, the authors use a pre-trained Transformer-TTS model to provide the alignment targets. I didn't have a well-trained Transformer-TTS model, so I use Tacotron2 instead.
- I use the same hyperparameters as FastSpeech2.
- Audio examples are in `sample`.
- Pretrained model.
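To illustrate what "alignment targets" means in the first note, here is a minimal sketch of how phoneme durations can be read off an attention matrix produced by a teacher model, following the approach described in the FastSpeech paper: each mel frame is assigned to the phoneme it attends to most, and a phoneme's duration is the number of frames assigned to it. The function name `durations_from_alignment` and the toy matrix are assumptions for illustration. How this repository generated the contents of `alignments.zip` is not shown here.

```python
from typing import List, Sequence


def durations_from_alignment(attention: Sequence[Sequence[float]]) -> List[int]:
    """Count, for each phoneme, the mel frames whose attention peaks on it.

    attention[t][i] is the weight that mel frame t places on phoneme i.
    """
    if not attention:
        return []
    num_phonemes = len(attention[0])
    durations = [0] * num_phonemes
    for frame in attention:
        # Index of the phoneme with the largest weight (first one wins on ties).
        best = max(range(num_phonemes), key=lambda i: frame[i])
        durations[best] += 1
    return durations


# Toy alignment: 5 mel frames over 3 phonemes.
toy_attention = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.2, 0.8],
    [0.0, 0.1, 0.9],
]
toy_durations = durations_from_alignment(toy_attention)
```

For the toy matrix the result is `[2, 1, 2]`. The durations always sum to the number of mel frames, which is what lets the length regulator reproduce the target spectrogram length.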
Reference
Repository
- The Implementation of Tacotron Based on Tensorflow
- The Implementation of Transformer Based on Pytorch
- The Implementation of Transformer-TTS Based on Pytorch
- The Implementation of Tacotron2 Based on Pytorch
- The Implementation of FastSpeech2 Based on Pytorch
Paper