xcmyz / FastSpeech

License: MIT
An implementation of FastSpeech based on PyTorch.


FastSpeech-Pytorch

An implementation of FastSpeech based on PyTorch.

Update (2020/07/20)

  1. Optimized the training process.
  2. Optimized the implementation of the length regulator.
  3. Used the same hyperparameters as FastSpeech 2.
  4. Changes 1–3 make the training process about three times faster than before.
  5. Better speech quality.
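The length regulator mentioned in item 2 is the core of FastSpeech: it expands phoneme-level hidden states to frame level by repeating each one according to its predicted duration. A minimal, framework-free sketch of the idea (the function name is hypothetical; this is not the repository's actual code):

```python
def length_regulator(phoneme_hiddens, durations):
    """Expand phoneme-level hidden states to frame level, as in FastSpeech.

    Each hidden state is repeated according to its predicted duration,
    i.e. the number of mel-spectrogram frames aligned to that phoneme.
    """
    expanded = []
    for hidden, duration in zip(phoneme_hiddens, durations):
        expanded.extend([hidden] * duration)
    return expanded

# Two phonemes with durations 2 and 3 expand to 5 frame-level states.
frames = length_regulator(["h0", "h1"], [2, 3])
# frames == ["h0", "h0", "h1", "h1", "h1"]
```

In the real model this expansion is done on batched tensors (e.g. with `torch.repeat_interleave`), but the per-phoneme repetition logic is the same.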

Model

My Blog

Prepare Dataset

  1. Download and extract the LJSpeech dataset.
  2. Put the LJSpeech dataset in data.
  3. Unzip alignments.zip.
  4. Put the NVIDIA pretrained WaveGlow model in waveglow/pretrained_model and rename it waveglow_256channels.pt.
  5. Run python3 preprocess.py.
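The steps above might look like the following shell session. The download URL is the standard LJSpeech mirror and the local checkpoint filename is a hypothetical placeholder; only the target paths come from the list above:

```shell
# 1-2. Download LJSpeech and place it under data/ (URL is an assumption).
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
mkdir -p data
tar -xjf LJSpeech-1.1.tar.bz2 -C data

# 3. Unzip the precomputed alignments shipped with the repository.
unzip alignments.zip

# 4. Place the pretrained WaveGlow checkpoint (downloaded separately from
#    NVIDIA) and rename it as the code expects. The source filename here
#    is a placeholder for whatever the download is called.
mkdir -p waveglow/pretrained_model
mv waveglow_checkpoint.pt waveglow/pretrained_model/waveglow_256channels.pt

# 5. Run preprocessing.
python3 preprocess.py
```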

Training

Run python3 train.py.

Evaluation

Run python3 eval.py.

Notes

  • In the FastSpeech paper, the authors use a pre-trained Transformer-TTS model to provide the alignment targets. I didn't have a well-trained Transformer-TTS model, so I used Tacotron2 instead.
  • I use the same hyperparameters as FastSpeech 2.
  • Audio examples are in sample.
  • pretrained model.
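Extracting duration targets from a Tacotron2 attention alignment, as described in the first note, is commonly done by assigning each mel frame to the phoneme it attends to most strongly and counting frames per phoneme. A minimal sketch of that idea (function name and representation are illustrative, not the repository's actual code):

```python
def durations_from_attention(attention, num_phonemes):
    """Derive per-phoneme duration targets from an attention alignment.

    attention: one row per mel frame; each row holds the attention
    weights over the phoneme positions for that frame.
    Returns the number of frames assigned to each phoneme.
    """
    durations = [0] * num_phonemes
    for frame_weights in attention:
        # Assign the frame to the phoneme with the highest attention weight.
        best = max(range(num_phonemes), key=lambda i: frame_weights[i])
        durations[best] += 1
    return durations

# Three frames: the first two attend mostly to phoneme 0, the last to phoneme 1.
attn = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
# durations_from_attention(attn, 2) == [2, 1]
```

The resulting durations sum to the number of mel frames, which is exactly the invariant the length regulator needs at training time.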

Reference

Repository

Paper
