
rishikksh20 / FastSpeech2

License: Apache-2.0
PyTorch Implementation of FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

Programming Languages

Jupyter Notebook: 11667 projects
Python: 139335 projects (#7 most used programming language)

Projects that are alternatives of or similar to FastSpeech2

Tensorflowtts
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
Stars: ✭ 2,382 (+1361.35%)
Mutual labels:  text-to-speech, tts, fastspeech, fastspeech2
AdaSpeech
AdaSpeech: Adaptive Text to Speech for Custom Voice
Stars: ✭ 108 (-33.74%)
Mutual labels:  text-to-speech, tts, fastspeech, fastspeech2
FastSpeech2
Multi-Speaker Pytorch FastSpeech2: Fast and High-Quality End-to-End Text to Speech ✊
Stars: ✭ 64 (-60.74%)
Mutual labels:  text-to-speech, tts, fastspeech2
TensorVox
Desktop application for neural speech synthesis written in C++
Stars: ✭ 140 (-14.11%)
Mutual labels:  text-to-speech, tts, fastspeech2
Parallel-Tacotron2
PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
Stars: ✭ 149 (-8.59%)
Mutual labels:  text-to-speech, tts, fastspeech
Daft-Exprt
PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis
Stars: ✭ 41 (-74.85%)
Mutual labels:  text-to-speech, tts
Cross-Speaker-Emotion-Transfer
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Stars: ✭ 107 (-34.36%)
Mutual labels:  text-to-speech, tts
golang-tts
Text-to-Speech golang package based on the Amazon Polly service
Stars: ✭ 19 (-88.34%)
Mutual labels:  text-to-speech, tts
ukrainian-tts
Ukrainian TTS (text-to-speech) using Coqui TTS
Stars: ✭ 74 (-54.6%)
Mutual labels:  text-to-speech, tts
JSpeak
A Text to Speech Reader Front-end that Reads from the Clipboard and with Exceptionable Features
Stars: ✭ 16 (-90.18%)
Mutual labels:  text-to-speech, tts
voices
macOS CLI for changing the default TTS (text-to-speech) voice and printing information about and speaking text with multiple voices.
Stars: ✭ 53 (-67.48%)
Mutual labels:  text-to-speech, tts
Zero-Shot-TTS
Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Stars: ✭ 33 (-79.75%)
Mutual labels:  text-to-speech, tts
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+415.95%)
Mutual labels:  text-to-speech, tts
SpeakIt Vietnamese TTS
Vietnamese Text-to-Speech on Windows Project (zalo-speech)
Stars: ✭ 81 (-50.31%)
Mutual labels:  text-to-speech, tts
vietTTS
Vietnamese Text to Speech library
Stars: ✭ 78 (-52.15%)
Mutual labels:  text-to-speech, tts-engines
StyleSpeech
Official implementation of Meta-StyleSpeech and StyleSpeech
Stars: ✭ 161 (-1.23%)
Mutual labels:  text-to-speech, tts
VAENAR-TTS
PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.
Stars: ✭ 66 (-59.51%)
Mutual labels:  text-to-speech, tts
WaveGrad2
PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Stars: ✭ 55 (-66.26%)
Mutual labels:  text-to-speech, tts
soundpad-text-to-speech
Text-To-Speech for Soundpad
Stars: ✭ 29 (-82.21%)
Mutual labels:  text-to-speech, tts
Expressive-FastSpeech2
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.
Stars: ✭ 139 (-14.72%)
Mutual labels:  text-to-speech, tts

FastSpeech 2

Unofficial PyTorch implementation of FastSpeech 2: Fast and High-Quality End-to-End Text to Speech. This repo uses the FastSpeech implementation of ESPnet as a base. It tries to replicate the exact details of the paper, but some modifications are still needed for a better model; suggestions and improvements are welcome. Audio pre-processing follows NVIDIA's Tacotron 2 preprocessing, and MelGAN is used as the vocoder.

Demo: Open In Colab

Requirements:

All code is written in Python 3.6.2.

  • Install PyTorch

Before installing PyTorch, check your CUDA version by running the following command: nvcc --version

pip install torch torchvision

This repo uses PyTorch 1.6.0 for the torch.bucketize feature, which is not present in earlier versions of PyTorch.
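
As a rough illustration of why this matters (a minimal sketch, not code from this repo): FastSpeech 2 quantizes pitch and energy values into bins before embedding them, and torch.bucketize maps each value to its bin index.

import torch
# Minimal sketch: quantize per-frame F0 values into bins, roughly as the
# variance adaptor does before an embedding lookup. Boundaries are placeholders.
n_bins = 256
bins = torch.linspace(71.0, 795.8, n_bins - 1)   # placeholder F0 min/max (Hz)
f0 = torch.tensor([90.3, 210.7, 512.0])          # example per-frame F0 values
bin_ids = torch.bucketize(f0, bins)              # bin indices, usable with nn.Embedding(n_bins, dim)
print(bin_ids)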

  • Install the other requirements:
pip install -r requirements.txt
  • To use TensorBoard, install tensorboard version 1.14.0 separately, together with a compatible tensorflow (1.14.0).

For Preprocessing:

The filelists folder contains MFA (Montreal Forced Aligner) processed LJSpeech dataset files, so you don't need to align text with audio (to extract durations) for the LJSpeech dataset. For other datasets, follow the instructions here. For the remaining pre-processing, run the following command:

python .\nvidia_preprocessing.py -d path_of_wavs

To find the min and max of F0 and energy:

python .\compute_statistics.py

Update the following values in hparams.py with the min and max of F0 and energy (see the excerpt after this list):

p_min = Min F0/pitch
p_max = Max F0
e_min = Min energy
e_max = Max energy
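
For reference, a hypothetical hparams.py excerpt; the numbers below are placeholders, not values for your dataset.

# hparams.py (excerpt) -- replace these placeholders with the values printed
# by compute_statistics.py for your own dataset.
p_min = 71.0     # minimum F0 / pitch (Hz)
p_max = 795.8    # maximum F0 / pitch (Hz)
e_min = 0.0      # minimum energy
e_max = 315.0    # maximum energy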

For training

 python train_fastspeech.py --outdir etc -c configs/default.yaml -n "name"

For inference

Open In Colab
Currently only phoneme-based synthesis is supported.

python .\inference.py -c .\configs\default.yaml -p .\checkpoints\first_1\ts_version2_fastspeech_fe9a2c7_7k_steps.pyt --out output --text "ModuleList can be indexed like a regular Python list but modules it contains are properly registered."
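
Because synthesis is phoneme based, the input text is converted to phonemes first. A hypothetical illustration using the g2p_en package (an assumption; the repo's own text frontend may differ, so check inference.py for the actual pipeline):

from g2p_en import G2p    # pip install g2p_en; assumed frontend, may differ from this repo's
g2p = G2p()
phonemes = g2p("Text to speech synthesis.")
print(phonemes)           # list of ARPAbet phoneme symbols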

For TorchScript Export

python export_torchscript.py -c configs/default.yaml -n fastspeech_scrip --outdir etc
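
A minimal sketch of loading the exported module afterwards; the output path and the module's expected inputs are assumptions, so check export_torchscript.py for the real file name and call signature.

import torch
# The file name below is illustrative; check export_torchscript.py for the
# actual output path and the module's expected inputs.
model = torch.jit.load("etc/fastspeech_scrip.pt", map_location="cpu")
model.eval()
# The loaded module can now be called like a regular nn.Module (e.g. with a
# batch of phoneme IDs) without the original Python class definitions.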

Checkpoint and samples:

  • The checkpoint can be found here.
  • For samples, check the sample folder.

Tensorboard

Training: Tensorboard
Validation: Tensorboard

Note

  • The code in this repo is rough, written mainly to reproduce the paper and for experimentation; it still needs cleanup and optimization for wider use.
  • This repo currently produces good-quality audio, but it is still a work in progress and many improvements are needed.
  • The loss curve for F0 is quite high.
  • Raw F0 and energy are used to train the model, but normalized F0 and energy could also be used for more stable training (see the sketch after this list).
  • A Postnet is used for better audio quality.
  • For a more complete, end-to-end voice cloning or text-to-speech (TTS) toolbox, please visit Deepsync Technologies.
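
A minimal sketch of the normalization idea (illustrative only, not code from this repo), using the statistics produced by compute_statistics.py:

import numpy as np
def minmax_normalize(x, x_min, x_max):
    # Scale values to [0, 1]; handling of unvoiced (zero-F0) frames is omitted.
    return (x - x_min) / (x_max - x_min + 1e-8)
f0 = np.array([0.0, 95.2, 180.4, 240.9])             # example per-frame F0 values (Hz)
f0_norm = minmax_normalize(f0, 71.0, 795.8)          # placeholder p_min / p_max
energy = np.array([12.3, 80.1])                      # example per-frame energy values
energy_norm = minmax_normalize(energy, 0.0, 315.0)   # placeholder e_min / e_max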

References
