
rishikksh20 / FastSpeech2

License: Apache-2.0
PyTorch Implementation of FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

Programming Languages

Jupyter Notebook: 11667 projects
Python: 139335 projects (#7 most used programming language)

Projects that are alternatives of or similar to FastSpeech2

Tensorflowtts
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
Stars: ✭ 2,382 (+1361.35%)
Mutual labels:  text-to-speech, tts, fastspeech, fastspeech2
AdaSpeech
AdaSpeech: Adaptive Text to Speech for Custom Voice
Stars: ✭ 108 (-33.74%)
Mutual labels:  text-to-speech, tts, fastspeech, fastspeech2
FastSpeech2
Multi-Speaker Pytorch FastSpeech2: Fast and High-Quality End-to-End Text to Speech ✊
Stars: ✭ 64 (-60.74%)
Mutual labels:  text-to-speech, tts, fastspeech2
TensorVox
Desktop application for neural speech synthesis written in C++
Stars: ✭ 140 (-14.11%)
Mutual labels:  text-to-speech, tts, fastspeech2
Parallel-Tacotron2
PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
Stars: ✭ 149 (-8.59%)
Mutual labels:  text-to-speech, tts, fastspeech
Daft-Exprt
PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis
Stars: ✭ 41 (-74.85%)
Mutual labels:  text-to-speech, tts
Cross-Speaker-Emotion-Transfer
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Stars: ✭ 107 (-34.36%)
Mutual labels:  text-to-speech, tts
golang-tts
Text-to-Speech golang package based on the Amazon Polly service
Stars: ✭ 19 (-88.34%)
Mutual labels:  text-to-speech, tts
ukrainian-tts
Ukrainian TTS (text-to-speech) using Coqui TTS
Stars: ✭ 74 (-54.6%)
Mutual labels:  text-to-speech, tts
JSpeak
A Text to Speech Reader Front-end that Reads from the Clipboard and with Exceptionable Features
Stars: ✭ 16 (-90.18%)
Mutual labels:  text-to-speech, tts
voices
macOS CLI for changing the default TTS (text-to-speech) voice and printing information about and speaking text with multiple voices.
Stars: ✭ 53 (-67.48%)
Mutual labels:  text-to-speech, tts
Zero-Shot-TTS
Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Stars: ✭ 33 (-79.75%)
Mutual labels:  text-to-speech, tts
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+415.95%)
Mutual labels:  text-to-speech, tts
SpeakIt Vietnamese TTS
Vietnamese Text-to-Speech on Windows Project (zalo-speech)
Stars: ✭ 81 (-50.31%)
Mutual labels:  text-to-speech, tts
vietTTS
Vietnamese Text to Speech library
Stars: ✭ 78 (-52.15%)
Mutual labels:  text-to-speech, tts-engines
StyleSpeech
Official implementation of Meta-StyleSpeech and StyleSpeech
Stars: ✭ 161 (-1.23%)
Mutual labels:  text-to-speech, tts
VAENAR-TTS
PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.
Stars: ✭ 66 (-59.51%)
Mutual labels:  text-to-speech, tts
WaveGrad2
PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Stars: ✭ 55 (-66.26%)
Mutual labels:  text-to-speech, tts
soundpad-text-to-speech
Text-To-Speech for Soundpad
Stars: ✭ 29 (-82.21%)
Mutual labels:  text-to-speech, tts
Expressive-FastSpeech2
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.
Stars: ✭ 139 (-14.72%)
Mutual labels:  text-to-speech, tts

FastSpeech 2

Unofficial PyTorch implementation of FastSpeech 2: Fast and High-Quality End-to-End Text to Speech. This repo uses the FastSpeech implementation of ESPnet as a base. It tries to replicate the exact details of the paper, but some modifications are still needed for a better model; suggestions and improvements are welcome. Audio pre-processing follows NVIDIA's Tacotron 2 preprocessing, and MelGAN is used as the vocoder.

Demo: Open In Colab

Requirements:

All code is written in Python 3.6.2.

  • Install PyTorch

Before installing PyTorch, check your CUDA version by running the following command: nvcc --version

pip install torch torchvision

This repo uses PyTorch 1.6.0 for the torch.bucketize feature, which is not present in earlier versions of PyTorch.
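
As a rough illustration of why this matters (a minimal sketch, not code from this repo): FastSpeech 2 quantizes pitch and energy values into bins before embedding them, and torch.bucketize maps each value to its bin index.

import torch
# Minimal sketch: quantize per-frame F0 values into bins, roughly as the
# variance adaptor does before an embedding lookup. Boundaries are placeholders.
n_bins = 256
bins = torch.linspace(71.0, 795.8, n_bins - 1)   # placeholder F0 min/max (Hz)
f0 = torch.tensor([90.3, 210.7, 512.0])          # example per-frame F0 values
bin_ids = torch.bucketize(f0, bins)              # bin indices, usable with nn.Embedding(n_bins, dim)
print(bin_ids)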

  • Install the other requirements:
pip install -r requirements.txt
  • To use TensorBoard, install tensorboard version 1.14.0 separately, together with a compatible tensorflow (1.14.0).

For Preprocessing:

The filelists folder contains MFA (Montreal Forced Aligner) processed LJSpeech dataset files, so you don't need to align text with audio (to extract durations) for the LJSpeech dataset. For other datasets, follow the instructions here. For the remaining pre-processing, run the following command:

python .\nvidia_preprocessing.py -d path_of_wavs

To find the min and max of F0 and energy:

python .\compute_statistics.py

Update the following values in hparams.py with the min and max of F0 and energy (see the excerpt after this list):

p_min = Min F0/pitch
p_max = Max F0
e_min = Min energy
e_max = Max energy
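
For reference, a hypothetical hparams.py excerpt; the numbers below are placeholders, not values for your dataset.

# hparams.py (excerpt) -- replace these placeholders with the values printed
# by compute_statistics.py for your own dataset.
p_min = 71.0     # minimum F0 / pitch (Hz)
p_max = 795.8    # maximum F0 / pitch (Hz)
e_min = 0.0      # minimum energy
e_max = 315.0    # maximum energy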

For training

 python train_fastspeech.py --outdir etc -c configs/default.yaml -n "name"

For inference

Open In Colab
Currently only phoneme-based synthesis is supported.

python .\inference.py -c .\configs\default.yaml -p .\checkpoints\first_1\ts_version2_fastspeech_fe9a2c7_7k_steps.pyt --out output --text "ModuleList can be indexed like a regular Python list but modules it contains are properly registered."
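
Because synthesis is phoneme based, the input text is converted to phonemes first. A hypothetical illustration using the g2p_en package (an assumption; the repo's own text frontend may differ, so check inference.py for the actual pipeline):

from g2p_en import G2p    # pip install g2p_en; assumed frontend, may differ from this repo's
g2p = G2p()
phonemes = g2p("Text to speech synthesis.")
print(phonemes)           # list of ARPAbet phoneme symbols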

For TorchScript Export

python export_torchscript.py -c configs/default.yaml -n fastspeech_scrip --outdir etc
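
A minimal sketch of loading the exported module afterwards; the output path and the module's expected inputs are assumptions, so check export_torchscript.py for the real file name and call signature.

import torch
# The file name below is illustrative; check export_torchscript.py for the
# actual output path and the module's expected inputs.
model = torch.jit.load("etc/fastspeech_scrip.pt", map_location="cpu")
model.eval()
# The loaded module can now be called like a regular nn.Module (e.g. with a
# batch of phoneme IDs) without the original Python class definitions.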

Checkpoint and samples:

  • The checkpoint can be found here.
  • For samples, check the sample folder.

Tensorboard

Training: Tensorboard
Validation: Tensorboard

Note

  • The code in this repo is rough, written mainly to reproduce the paper and for experimentation; it still needs cleanup and optimization for wider use.
  • This repo currently produces good-quality audio, but it is still a work in progress and many improvements are needed.
  • The loss curve for F0 is quite high.
  • Raw F0 and energy are used to train the model, but normalized F0 and energy could also be used for more stable training (see the sketch after this list).
  • A Postnet is used for better audio quality.
  • For a more complete, end-to-end voice cloning or text-to-speech (TTS) toolbox, please visit Deepsync Technologies.
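
A minimal sketch of the normalization idea (illustrative only, not code from this repo), using the statistics produced by compute_statistics.py:

import numpy as np
def minmax_normalize(x, x_min, x_max):
    # Scale values to [0, 1]; handling of unvoiced (zero-F0) frames is omitted.
    return (x - x_min) / (x_max - x_min + 1e-8)
f0 = np.array([0.0, 95.2, 180.4, 240.9])             # example per-frame F0 values (Hz)
f0_norm = minmax_normalize(f0, 71.0, 795.8)          # placeholder p_min / p_max
energy = np.array([12.3, 80.1])                      # example per-frame energy values
energy_norm = minmax_normalize(energy, 0.0, 315.0)   # placeholder e_min / e_max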

References
