KevinMIN95 / StyleSpeech

Licence: MIT license

Official implementation of Meta-StyleSpeech and StyleSpeech


Projects that are alternatives of or similar to StyleSpeech

Implementation of Google Brain's WaveGrad high-fidelity vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.

Stars: ✭ 245 (+52.17%)

Mutual labels: text-to-speech, speech, tts, speech-synthesis

IMS-Toucan

Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.

Stars: ✭ 295 (+83.23%)

Mutual labels: text-to-speech, speech, tts, speech-synthesis

WaveGrad2

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

Stars: ✭ 55 (-65.84%)

Mutual labels: text-to-speech, tts, speech-synthesis, neural-tts

AdaSpeech

AdaSpeech: Adaptive Text to Speech for Custom Voice

Stars: ✭ 108 (-32.92%)

Mutual labels: text-to-speech, speech, tts, speech-synthesis

Comprehensive-Tacotron2

PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supports both single-, multi-speaker TTS and several techniques to enforce the robustness and efficiency of the model.

Stars: ✭ 22 (-86.34%)

Mutual labels: text-to-speech, tts, speech-synthesis, neural-tts

Daft-Exprt

PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis

Stars: ✭ 41 (-74.53%)

Mutual labels: text-to-speech, tts, speech-synthesis, neural-tts

ttslearn

ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)

Stars: ✭ 158 (-1.86%)

Mutual labels: text-to-speech, speech, tts, speech-synthesis

Parallel-Tacotron2

PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

Stars: ✭ 149 (-7.45%)

Mutual labels: text-to-speech, tts, speech-synthesis, neural-tts

editts

Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech

Stars: ✭ 74 (-54.04%)

Mutual labels: text-to-speech, speech, tts, speech-synthesis

Fre-GAN-pytorch

Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

Stars: ✭ 73 (-54.66%)

Mutual labels: text-to-speech, speech, tts, speech-synthesis

VAENAR-TTS

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

Stars: ✭ 66 (-59.01%)

Mutual labels: text-to-speech, tts, speech-synthesis, neural-tts

Lightspeech

LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search

Stars: ✭ 31 (-80.75%)

Mutual labels: text-to-speech, speech, tts, speech-synthesis

Cross-Speaker-Emotion-Transfer

PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

Stars: ✭ 107 (-33.54%)

Mutual labels: text-to-speech, tts, speech-synthesis, neural-tts

Durian

Implementation of "Duration Informed Attention Network for Multimodal Synthesis" (https://arxiv.org/pdf/1909.01700.pdf) paper.

Stars: ✭ 111 (-31.06%)

Mutual labels: text-to-speech, speech, tts, speech-synthesis

Zero-Shot-TTS

Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

Stars: ✭ 33 (-79.5%)

Mutual labels: text-to-speech, speech, tts, speech-synthesis

spokestack-android

Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!

Stars: ✭ 52 (-67.7%)

Mutual labels: text-to-speech, speech, tts, speech-synthesis

Voice Builder

An opensource text-to-speech (TTS) voice building tool

Stars: ✭ 362 (+124.84%)

Mutual labels: text-to-speech, speech, tts, speech-synthesis

Wsay

Windows "say"

Stars: ✭ 36 (-77.64%)

Mutual labels: text-to-speech, speech, tts, speech-synthesis

Wavernn

WaveRNN Vocoder + TTS

Stars: ✭ 1,636 (+916.15%)

Mutual labels: text-to-speech, tts, speech-synthesis

Spokestack Python

Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.

Stars: ✭ 103 (-36.02%)

Mutual labels: text-to-speech, tts, speech-synthesis


Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation

Recent Updates

[12/18/2021] ✨ Thanks to Guan-Ting Lin for sharing a pre-trained multi-speaker MelGAN vocoder at 16 kHz. The checkpoint is now available at Pre-trained 16k-MelGAN. For usage details, please follow the instructions in MelGAN.

[06/09/2021] A few modifications to the Variance Adaptor were found to improve the quality of the model. 1) We replaced the variance embedding architecture, previously one Conv1D layer, with two Conv1D layers followed by a linear layer. 2) We added a layernorm and phoneme-wise positional encoding. Please refer to here.

Introduction

This is the official code for our recent paper, Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation. We provide our implementation and pretrained models as open source in this repository.

Abstract : With rapid progress in neural text-to-speech (TTS) models, personalized speech generation is now in high demand for many applications. For practical applicability, a TTS model should generate high-quality speech with only a few audio samples from the given speaker, that are also short in length. However, existing methods either require to fine-tune the model or achieve low adaptation quality without fine-tuning. In this work, we propose StyleSpeech, a new TTS model which not only synthesizes high-quality speech but also effectively adapts to new speakers. Specifically, we propose Style-Adaptive Layer Normalization (SALN) which aligns gain and bias of the text input according to the style extracted from a reference speech audio. With SALN, our model effectively synthesizes speech in the style of the target speaker even from single speech audio. Furthermore, to enhance StyleSpeech's adaptation to speech from new speakers, we extend it to Meta-StyleSpeech by introducing two discriminators trained with style prototypes, and performing episodic training. The experimental results show that our models generate high-quality speech which accurately follows the speaker's voice with single short-duration (1-3 sec) speech audio, significantly outperforming baselines.
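The SALN operation described in the abstract can be illustrated with a small, dependency-free sketch: normalize a hidden vector, then apply a gain and bias computed from the style vector. The function names, the single affine map from style to gain and bias, and the toy numbers are assumptions for illustration. The repository implements this in PyTorch with learned layers, and this sketch is not that code.

```python
import math

def layer_norm(h, eps=1e-5):
    """Normalize a hidden vector to zero mean and unit variance (no learned affine)."""
    mean = sum(h) / len(h)
    var = sum((x - mean) ** 2 for x in h) / len(h)
    return [(x - mean) / math.sqrt(var + eps) for x in h]

def affine(w, weight, bias):
    """Compute weight @ w + bias, with weight given as a list of rows."""
    return [sum(wij * wj for wij, wj in zip(row, w)) + b
            for row, b in zip(weight, bias)]

def saln(h, style, gain_params, bias_params):
    """Style-adaptive layer norm: gain(style) * normalize(h) + bias(style)."""
    gain = affine(style, *gain_params)
    bias = affine(style, *bias_params)
    return [g * y + b for g, y, b in zip(gain, layer_norm(h), bias)]

# Toy example: hidden size 4, style size 2 (all values are made up).
h = [1.0, 2.0, 3.0, 4.0]
style = [0.5, -1.0]

# Gain fixed at 1 and bias fixed at 0: reduces to plain layer normalization.
identity_gain = ([[0.0, 0.0]] * 4, [1.0] * 4)
zero_bias = ([[0.0, 0.0]] * 4, [0.0] * 4)
out_plain = saln(h, style, identity_gain, zero_bias)

# Bias that depends on the first style dimension: shifts every output by style[0].
style_bias = ([[1.0, 0.0]] * 4, [0.0] * 4)
out_styled = saln(h, style, identity_gain, style_bias)
```

In this sketch the style vector influences the output only through the gain and bias, which is the property that lets one set of text-side weights be steered by a reference utterance.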

Demo audio samples are available on the demo page.

Getting the pretrained models

Model	Link to the model
Meta-StyleSpeech	Link
StyleSpeech	Link

Prerequisites

Clone this repository.
Install the Python requirements listed in requirements.txt.

Inference

You have to download the pretrained models and prepare an audio file to use as the reference speech sample.

python synthesize.py --text <raw text to synthesize> --ref_audio <path to reference speech audio> --checkpoint_path <path to pretrained model>

The generated mel-spectrogram will be saved in the results/ folder.
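When synthesizing many sentences, the command above can be assembled programmatically. The sketch below only builds the argument list from the three flags shown above and does not execute anything. The helper name build_synthesize_cmd and the example paths are hypothetical.

```python
import shlex
import sys

def build_synthesize_cmd(text, ref_audio, checkpoint_path, python=sys.executable):
    """Return the argv list for synthesize.py using the flags documented above."""
    return [
        python, "synthesize.py",
        "--text", text,
        "--ref_audio", ref_audio,
        "--checkpoint_path", checkpoint_path,
    ]

# Hypothetical paths, for illustration only.
cmd = build_synthesize_cmd(
    "Hello world.",
    "samples/ref.wav",
    "checkpoints/meta_stylespeech.pth.tar",
    python="python",
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
# To run it from the repository root: subprocess.run(cmd, check=True)
```

Passing the text as its own list element avoids shell-quoting problems when the sentence contains spaces or punctuation.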

Preprocessing the dataset

Our models are trained on the LibriTTS dataset. Download, extract and place it in the dataset/ folder.

To preprocess the dataset, first run

python prepare_align.py

to resample the audio to 16 kHz and perform some other preparation steps.

Second, Montreal Forced Aligner (MFA) is used to obtain the alignments between the utterances and the phoneme sequences.

./montreal-forced-aligner/bin/mfa_align dataset/wav16/ lexicon/librispeech-lexicon.txt english dataset/TextGrid/ -j 10 -v

Third, preprocess the dataset to prepare mel-spectrogram, duration, pitch and energy for fast training.

python preprocess.py
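To make the role of the alignments concrete: a common way to turn MFA phoneme intervals into per-phoneme durations is to count the mel frames each interval covers. The sketch below assumes the 16 kHz sampling rate from the resampling step and a hop length of 256 samples. The hop length, the function name, and the example intervals are assumptions, not values read from this repository's preprocess.py.

```python
def intervals_to_durations(intervals, sampling_rate=16000, hop_length=256):
    """Convert (phoneme, start_sec, end_sec) intervals into mel-frame counts.

    Both boundaries are rounded to frame indices before subtracting, so that
    adjacent intervals share a boundary and no frame is counted twice.
    """
    phonemes, durations = [], []
    for phoneme, start, end in intervals:
        start_frame = int(round(start * sampling_rate / hop_length))
        end_frame = int(round(end * sampling_rate / hop_length))
        phonemes.append(phoneme)
        durations.append(end_frame - start_frame)
    return phonemes, durations

# Made-up alignment for the word "hello" (times in seconds).
intervals = [
    ("HH", 0.00, 0.08),
    ("AH0", 0.08, 0.15),
    ("L", 0.15, 0.24),
    ("OW1", 0.24, 0.48),
]
phonemes, durations = intervals_to_durations(intervals)
print(list(zip(phonemes, durations)))
```

Under these assumptions one second of audio corresponds to 62.5 frames, and the durations sum to the total number of frames spanned by the utterance.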

Train!

Train StyleSpeech from scratch with

python train.py

Train Meta-StyleSpeech, starting from a pretrained StyleSpeech checkpoint, with

python train_meta.py --checkpoint_path <path to pretrained StyleSpeech model>

Acknowledgements

We referred to
