All Projects β†’ rishikksh20 β†’ AdaSpeech

rishikksh20 / AdaSpeech

Licence: Apache-2.0 license
AdaSpeech: Adaptive Text to Speech for Custom Voice

Programming Languages

Jupyter Notebook
11667 projects
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to AdaSpeech

Zero-Shot-TTS
Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Stars: ✭ 33 (-69.44%)
Mutual labels:  text-to-speech, speech, tts, speech-synthesis, transformer
Tensorflowtts
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
Stars: ✭ 2,382 (+2105.56%)
Mutual labels:  text-to-speech, tts, speech-synthesis, fastspeech, fastspeech2
TensorVox
Desktop application for neural speech synthesis written in C++
Stars: ✭ 140 (+29.63%)
Mutual labels:  text-to-speech, tts, speech-synthesis, fastspeech2
Fre-GAN-pytorch
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
Stars: ✭ 73 (-32.41%)
Mutual labels:  text-to-speech, speech, tts, speech-synthesis
StyleSpeech
Official implementation of Meta-StyleSpeech and StyleSpeech
Stars: ✭ 161 (+49.07%)
Mutual labels:  text-to-speech, speech, tts, speech-synthesis
Parallel-Tacotron2
PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
Stars: ✭ 149 (+37.96%)
Mutual labels:  text-to-speech, tts, speech-synthesis, fastspeech
ttslearn
ttslearn: Library for Pythonで学ぢ音声合成 (Text-to-speech with Python)
Stars: ✭ 158 (+46.3%)
Mutual labels:  text-to-speech, speech, tts, speech-synthesis
editts
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech
Stars: ✭ 74 (-31.48%)
Mutual labels:  text-to-speech, speech, tts, speech-synthesis
Voice Builder
An opensource text-to-speech (TTS) voice building tool
Stars: ✭ 362 (+235.19%)
Mutual labels:  text-to-speech, speech, tts, speech-synthesis
IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
Stars: ✭ 295 (+173.15%)
Mutual labels:  text-to-speech, speech, tts, speech-synthesis
Wavegrad
Implementation of Google Brain's WaveGrad high-fidelity vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.
Stars: ✭ 245 (+126.85%)
Mutual labels:  text-to-speech, speech, tts, speech-synthesis
Wsay
Windows "say"
Stars: ✭ 36 (-66.67%)
Mutual labels:  text-to-speech, speech, tts, speech-synthesis
FastSpeech2
PyTorch Implementation of FastSpeech 2 : Fast and High-Quality End-to-End Text to Speech
Stars: ✭ 163 (+50.93%)
Mutual labels:  text-to-speech, tts, fastspeech, fastspeech2
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-51.85%)
Mutual labels:  text-to-speech, speech, tts, speech-synthesis
Cognitive Speech Tts
Microsoft Text-to-Speech API sample code in several languages, part of Cognitive Services.
Stars: ✭ 312 (+188.89%)
Mutual labels:  text-to-speech, tts, speech-synthesis, transformer
Lightspeech
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Stars: ✭ 31 (-71.3%)
Mutual labels:  text-to-speech, speech, tts, speech-synthesis
Durian
Implementation of "Duration Informed Attention Network for Multimodal Synthesis" (https://arxiv.org/pdf/1909.01700.pdf) paper.
Stars: ✭ 111 (+2.78%)
Mutual labels:  text-to-speech, speech, tts, speech-synthesis
Marytts
MARY TTS -- an open-source, multilingual text-to-speech synthesis system written in pure java
Stars: ✭ 1,699 (+1473.15%)
Mutual labels:  text-to-speech, tts, speech-synthesis
vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Stars: ✭ 1,604 (+1385.19%)
Mutual labels:  text-to-speech, tts, speech-synthesis
Pytorch Dc Tts
Text to Speech with PyTorch (English and Mongolian)
Stars: ✭ 122 (+12.96%)
Mutual labels:  text-to-speech, tts, speech-synthesis

AdaSpeech: Adaptive Text to Speech for Custom Voice [WIP]

Unofficial Pytorch implementation of AdaSpeech.

Note:

  • I am not considering multi-speaker use case, Iam much more focus only on single speaker.
  • I will use only Utterance level encoder and Phoneme level encoder not condition layer norm (which is the soul of AdaSpeech paper), it definelty restrict the adaptive nature of AdaSpeech but my focus is to improve FastSpeech 2 acoustic generalization rather than adaptation.

Citations

@misc{chen2021adaspeech,
      title={AdaSpeech: Adaptive Text to Speech for Custom Voice}, 
      author={Mingjian Chen and Xu Tan and Bohan Li and Yanqing Liu and Tao Qin and Sheng Zhao and Tie-Yan Liu},
      year={2021},
      eprint={2103.00993},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

Requirements :

All code written in Python 3.6.2 .

  • Install Pytorch

Before installing pytorch please check your Cuda version by running following command : nvcc --version

pip install torch torchvision

In this repo I have used Pytorch 1.6.0 for torch.bucketize feature which is not present in previous versions of PyTorch.

  • Installing other requirements :
pip install -r requirements.txt
  • To use Tensorboard install tensorboard version 1.14.0 seperatly with supported tensorflow (1.14.0)

For Preprocessing :

filelists folder contains MFA (Motreal Force aligner) processed LJSpeech dataset files so you don't need to align text with audio (for extract duration) for LJSpeech dataset. For other dataset follow instruction here. For other pre-processing run following command :

python nvidia_preprocessing.py -d path_of_wavs

For finding the min and max of F0 and Energy

python compute_statistics.py

Update the following in hparams.py by min and max of F0 and Energy

p_min = Min F0/pitch
p_max = Max F0
e_min = Min energy
e_max = Max energy

For training

 python train_fastspeech.py --outdir etc -c configs/default.yaml -n "name"

Note

  • For more complete and end to end Voice cloning or Text to Speech (TTS) toolbox please visit Deepsync Technologies.
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].