neosapience / editts

Licence: other

Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech

Labels

text-to-speech speech pytorch tts speech-synthesis speech-edit

EdiTTS: Score-based Editing for Controllable Text-to-Speech

Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech. Audio samples are available on our demo page.

Abstract

We present EdiTTS, an off-the-shelf speech editing methodology based on score-based generative modeling for text-to-speech synthesis. EdiTTS allows for targeted, granular editing of audio, both in terms of content and pitch, without the need for any additional training, task-specific optimization, or architectural modifications to the score-based model backbone. Specifically, we apply coarse yet deliberate perturbations in the Gaussian prior space to induce desired behavior from the diffusion model, while applying masks and softening kernels to ensure that iterative edits are applied only to the target region. Listening tests demonstrate that EdiTTS is capable of reliably generating natural-sounding audio that satisfies user-imposed requirements.
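The idea of combining a mask with a softening kernel can be illustrated with a small, framework-free sketch. This is only one possible reading of the abstract, not the code used in this repository: the helper names `soft_mask` and `blend`, the moving-average kernel, and the per-frame scalar values are all assumptions made for illustration.

```python
def soft_mask(num_frames, start, end, kernel_size=5):
    """Per-frame mask in [0, 1]: 1 inside [start, end), ramping down near the edges.

    Hypothetical helper: a hard 0/1 mask is smoothed with a moving average,
    which stands in for whatever softening kernel the model actually uses.
    """
    hard = [1.0 if start <= i < end else 0.0 for i in range(num_frames)]
    half = kernel_size // 2
    soft = []
    for i in range(num_frames):
        lo = max(0, i - half)
        hi = min(num_frames, i + half + 1)
        window = hard[lo:hi]
        soft.append(sum(window) / len(window))
    return soft


def blend(original, edited, mask):
    """Keep `original` where the mask is 0 and use `edited` where it is 1."""
    return [m * e + (1.0 - m) * o for o, e, m in zip(original, edited, mask)]
```

For example, `soft_mask(10, 3, 7, kernel_size=3)` leaves frames far from the target region at 0, keeps the interior of the region at 1, and assigns intermediate weights to the frames next to each boundary, so an edit fades in and out instead of switching abruptly.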

Citation

Please cite this work as follows.

@misc{tae&kim2021editts,
      title={EdiTTS: Score-based Editing for Controllable Text-to-Speech}, 
      author={Jaesung Tae and Hyeongju Kim and Taesu Kim},
      year={2021}
}

Setup

Create a Python virtual environment (venv or conda) and install package requirements as specified in requirements.txt.
```
python -m venv venv
source venv/bin/activate
pip install -U pip
pip install -r requirements.txt
```

Build the monotonic alignment module.

cd model/monotonic_align
python setup.py build_ext --inplace

For more information, refer to the official repository of Grad-TTS.

Checkpoints

Checkpoints are already included in this repository under checkpts. The inference commands below use checkpts/grad-tts-old.pt.

Pitch Shifting

Prepare an input file containing samples for speech generation. Mark the segment to be edited by enclosing it in vertical bars (|). For instance, a single sample might look like:

In | the face of impediments confessedly discouraging |

We provide a sample input file in resources/filelists/edit_pitch_example.txt.
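To sanity-check an input file before running inference, the line format described above can be parsed with a few lines of Python. This is a minimal sketch, not the parser used by edit_pitch.py; the function name `parse_pitch_line` is hypothetical, and it assumes each line contains exactly two vertical bars.

```python
def parse_pitch_line(line):
    """Split 'before | target | after' into three stripped parts.

    Assumption: exactly two '|' separators per line; the text between
    them is the segment to be edited.
    """
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 3:
        raise ValueError(
            f"expected exactly two '|' separators, got {len(parts) - 1}"
        )
    before, target, after = parts
    return before, target, after
```

Applied to the sample line above, this returns `"In"` as the leading context, `"the face of impediments confessedly discouraging"` as the segment to edit, and an empty trailing context.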

To run inference, run:

CUDA_VISIBLE_DEVICES=0 python edit_pitch.py \
    -f resources/filelists/edit_pitch_example.txt \
    -c checkpts/grad-tts-old.pt -t 1000 \
    -s out/pitch/wavs

Adjust CUDA_VISIBLE_DEVICES as appropriate.
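If you prefer to launch the script from Python, for example to loop over several input files, the same invocation can be assembled with the standard library. This is a sketch under stated assumptions: `build_command` and `run_on_gpu` are hypothetical helpers, and the parameter names (in particular `timesteps` for `-t`) are guesses at what the flags mean; only the flags themselves are taken from the command above.

```python
import os
import subprocess
import sys


def build_command(script, filelist, checkpoint, timesteps, save_dir):
    """Assemble the argument list for the -f/-c/-t/-s flags shown above.

    The parameter names are illustrative guesses, not documented option names.
    """
    return [
        sys.executable, script,
        "-f", filelist,
        "-c", checkpoint,
        "-t", str(timesteps),
        "-s", save_dir,
    ]


def run_on_gpu(command, gpu_id):
    """Run the command with CUDA_VISIBLE_DEVICES set to the chosen device."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    return subprocess.run(command, env=env, check=True)
```

Calling `run_on_gpu(build_command("edit_pitch.py", "resources/filelists/edit_pitch_example.txt", "checkpts/grad-tts-old.pt", 1000, "out/pitch/wavs"), 0)` from the repository root should be equivalent to the shell command above.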

Content Replacement

Prepare an input file containing pairs of sentences. Join the two sentences of each pair with # and, in both sentences, enclose the part to be replaced in vertical bars (|). For instance, a single pair might look like:

Three others subsequently | identified | Oswald from a photograph. #Three others subsequently | recognized | Oswald from a photograph.

We provide a sample input file in resources/filelists/edit_content_example.txt.
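The pair format can be checked in the same spirit before running inference. The sketch below is not the parser used by edit_content.py; `parse_marked` and `parse_content_line` are hypothetical names, and it assumes one `#` per line and exactly two vertical bars in each sentence.

```python
def parse_marked(sentence):
    """Split 'before | marked | after' into a tuple of three stripped parts."""
    parts = [p.strip() for p in sentence.split("|")]
    if len(parts) != 3:
        raise ValueError(
            f"expected exactly two '|' separators, got {len(parts) - 1}"
        )
    return tuple(parts)


def parse_content_line(line):
    """Split a 'first # second' line and parse each marked sentence.

    Assumption: the line contains a single '#' joining the two sentences.
    """
    first, sep, second = line.partition("#")
    if not sep:
        raise ValueError("expected a '#' between the two sentences")
    return parse_marked(first), parse_marked(second)
```

For the sample pair above, the two parsed sentences share the same leading and trailing context and differ only in the marked part (`"identified"` versus `"recognized"`), which is a quick consistency check for a hand-written file.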

To run inference, run:

CUDA_VISIBLE_DEVICES=0 python edit_content.py \
    -f resources/filelists/edit_content_example.txt \
    -c checkpts/grad-tts-old.pt -t 1000 \
    -s out/content/wavs

License

Released under the modified GNU General Public License.
