Alternatives to and detailed information about One-Shot-Voice-Cloning

Official repository of STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech, INTERSPEECH 2021

Stars: ✭ 105 (-11.02%)

Mutual labels: tts, style-transfer

Tensorflowtts

😝 TensorFlowTTS: Real-time, state-of-the-art speech synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)

Stars: ✭ 2,382 (+1918.64%)

Mutual labels: tts, voice-cloning

voices

macOS CLI for changing the default TTS (text-to-speech) voice and printing information about and speaking text with multiple voices.

Stars: ✭ 53 (-55.08%)

Mutual labels: tts

Shakespearizing-Modern-English

Code for "Jhamtani H.*, Gangal V.*, Hovy E. and Nyberg E. Shakespearizing Modern Language Using Copy-Enriched Sequence to Sequence Models" Workshop on Stylistic Variation, EMNLP 2017

Stars: ✭ 64 (-45.76%)

Mutual labels: style-transfer

Wasserstein2GenerativeNetworks

PyTorch implementation of "Wasserstein-2 Generative Networks" (ICLR 2021)

Stars: ✭ 38 (-67.8%)

Mutual labels: style-transfer

deep-learning-german-tts

Thorsten-Voice: A free-to-use, offline, high-quality German TTS voice should be available to every project without any licensing struggles.

Stars: ✭ 268 (+127.12%)

Mutual labels: tts

Android-Tensorflow-Style-Transfer

Based on TensorFlow's style transfer Android project.

Stars: ✭ 18 (-84.75%)

Mutual labels: style-transfer

Image recoloring

Image Recoloring Based on Object Color Distributions (Eurographics 2019)

Stars: ✭ 30 (-74.58%)

Mutual labels: style-transfer

lewis

Official code for LEWIS, from: "LEWIS: Levenshtein Editing for Unsupervised Text Style Transfer", ACL-IJCNLP 2021 Findings by Machel Reid and Victor Zhong

Stars: ✭ 22 (-81.36%)

Mutual labels: style-transfer

golang-tts

Text-to-speech Go package based on the Amazon Polly service

Stars: ✭ 19 (-83.9%)

Mutual labels: tts

linguistic-style-transfer-pytorch

Implementation of "Disentangled Representation Learning for Non-Parallel Text Style Transfer" (ACL 2019) in PyTorch

Stars: ✭ 55 (-53.39%)

Mutual labels: style-transfer

SpeakIt Vietnamese TTS

Vietnamese Text-to-Speech on Windows Project (zalo-speech)

Stars: ✭ 81 (-31.36%)

Mutual labels: tts

a-neural-algorithm-of-artistic-style

Keras implementation of "A Neural Algorithm of Artistic Style"

Stars: ✭ 110 (-6.78%)

Mutual labels: style-transfer

VisualML

Interactive Visual Machine Learning Demos.

Stars: ✭ 104 (-11.86%)

Mutual labels: style-transfer

ttskit

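Text-to-speech toolkit. An easy-to-use Chinese speech synthesis toolbox that includes a speech encoder, a speech synthesizer, a vocoder, and a visualization module.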

Stars: ✭ 336 (+184.75%)

Mutual labels: tts

WaveGrad2

PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

Stars: ✭ 55 (-53.39%)

Mutual labels: tts

totalvoice-node

Node.js client for the Totalvoice API

Stars: ✭ 54 (-54.24%)

Mutual labels: tts


Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning


❗ We now provide inference code and pre-trained models, so you can synthesize speech for any text you want.

⭐ The model is trained only on a neutral-emotion corpus and uses no strongly emotional speech.

⭐ Out-of-domain style transfer remains very challenging. Limited by the training corpus, speaker-embedding and unsupervised style-learning methods (such as GST) struggle to imitate unseen data.

⭐ With the help of a Unet network and AdaIN layers, our proposed algorithm has powerful speaker and style transfer capabilities.

Demo results

Paper link

✨ The Colab notebook is highly recommended for testing.

⭐ One-shot voice cloning now needs only the reference speech; you no longer have to enter the duration statistics manually.

😄 The authors are preparing a simple, clear, and well-documented training process for Unet-TTS based on Aishell3.

It contains:

One-shot voice cloning inference
Automatic estimation of the reference speech's duration statistics using Style_Encoder
Multi-speaker TTS with speaker_embedding-Instance-Normalization; this model provides a pre-trained Content Encoder
Unet-TTS training
C++ inference

Stay tuned!

Install Requirements

Only Linux is supported.
Install the TensorFlow and tensorflow-addons versions that match your CUDA version.
The defaults are TensorFlow 2.6 and tensorflow-addons 0.14.0.

cd One-Shot-Voice-Cloning/TensorFlowTTS
pip install . 
(or python setup.py install)
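
After installing, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch, not part of the repository: the helper names (installed_version, matches, report) are hypothetical, it assumes Python 3.8+ for importlib.metadata, and it assumes the packages are distributed under the names tensorflow and tensorflow-addons (a GPU build may be installed under a different distribution name).

```python
from importlib import metadata

# Defaults named in the install notes; adjust them to match your CUDA setup.
EXPECTED = {"tensorflow": "2.6", "tensorflow-addons": "0.14.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches(found, wanted):
    """True if `found` begins with the same dot-separated components as `wanted`."""
    if found is None:
        return False
    wanted_parts = wanted.split(".")
    return found.split(".")[: len(wanted_parts)] == wanted_parts


def report(expected=EXPECTED):
    """Map each package name to (installed version or None, matches expected)."""
    result = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        result[package] = (found, matches(found, wanted))
    return result


if __name__ == "__main__":
    for package, (found, ok) in report().items():
        print(f"{package}: installed={found} expected={EXPECTED[package]} ok={ok}")
```

A mismatch is not necessarily an error, since the right versions depend on the local CUDA installation; the check only makes the installed versions visible.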

Usage

Option 1: Set the reference audio file to be cloned in UnetTTS_syn.py. (See that file for more details.)

cd One-Shot-Voice-Cloning
CUDA_VISIBLE_DEVICES=0 python UnetTTS_syn.py

Option 2: Notebook

Note: Add the One-Shot-Voice-Cloning path to the system path first. Otherwise, the required UnetTTS class cannot be imported from UnetTTS_syn.py.

import sys
sys.path.append("<your repository's parent directory>/One-Shot-Voice-Cloning")
from UnetTTS_syn import UnetTTS

from tensorflow_tts.audio_process import preprocess_wav

# Initialize the models: each stage has a config file and a weights file.
models_and_params = {
    "duration_param": "train/configs/unetts_duration.yaml",
    "duration_model": "models/duration4k.h5",
    "acous_param": "train/configs/unetts_acous.yaml",
    "acous_model": "models/acous12k.h5",
    "vocoder_param": "train/configs/multiband_melgan.yaml",
    "vocoder_model": "models/vocoder800k.h5",
}

feats_yaml = "train/configs/unetts_preprocess.yaml"

text2id_mapper = "models/unetts_mapper.json"

Tts_handel = UnetTTS(models_and_params, text2id_mapper, feats_yaml)

# Synthesize arbitrary text in the cloned voice using a reference speech sample.
wav_fpath = "./reference_speech.wav"
ref_audio = preprocess_wav(
    wav_fpath,
    source_sr=16000,
    normalize=True,
    trim_silence=True,
    is_sil_pad=True,
    vad_window_length=30,
    vad_moving_average_width=1,
    vad_max_silence_length=1,
)

# A #3 mark in the text is treated as punctuation, so the synthesized speech pauses there.
text = "一句话#3风格迁移#3语音合成系统"

syn_audio, _, _ = Tts_handel.one_shot_TTS(text, ref_audio)
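
The snippet above leaves two practical steps open: preparing text that carries #3 pause marks and keeping the synthesized audio. The following is a rough sketch using only the standard library. Both helpers (mark_pauses and save_wav) are hypothetical and not part of the repository. It assumes that pauses are wanted wherever the input has punctuation, that syn_audio is a one-dimensional sequence of float samples in [-1.0, 1.0], and that the output sample rate is 16000 Hz (the rate used for the reference audio); check these against UnetTTS_syn.py before relying on them.

```python
import re
import struct
import wave

# Assumption: punctuation that should become a pause; extend the set as needed.
_PAUSE_PUNCTUATION = re.compile(r"[，。！？；、,.!?;]+")


def mark_pauses(text, mark="#3"):
    """Replace each run of punctuation with the pause mark (hypothetical helper)."""
    marked = _PAUSE_PUNCTUATION.sub(mark, text.strip())
    # Drop a trailing mark so the utterance does not end with a pause token.
    if marked.endswith(mark):
        marked = marked[: -len(mark)]
    return marked


def save_wav(path, samples, sample_rate=16000):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM (hypothetical helper)."""
    pcm = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        pcm += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))
```

For example, mark_pauses("一句话，风格迁移，语音合成系统。") yields the same marked text used above, and save_wav("cloned.wav", syn_audio) would write the result to disk if syn_audio has the assumed shape and range.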

Reference

https://github.com/TensorSpeech/TensorFlowTTS

https://github.com/CorentinJ/Real-Time-Voice-Cloning


CMsmartvoice / One-Shot-Voice-Cloning
