CMsmartvoice / One-Shot-Voice-Cloning

Licence: other
☺️ One-Shot Voice Cloning based on Unet-TTS

Programming Languages: Jupyter Notebook, Python

Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning

MIT License


Inference code and pre-trained models are now available, so you can synthesize speech for any text you like.

The model was trained only on a neutral-emotion corpus; no strongly emotional speech was used.

Out-of-domain style transfer remains a major challenge. Because they are limited by the training corpus, speaker-embedding and unsupervised style-learning methods (such as GST) struggle to imitate unseen speakers and styles.

By combining a Unet network with AdaIN layers, the proposed algorithm achieves strong speaker and style transfer.
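For readers unfamiliar with AdaIN (adaptive instance normalization), the sketch below shows the standard formulation: each content channel is normalized to zero mean and unit variance, then rescaled with the mean and standard deviation of the matching style channel. It is not taken from this repository. The function name `adain`, the list-of-channels layout, and the `eps` value are assumptions made for illustration, and the actual Unet-TTS layers may differ.

```python
from math import sqrt
from statistics import fmean, pvariance


def adain(content, style, eps=1e-5):
    """Illustrative AdaIN over per-channel sequences (hypothetical helper).

    `content` and `style` are lists of channels; each channel is a list of
    floats over time. The two inputs may differ in length along time.
    """
    if len(content) != len(style):
        raise ValueError("content and style need the same number of channels")
    out = []
    for c_chan, s_chan in zip(content, style):
        c_mean = fmean(c_chan)
        c_std = sqrt(pvariance(c_chan) + eps)  # eps avoids division by zero
        s_mean = fmean(s_chan)
        s_std = sqrt(pvariance(s_chan))
        # Normalize the content channel, then apply the style statistics.
        out.append([(x - c_mean) / c_std * s_std + s_mean for x in c_chan])
    return out
```

After the call, every output channel carries the mean and (up to `eps`) the standard deviation of the style channel while keeping the temporal shape of the content channel.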

Demo results

Paper link

The Colab notebook is highly recommended for testing.


One-shot voice cloning now needs only the reference speech; you no longer have to enter the duration statistics manually.

😄 The authors are preparing a simple, clear, and well-documented training process for Unet-TTS based on Aishell3.

It contains:

  • One-shot Voice cloning inference
  • Automatic estimation of the reference speech's duration statistics using Style_Encoder
  • Multi-speaker TTS with speaker_embedding-Instance-Normalization; this model provides the pre-trained Content Encoder
  • Unet-TTS training
  • C++ inference

Stay tuned!


Install Requirements

  • Only Linux is supported.
  • Install the TensorFlow and tensorflow-addons versions that match your CUDA version.
  • The defaults are TensorFlow 2.6 and tensorflow-addons 0.14.0.
cd One-Shot-Voice-Cloning/TensorFlowTTS
pip install .
# or: python setup.py install
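After installation, a quick check of the installed versions against the defaults listed above can be sketched as follows. The helper names `matches` and `check_versions` are hypothetical and not part of this repository. The check looks only at the distribution names `tensorflow` and `tensorflow-addons`, so a differently named TensorFlow build would be reported as missing.

```python
from importlib.metadata import PackageNotFoundError, version

# Defaults stated in the install notes above.
DEFAULTS = {"tensorflow": "2.6", "tensorflow-addons": "0.14.0"}


def matches(found, wanted):
    """True if `found` begins with the version components of `wanted`."""
    wanted_parts = wanted.split(".")
    return found.split(".")[: len(wanted_parts)] == wanted_parts


def check_versions(expected=DEFAULTS):
    """Return a short status string per package (hypothetical helper)."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if matches(found, wanted):
            report[name] = "ok"
        else:
            report[name] = f"found {found}, default is {wanted}"
    return report


if __name__ == "__main__":
    for name, status in check_versions().items():
        print(f"{name}: {status}")
```

A version other than the default is not necessarily a problem, since the right pair depends on the local CUDA version.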

Usage

Option 1: Set the reference audio file to be cloned in UnetTTS_syn.py (see that file for details), then run the script.

cd One-Shot-Voice-Cloning
CUDA_VISIBLE_DEVICES=0 python UnetTTS_syn.py
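The `CUDA_VISIBLE_DEVICES=0` prefix restricts the process to the first GPU. When the code is started from Python rather than from the shell, for example in a notebook, the same effect can be had by setting the variable before TensorFlow is first imported. The sketch below uses a hypothetical helper named `select_gpu`; it is an illustration and not part of the repository.

```python
import os


def select_gpu(index=0):
    """Expose only the GPU with the given index to this process."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Call this before importing TensorFlow (or UnetTTS_syn, which imports it);
# once the GPU runtime has been initialized, the setting has no effect.
select_gpu(0)
```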

Option 2: Notebook

Note: Add the One-Shot-Voice-Cloning directory to sys.path first; otherwise the UnetTTS class cannot be imported from UnetTTS_syn.py.

import sys
sys.path.append("<your repository's parent directory>/One-Shot-Voice-Cloning")
from UnetTTS_syn import UnetTTS

from tensorflow_tts.audio_process import preprocess_wav

"""Initialize the models"""
models_and_params = {"duration_param": "train/configs/unetts_duration.yaml",
                    "duration_model": "models/duration4k.h5",
                    "acous_param": "train/configs/unetts_acous.yaml",
                    "acous_model": "models/acous12k.h5",
                    "vocoder_param": "train/configs/multiband_melgan.yaml",
                    "vocoder_model": "models/vocoder800k.h5"}

feats_yaml = "train/configs/unetts_preprocess.yaml"

text2id_mapper = "models/unetts_mapper.json"

Tts_handel = UnetTTS(models_and_params, text2id_mapper, feats_yaml)

"""Synthesize arbitrary text in the cloned voice using a reference utterance"""
wav_fpath = "./reference_speech.wav"
ref_audio = preprocess_wav(wav_fpath, source_sr=16000, normalize=True, trim_silence=True, is_sil_pad=True,
                    vad_window_length=30,
                    vad_moving_average_width=1,
                    vad_max_silence_length=1)

# A "#3" mark in the text is treated as punctuation and produces a pause in the synthesized speech.
# The sample text reads: "one-sentence #3 style transfer #3 speech synthesis system".
text = "一句话#3风格迁移#3语音合成系统"

syn_audio, _, _ = Tts_handel.one_shot_TTS(text, ref_audio)
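To listen to the result outside the notebook, the synthesized samples can be written to a WAV file. The sketch below uses only the Python standard library. It assumes that `syn_audio` is a one-dimensional sequence of floats in [-1, 1] and that the output sample rate is 16 kHz, matching the `source_sr=16000` used above. Neither assumption is confirmed by this README, and `save_wav` is a hypothetical helper.

```python
import sys
import wave
from array import array


def save_wav(samples, path, sample_rate=16000):
    """Write mono float samples in [-1, 1] as 16-bit PCM (hypothetical helper)."""
    # Clip to [-1, 1] and scale to the signed 16-bit range.
    pcm = array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return len(pcm)
```

Under those assumptions, `save_wav(syn_audio, "cloned.wav")` would store the cloned utterance next to the notebook.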

Reference

https://github.com/TensorSpeech/TensorFlowTTS

https://github.com/CorentinJ/Real-Time-Voice-Cloning
