
andi611 / Zerospeech Tts Without T

License: MIT
A PyTorch implementation for the ZeroSpeech 2019 challenge.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to Zerospeech Tts Without T

spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-48%)
Mutual labels:  text-to-speech, tts, asr
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (-47%)
Mutual labels:  text-to-speech, tts, asr
Gpnd
Generative Probabilistic Novelty Detection with Adversarial Autoencoders
Stars: ✭ 112 (+12%)
Mutual labels:  gan, autoencoder, adversarial-learning
Hifi Gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Stars: ✭ 325 (+225%)
Mutual labels:  gan, text-to-speech, tts
Parallelwavegan
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN) with Pytorch
Stars: ✭ 682 (+582%)
Mutual labels:  text-to-speech, tts
Ad examples
A collection of anomaly detection methods (iid/point-based, graph and time series) including active learning for anomaly detection/discovery, bayesian rule-mining, description for diversity/explanation/interpretability. Analysis of incorporating label feedback with ensemble and tree-based detectors. Includes adversarial attacks with Graph Convolutional Network.
Stars: ✭ 641 (+541%)
Mutual labels:  gan, autoencoder
Lggan
[CVPR 2020] Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation
Stars: ✭ 97 (-3%)
Mutual labels:  gan, adversarial-learning
Jsut Lab
HTS-style full-context labels for JSUT v1.1
Stars: ✭ 28 (-72%)
Mutual labels:  text-to-speech, tts
Generative Models
Annotated, understandable, and visually interpretable PyTorch implementations of: VAE, BIRVAE, NSGAN, MMGAN, WGAN, WGANGP, LSGAN, DRAGAN, BEGAN, RaGAN, InfoGAN, fGAN, FisherGAN
Stars: ✭ 438 (+338%)
Mutual labels:  gan, autoencoder
Zhrtvc
Chinese real-time voice cloning (VC) and Chinese text-to-speech (TTS). An easy-to-use Chinese voice cloning and speech synthesis system, comprising a speech encoder, synthesizer, vocoder, and visualization module.
Stars: ✭ 771 (+671%)
Mutual labels:  text-to-speech, tts
Lightspeech
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Stars: ✭ 31 (-69%)
Mutual labels:  text-to-speech, tts
Transformertts
🤖💬 Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.
Stars: ✭ 617 (+517%)
Mutual labels:  text-to-speech, tts
Athena
an open-source implementation of sequence-to-sequence based speech processing engine
Stars: ✭ 542 (+442%)
Mutual labels:  asr, tts
Neurec
Next RecSys Library
Stars: ✭ 731 (+631%)
Mutual labels:  autoencoder, adversarial-learning
Melgan
MelGAN vocoder (compatible with NVIDIA/tacotron2)
Stars: ✭ 444 (+344%)
Mutual labels:  gan, tts
Advanced Deep Learning With Keras
Advanced Deep Learning with Keras, published by Packt
Stars: ✭ 917 (+817%)
Mutual labels:  gan, autoencoder
Cs224n Gpu That Talks
Attention, I'm Trying to Speak: End-to-end speech synthesis (CS224n '18)
Stars: ✭ 52 (-48%)
Mutual labels:  text-to-speech, tts
Wsay
Windows "say"
Stars: ✭ 36 (-64%)
Mutual labels:  text-to-speech, tts
Speaker
A PHP library to convert text to speech using various web services
Stars: ✭ 86 (-14%)
Mutual labels:  text-to-speech, tts
Gtts
Python library and CLI tool to interface with Google Translate's text-to-speech API
Stars: ✭ 1,303 (+1203%)
Mutual labels:  text-to-speech, tts

ZeroSpeech 2019: TTS without T - PyTorch

Quick Start

Setup

  • Clone this repo: git clone git@github.com:andi611/ZeroSpeech-TTS-without-T.git
  • cd into this repo: cd ZeroSpeech-TTS-without-T

Installing dependencies

  1. Install Python 3.

  2. Install the latest version of PyTorch according to your platform. For better performance, install with GPU support (CUDA) if available. This code works with PyTorch 0.4 and later; a quick check follows.
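
As a quick sanity check (an illustrative addition, not from the original README), the snippet below prints the installed PyTorch version and whether a CUDA device is visible:

    import torch

    # The code requires PyTorch >= 0.4; show what is actually installed.
    print("PyTorch version:", torch.__version__)

    # True only if a CUDA-capable GPU and matching driver are available;
    # otherwise training falls back to the CPU.
    print("CUDA available:", torch.cuda.is_available())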

Prepare data

  1. Download the ZeroSpeech dataset.

    • The English dataset:
    wget https://download.zerospeech.com/2019/english.tgz
    tar xvfz english.tgz -C data
    rm -f english.tgz
    
    • The Surprise dataset:
    wget https://download.zerospeech.com/2019/surprise.zip
    # Go to https://download.zerospeech.com  and accept the licence agreement 
    # to get the password protecting the archive
    unzip surprise.zip -d data
    rm -f surprise.zip
    
  2. After unpacking the dataset into ~/ZeroSpeech-TTS-without-T/data, the data tree should look like this (a quick layout check follows this list):

     |- ZeroSpeech-TTS-without-T
        |- data
           |- english
              |- train
                 |- unit
                 |- voice
              |- test
           |- surprise
              |- train
                 |- unit
                 |- voice
              |- test
    
  3. Preprocess the dataset and sample model-ready index files:

    python3 main.py --preprocess --remake
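
Before running the preprocessing command, it may help to confirm the tree from step 2 is in place. A minimal sketch, assuming the suggested layout under data/ (all paths below come straight from that tree):

    import os

    # Sanity-check the suggested data tree before running
    # `python3 main.py --preprocess --remake`.
    expected = [
        "data/english/train/unit",
        "data/english/train/voice",
        "data/english/test",
        "data/surprise/train/unit",
        "data/surprise/train/voice",
        "data/surprise/test",
    ]
    for path in expected:
        print(("ok     " if os.path.isdir(path) else "MISSING"), path)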
    

Usage

Training

  1. Train the ASR-TTS autoencoder model for discrete linguistic unit discovery (a conceptual sketch of the idea follows this list):

    python3 main.py --train_ae
    

    Tunable hyperparameters can be found in hps/zerospeech.json. You can adjust these parameters and settings by editing the file; the default hyperparameters are the recommended ones for this project.

  2. Train TTS patcher for voice conversion performance boosting:

    python3 main.py --train_p --load_model --load_train_model_name=model.pth-ae-400000
    
  3. Train TTS patcher with target guided adversarial training:

    python3 main.py --train_tgat --load_model --load_train_model_name=model.pth-ae-400000
    
  4. Monitor with TensorBoard (optional):

    tensorboard --logdir='path to log dir'
    or
    python3 -m tensorboard.main --logdir='path to log dir'
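
For intuition about step 1 only, here is a toy discrete-unit autoencoder in PyTorch: an encoder maps speech frames to entries of a small codebook and a decoder reconstructs the frames from those discrete units. The layer sizes, the codebook, and the straight-through trick are illustrative assumptions, not this repo's architecture; the actual model and training loop are defined by the code and hps/zerospeech.json.

    import torch
    import torch.nn as nn

    class ToyDiscreteAutoencoder(nn.Module):
        """Toy sketch: encode frames, snap each to its nearest codebook entry, decode."""
        def __init__(self, feat_dim=80, enc_size=1024, n_codes=128):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(feat_dim, enc_size), nn.ReLU(), nn.Linear(enc_size, enc_size))
            self.codebook = nn.Parameter(torch.randn(n_codes, enc_size))
            self.decoder = nn.Sequential(
                nn.Linear(enc_size, enc_size), nn.ReLU(), nn.Linear(enc_size, feat_dim))

        def forward(self, x):  # x: (batch, time, feat_dim)
            z = self.encoder(x)
            # Nearest-neighbour quantization: each frame becomes a discrete unit id.
            dists = torch.cdist(z, self.codebook.expand(z.size(0), -1, -1))
            codes = dists.argmin(dim=-1)            # (batch, time)
            q = self.codebook[codes]                # quantized embeddings
            # Straight-through estimator: backward treats quantization as identity.
            q = z + (q - z).detach()
            return self.decoder(q), codes

    model = ToyDiscreteAutoencoder()
    mels = torch.randn(2, 100, 80)                  # stand-in for mel-spectrogram frames
    recon, codes = model(mels)
    nn.functional.mse_loss(recon, mels).backward()  # reconstruction objective
    print(recon.shape, codes.shape)

A real system would also need a codebook/commitment loss (the codebook receives no gradient through the straight-through path above) and a way to separate speaker identity from content; both are omitted here.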
    

Testing

  1. Test on a single speech file:

    python3 main.py --test_single --load_test_model_name=model.pth-ae-200000
    
  2. Test on 'synthesis.txt' and generate resynthesized audio files:

    python3 main.py --test --load_test_model_name=model.pth-ae-200000
    
  3. Test on all the test speech under test/ and generate encoding files:

    python3 main.py --test_encode --load_test_model_name=model.pth-ae-200000
    
  4. Add --enc_only if testing with the ASR-TTS autoencoder only:

    python3 main.py --test_single --load_test_model_name=model.pth-ae-200000 --enc_only
    python3 main.py --test --load_test_model_name=model.pth-ae-200000 --enc_only
    python3 main.py --test_encode --load_test_model_name=model.pth-ae-200000 --enc_only
    

Switching between datasets

  1. Simply use --dataset=surprise to switch to the alternative dataset; all paths are handled automatically if the data tree is laid out as suggested. For example:
    python3 main.py --train_ae --dataset=surprise
    

Trained-Models

  1. We provide trained models as checkpoint files. Download link: bit.ly/ZeroSpeech2019-Liu
  2. Reload model for training:
    --load_train_model_name=model.pth-ae-400000-128-multi-1024-english
    
    (--ckpt_dir=./ckpt_english or --ckpt_dir=./ckpt_surprise by default).
  3. Two ways to load a model for testing:
    --load_test_model_name=model.pth-ae-400000-128-multi-1024-english (by name)
    --ckpt_pth=ckpt/model.pth-ae-400000-128-multi-1024-english (direct path)
    
  4. Note that hps/zerospeech.json must be set to match the model you are loading: if a 128-multi-1024 model is being loaded, seg_len and enc_size should be set to 128 and 1024, respectively, and if an ae model is being loaded, the --enc_only argument must be used when running main.py (see item 4 in the Testing section). A sketch of such a consistency check follows this list.
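
The key names "seg_len" and "enc_size" below are assumptions about how hps/zerospeech.json spells these settings; a hypothetical pre-flight check before loading the 128-multi-1024 checkpoint might look like:

    import json

    # Hypothetical check; assumes hps/zerospeech.json stores these values
    # under top-level "seg_len" and "enc_size" keys.
    with open("hps/zerospeech.json") as f:
        hps = json.load(f)

    assert hps["seg_len"] == 128, "seg_len must be 128 for a 128-multi-1024 model"
    assert hps["enc_size"] == 1024, "enc_size must be 1024 for a 128-multi-1024 model"
    print("hps/zerospeech.json matches model.pth-ae-400000-128-multi-1024-english")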

Notes

  • This code includes all the settings and methods we tested for this challenge; some of them did not succeed, but we did not remove them from the code. The instructions above and the default settings correspond to the method we proposed, and running them reproduces our results.
  • TODO: upload pre-trained models

Citation

@article{Liu_2019,
   title={Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion},
   url={http://dx.doi.org/10.21437/interspeech.2019-2048},
   DOI={10.21437/interspeech.2019-2048},
   journal={Interspeech 2019},
   publisher={ISCA},
   author={Liu, Andy T. and Hsu, Po-chun and Lee, Hung-Yi},
   year={2019},
   month={Sep}
}