
A collection of anomaly detection methods (iid/point-based, graph and time series) including active learning for anomaly detection/discovery, Bayesian rule-mining, description for diversity/explanation/interpretability. Analysis of incorporating label feedback with ensemble and tree-based detectors. Includes adversarial attacks with Graph Convolutional Network.

Stars: ✭ 641 (+541%)

Mutual labels: gan, autoencoder

Lggan

[CVPR 2020] Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation

Stars: ✭ 97 (-3%)

Mutual labels: gan, adversarial-learning

Jsut Lab

HTS-style full-context labels for JSUT v1.1

Stars: ✭ 28 (-72%)

Mutual labels: text-to-speech, tts

Generative Models

Annotated, understandable, and visually interpretable PyTorch implementations of: VAE, BIRVAE, NSGAN, MMGAN, WGAN, WGANGP, LSGAN, DRAGAN, BEGAN, RaGAN, InfoGAN, fGAN, FisherGAN

Stars: ✭ 438 (+338%)

Mutual labels: gan, autoencoder

Zhrtvc

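Chinese real-time voice cloning (VC) and Chinese text-to-speech (TTS). An easy-to-use Chinese voice cloning and speech synthesis system, including a speech encoder, a speech synthesizer, a vocoder, and a visualization module.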

Stars: ✭ 771 (+671%)

Mutual labels: text-to-speech, tts

Lightspeech

LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search

Stars: ✭ 31 (-69%)

Mutual labels: text-to-speech, tts

Transformertts

🤖💬 Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.

Stars: ✭ 617 (+517%)

Mutual labels: text-to-speech, tts

Athena

An open-source implementation of a sequence-to-sequence based speech processing engine

Stars: ✭ 542 (+442%)

Mutual labels: asr, tts

Neurec

Next RecSys Library

Stars: ✭ 731 (+631%)

Mutual labels: autoencoder, adversarial-learning

Melgan

MelGAN vocoder (compatible with NVIDIA/tacotron2)

Stars: ✭ 444 (+344%)

Mutual labels: gan, tts

Advanced Deep Learning With Keras

Advanced Deep Learning with Keras, published by Packt

Stars: ✭ 917 (+817%)

Mutual labels: gan, autoencoder

Cs224n Gpu That Talks

Attention, I'm Trying to Speak: End-to-end speech synthesis (CS224n '18)

Stars: ✭ 52 (-48%)

Mutual labels: text-to-speech, tts

Wsay

Windows "say"

Stars: ✭ 36 (-64%)

Mutual labels: text-to-speech, tts

Speaker

A PHP library to convert text to speech using various web services

Stars: ✭ 86 (-14%)

Mutual labels: text-to-speech, tts

Gtts

Python library and CLI tool to interface with Google Translate's text-to-speech API

Stars: ✭ 1,303 (+1203%)

Mutual labels: text-to-speech, tts


ZeroSpeech 2019: TTS without T - PyTorch

This is the original source code for the paper "Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion", which was accepted at Interspeech 2019.
Furthermore, we used this implementation to compete in the ZeroSpeech 2019 challenge. On the Surprise dataset leaderboard, our method placed 2nd in terms of low bitrate while achieving a higher Mean Opinion Score (MOS) and a lower character error rate (CER) than the 1st-place team.
Feel free to use or modify the code; bug reports and improvement suggestions are appreciated. If you have any questions, please contact [email protected]. If you find this project helpful for your research, please consider citing this paper. Thanks!
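The leaderboard ranking above is stated in terms of bitrate. As a rough illustration only, the sketch below computes a bitrate for a discrete unit transcription, assuming the ZeroSpeech 2019 definition of symbols per second multiplied by the empirical entropy (in bits) of the symbol distribution. The function name `bitrate` and the toy input are hypothetical; the official challenge evaluation scripts remain the reference.

```python
import math
from collections import Counter
from typing import Sequence


def bitrate(utterances: Sequence[Sequence[int]], total_duration_s: float) -> float:
    """Approximate bitrate (bits/s) of a discrete unit transcription.

    Assumes bitrate = (N / D) * H, where N is the total number of symbols,
    D is the total audio duration in seconds, and H is the empirical
    entropy of the symbol distribution in bits.
    """
    counts = Counter(sym for utt in utterances for sym in utt)
    n_symbols = sum(counts.values())
    if n_symbols == 0 or total_duration_s <= 0:
        return 0.0
    # Empirical entropy over symbol types, in bits.
    entropy = -sum((c / n_symbols) * math.log2(c / n_symbols) for c in counts.values())
    return (n_symbols / total_duration_s) * entropy


# 4 symbols over 2 seconds with two equiprobable units:
# entropy = 1 bit, 2 symbols/s, so the bitrate is 2.0 bits/s.
print(bitrate([[0, 1], [0, 1]], total_duration_s=2.0))
```

Fewer distinct units, or a more skewed unit distribution, lowers the entropy term, which is why a compact discrete inventory can score a low bitrate.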

Quick Start

Setup

Clone this repo: git clone git@github.com:andi611/ZeroSpeech-TTS-without-T.git
Change into the repo directory: cd ZeroSpeech-TTS-without-T

Installing dependencies

Install Python 3.
Install the latest version of PyTorch for your platform. For better performance, install it with GPU (CUDA) support if possible. This code works with PyTorch 0.4 and later.

Prepare data

Download the ZeroSpeech dataset.

The English dataset:

wget https://download.zerospeech.com/2019/english.tgz
tar xvfz english.tgz -C data
rm -f english.tgz

The Surprise dataset:

wget https://download.zerospeech.com/2019/surprise.zip
# Go to https://download.zerospeech.com  and accept the licence agreement 
# to get the password protecting the archive
unzip surprise.zip -d data
rm -f surprise.zip

After unpacking the dataset into ~/ZeroSpeech-TTS-without-T/data, the data tree should look like this:

 |- ZeroSpeech-TTS-without-T
    |- data
       |- english
          |- train
             |- unit
             |- voice
          |- test
       |- surprise
          |- train
             |- unit
             |- voice
          |- test
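Before preprocessing, it can help to confirm that the layout matches the tree above. A minimal sketch, assuming the script is run from the repository root; `missing_dirs` is a hypothetical helper, not part of this repo.

```python
from pathlib import Path
from typing import List, Sequence

# Subdirectories expected under data/<dataset>/, taken from the tree above.
EXPECTED_SUBDIRS = ("train/unit", "train/voice", "test")


def missing_dirs(repo_root: str, datasets: Sequence[str] = ("english", "surprise")) -> List[str]:
    """Return the expected data directories that do not exist yet."""
    root = Path(repo_root) / "data"
    missing = []
    for name in datasets:
        for sub in EXPECTED_SUBDIRS:
            path = root / name / sub
            if not path.is_dir():
                missing.append(str(path))
    return missing


if __name__ == "__main__":
    # Prints nothing when the layout under ./data matches the expected tree.
    for path in missing_dirs("."):
        print("missing:", path)
```

If you only downloaded the English set, call `missing_dirs(".", datasets=("english",))` to skip the Surprise checks.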

Preprocess the dataset and sample model-ready index files:
```
python3 main.py --preprocess --remake
```

Usage

Training

Train the ASR-TTS autoencoder model for discrete linguistic unit discovery:
```
python3 main.py --train_ae
```
Tunable hyperparameters are in hps/zerospeech.json. You can adjust these parameters and settings by editing the file, but the default hyperparameters are recommended for this project.
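To inspect or experiment with hyperparameters from a script instead of editing hps/zerospeech.json by hand, one option is to load the file and override a few keys. A minimal sketch, assuming the file is a flat JSON object; `load_hps` is a hypothetical helper, and `seg_len`/`enc_size` are simply the keys this README names in the Trained-Models section.

```python
import json
from typing import Any, Dict


def load_hps(path: str, **overrides: Any) -> Dict[str, Any]:
    """Load a JSON hyperparameter file and apply keyword overrides.

    Unknown override keys raise KeyError so typos are not silently ignored.
    """
    with open(path, "r", encoding="utf-8") as f:
        hps = json.load(f)
    for key, value in overrides.items():
        if key not in hps:
            raise KeyError(f"unknown hyperparameter: {key}")
        hps[key] = value
    return hps


# Example (assumes hps/zerospeech.json contains seg_len and enc_size):
# hps = load_hps("hps/zerospeech.json", seg_len=128, enc_size=1024)
```

Raising on unknown keys is a deliberate choice here: a misspelled override such as `seglen` fails loudly instead of being ignored.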

Train the TTS patcher to boost voice conversion performance:

python3 main.py --train_p --load_model --load_train_model_name=model.pth-ae-400000

Train the TTS patcher with target-guided adversarial training:

python3 main.py --train_tgat --load_model --load_train_model_name=model.pth-ae-400000

Monitor with TensorBoard (optional)

tensorboard --logdir='path to log dir'
or
python3 -m tensorboard.main --logdir='path to log dir'

Testing

Test on a single utterance:

python3 main.py --test_single --load_test_model_name=model.pth-ae-200000

Test on 'synthesis.txt' and generate resynthesized audio files:

python3 main.py --test --load_test_model_name=model.pth-ae-200000

Test on all test utterances under test/ and generate encoding files:

python3 main.py --test_encode --load_test_model_name=model.pth-ae-200000

Add --enc_only when testing with the ASR-TTS autoencoder only:

python3 main.py --test_single --load_test_model_name=model.pth-ae-200000 --enc_only
python3 main.py --test --load_test_model_name=model.pth-ae-200000 --enc_only
python3 main.py --test_encode --load_test_model_name=model.pth-ae-200000 --enc_only

Switching between datasets

Use --dataset=surprise to switch from the default English set to the Surprise set. All paths are handled automatically as long as the data tree follows the structure suggested above. For example:
```
python3 main.py --train_ae --dataset=surprise
```
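The automatic path handling described above could look roughly like the following. This is a sketch only: `dataset_paths` and `parse` are hypothetical names, English is assumed to be the default dataset, and only the `./ckpt_<dataset>` convention is taken from the Trained-Models section; the repo's main.py may organize paths differently.

```python
import argparse
from pathlib import Path
from typing import Dict, List, Optional


def dataset_paths(dataset: str, data_root: str = "data") -> Dict[str, Path]:
    """Derive per-dataset paths from the data tree shown under "Prepare data"."""
    base = Path(data_root) / dataset
    return {
        "train_unit": base / "train" / "unit",
        "train_voice": base / "train" / "voice",
        "test": base / "test",
        # Mirrors the ./ckpt_english and ./ckpt_surprise defaults.
        "ckpt_dir": Path(f"./ckpt_{dataset}"),
    }


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse only the --dataset flag (illustrative, not the repo's parser)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", choices=["english", "surprise"], default="english")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse(["--dataset=surprise"])
    for key, path in dataset_paths(args.dataset).items():
        print(key, path)
```

Deriving every path from one `--dataset` value keeps the two datasets from mixing files, which is only safe if both follow the same directory layout.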

Trained-Models

We provide trained models as ckpt files. Download link: bit.ly/ZeroSpeech2019-Liu
To reload a model for training:
```
--load_train_model_name=model.pth-ae-400000-128-multi-1024-english
```
The checkpoint directory defaults to --ckpt_dir=./ckpt_english or --ckpt_dir=./ckpt_surprise, depending on the dataset.

There are two ways to load a model for testing:

--load_test_model_name=model.pth-ae-400000-128-multi-1024-english (by name)
--ckpt_pth=ckpt/model.pth-ae-400000-128-multi-1024-english (direct path)
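The two loading options above can be summarized as a small path-resolution helper. A sketch under stated assumptions: `resolve_ckpt` is hypothetical, and it assumes a direct path wins when both options are given, which this README does not specify.

```python
import os
from typing import Optional


def resolve_ckpt(load_test_model_name: Optional[str] = None,
                 ckpt_pth: Optional[str] = None,
                 ckpt_dir: str = "./ckpt_english") -> str:
    """Return the checkpoint file to load.

    A direct path (like --ckpt_pth) takes precedence; otherwise the model
    name (like --load_test_model_name) is joined onto the checkpoint directory.
    """
    if ckpt_pth:
        return ckpt_pth
    if load_test_model_name:
        return os.path.join(ckpt_dir, load_test_model_name)
    raise ValueError("pass either a model name or a direct checkpoint path")


# resolve_ckpt("model.pth-ae-400000-128-multi-1024-english")
# joins the name onto ./ckpt_english
```

Loading by name is convenient when checkpoints stay in the default directory; a direct path is useful for downloaded checkpoints stored elsewhere.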

Make sure that hps/zerospeech.json is set according to the model you are loading. If a 128-multi-1024 model is being loaded, seg_len and enc_size should be set to 128 and 1024, respectively. If an ae model is being loaded, the --enc_only argument must be passed to main.py (see the --enc_only commands in the Testing section).
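The checkpoint names in this section appear to encode seg_len and enc_size (for example `128-multi-1024`), so a small pre-flight check can catch a mismatched hps/zerospeech.json before loading. This is a sketch assuming the naming pattern inferred from the example names above; the regex, `hps_mismatches`, and the plain-dict `hps` argument are hypothetical, not part of the repo.

```python
import re
from typing import Dict, Tuple

# Assumed pattern: model.pth-<stage>-<step>[-<seg_len>-<tag>-<enc_size>-<dataset>]
CKPT_RE = re.compile(
    r"model\.pth-(?P<stage>[a-z]+)-(?P<step>\d+)"
    r"(?:-(?P<seg_len>\d+)-(?P<tag>[a-z]+)-(?P<enc_size>\d+)-(?P<dataset>[a-z]+))?$"
)


def hps_mismatches(ckpt_name: str, hps: Dict[str, int]) -> Dict[str, Tuple[int, object]]:
    """Compare seg_len/enc_size encoded in a checkpoint name with an hps dict.

    Returns {key: (value_from_name, value_in_hps)} for each mismatch.
    """
    match = CKPT_RE.search(ckpt_name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {ckpt_name}")
    mismatches = {}
    for key in ("seg_len", "enc_size"):
        raw = match.group(key)
        if raw is None:  # short names like model.pth-ae-400000 carry no sizes
            continue
        expected = int(raw)
        if hps.get(key) != expected:
            mismatches[key] = (expected, hps.get(key))
    return mismatches


name = "model.pth-ae-400000-128-multi-1024-english"
print(hps_mismatches(name, {"seg_len": 128, "enc_size": 512}))
# {'enc_size': (1024, 512)}
```

Because the regex uses `search` without a start anchor, it also accepts direct paths such as `ckpt/model.pth-...`.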

Notes

This code includes all the settings and methods we tested for this challenge, including some that did not succeed but were not removed from the code. The instructions and default settings above, however, are for our proposed method; by running them, one can easily reproduce our results.
TODO: upload pre-trained models

Citation

@article{Liu_2019,
   title={Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion},
   url={http://dx.doi.org/10.21437/interspeech.2019-2048},
   DOI={10.21437/interspeech.2019-2048},
   journal={Interspeech 2019},
   publisher={ISCA},
   author={Liu, Andy T. and Hsu, Po-chun and Lee, Hung-Yi},
   year={2019},
   month={Sep}
}
