
vsimkus / Voice Conversion

Voice conversion (VC) investigation using three variants of VAE

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Voice Conversion

Neural Ode
Jupyter notebook with Pytorch implementation of Neural Ordinary Differential Equations
Stars: ✭ 335 (+1495.24%)
Mutual labels:  vae
Joint Vae
Pytorch implementation of JointVAE, a framework for disentangling continuous and discrete factors of variation 🌟
Stars: ✭ 404 (+1823.81%)
Mutual labels:  vae
Deeplearningmugenknock
An implementation cheat sheet for endlessly doing deep learning and DeepLearning with deep learning
Stars: ✭ 684 (+3157.14%)
Mutual labels:  vae
Pycadl
Python package with source code from the course "Creative Applications of Deep Learning w/ TensorFlow"
Stars: ✭ 356 (+1595.24%)
Mutual labels:  vae
Deepnude An Image To Image Technology
Research on DeepNude's algorithm and on general image-generation theory and practice, including pix2pix, CycleGAN, UGATIT, DCGAN, SinGAN, ALAE, mGANprior, StarGAN-v2 and VAE models (TensorFlow 2 implementations).
Stars: ✭ 4,029 (+19085.71%)
Mutual labels:  vae
Tensorflow Mnist Vae
Tensorflow implementation of variational auto-encoder for MNIST
Stars: ✭ 422 (+1909.52%)
Mutual labels:  vae
Beta Vae
Pytorch implementation of β-VAE
Stars: ✭ 326 (+1452.38%)
Mutual labels:  vae
Advanced Deep Learning With Keras
Advanced Deep Learning with Keras, published by Packt
Stars: ✭ 917 (+4266.67%)
Mutual labels:  vae
Disentangling Vae
Experiments for understanding disentanglement in VAE latent representations
Stars: ✭ 398 (+1795.24%)
Mutual labels:  vae
Tensorflow Vae Gan Draw
A collection of generative methods implemented with TensorFlow (Deep Convolutional Generative Adversarial Networks (DCGAN), Variational Autoencoder (VAE) and DRAW: A Recurrent Neural Network For Image Generation).
Stars: ✭ 577 (+2647.62%)
Mutual labels:  vae
Tensorflow Generative Model Collections
Collection of generative models in Tensorflow
Stars: ✭ 3,785 (+17923.81%)
Mutual labels:  vae
Pytorch Vqvae
Vector Quantized VAEs - PyTorch Implementation
Stars: ✭ 396 (+1785.71%)
Mutual labels:  vae
Generative Models
Annotated, understandable, and visually interpretable PyTorch implementations of: VAE, BIRVAE, NSGAN, MMGAN, WGAN, WGANGP, LSGAN, DRAGAN, BEGAN, RaGAN, InfoGAN, fGAN, FisherGAN
Stars: ✭ 438 (+1985.71%)
Mutual labels:  vae
Dsprites Dataset
Dataset to assess the disentanglement properties of unsupervised learning methods
Stars: ✭ 340 (+1519.05%)
Mutual labels:  vae
Generative Models
Collection of generative models, e.g. GAN, VAE in Pytorch and Tensorflow.
Stars: ✭ 6,701 (+31809.52%)
Mutual labels:  vae
Pytorch rvae
Recurrent Variational Autoencoder that generates sequential data implemented with pytorch
Stars: ✭ 332 (+1480.95%)
Mutual labels:  vae
Awesome Vaes
A curated list of awesome work on VAEs, disentanglement, representation learning, and generative models.
Stars: ✭ 418 (+1890.48%)
Mutual labels:  vae
Variational Autoencoder
PyTorch implementation of "Auto-Encoding Variational Bayes"
Stars: ✭ 25 (+19.05%)
Mutual labels:  vae
Variational Autoencoder
Variational autoencoder implemented in tensorflow and pytorch (including inverse autoregressive flow)
Stars: ✭ 807 (+3742.86%)
Mutual labels:  vae
Sentence Vae
PyTorch Re-Implementation of "Generating Sentences from a Continuous Space" by Bowman et al 2015 https://arxiv.org/abs/1511.06349
Stars: ✭ 462 (+2100%)
Mutual labels:  vae

Voice Conversion on unaligned data

Voice Conversion (VC) is widely desirable across many industries and applications, including speaker anonymisation, film dubbing, gaming, and voice restoration for people who have lost their ability to speak. In this work we compare standard VAE, VQ-VAE and Gumbel VAE models as approaches to VC on the Voice Conversion Challenge 2016 dataset. We assess speech reconstruction and VC performance both on spectral frames obtained from a WORLD vocoder and on raw waveform data.

The full report and evaluation results can be found here.

How to train your VC model

1. Preprocess data

Place the raw VCC2016 dataset in data/vcc2016_raw/vcc2016_training.zip (raw audio features) or data/vcc2016/vcc2016_training.zip (WORLD features), or the VCTK dataset in data/vctk/VCTK-Corpus.zip.

To generate the preprocessed data files run one of the following:

  • VCC2016 Raw data

    python preprocessing.py --dataset=VCCRaw2016 --trim_silence=True
    
  • VCC2016 WORLD data

    python preprocessing.py --dataset=VCCWORLD2016 --trim_silence=True
    
  • VCTK Raw data

    python preprocessing.py --dataset=VCTK --trim_silence=True --shuffle_order=True --split_samples=True
    

2. Train model

Run the following on a compute cluster to train the model.

# Use train_vae.py or train_joint_vae.py instead of train_vqvae.py for the other models.
# --use_gpu: whether to train on GPU
# --gpu_id: GPU ids to use
# --filepath_to_arguments_json_file: model and experiment configuration file
# --dataset_root_path: root directory of the dataset files
python train_vqvae.py \
            --use_gpu=True \
            --gpu_id='0,1' \
            --filepath_to_arguments_json_file="experiment_configs/config_file.json" \
            --dataset_root_path='data'
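
The JSON file passed via --filepath_to_arguments_json_file supplies the attributes that are later read as args.* in the evaluation examples below (the actual configuration files live in experiment_configs/). As a rough illustration only, such a file could be generated as follows; every value is a placeholder assumption, and the architecture fields (encoder, vq, generator) are omitted because their schema is specific to this project.

import json

# Purely illustrative sketch of the kind of fields the experiment scripts expect;
# the values below are placeholder assumptions, not the settings used in the report.
example_config = {
    "experiment_name": "vqvae_raw_example",
    "num_epochs": 100,
    "learning_rate": 1e-4,
    "weight_decay_coefficient": 0.0,
    "commit_coefficient": 0.25,
    "input_len": 4096,
    "num_input_quantization_channels": 256,
    "num_speakers": 10,
    "speaker_dim": 64,
    "use_gated_convolutions": True,
}

with open("experiment_configs/example_config.json", "w") as f:
    json.dump(example_config, f, indent=4)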

How to evaluate or perform VC

To evaluate the model or perform conversion, first create the model and load its trained weights. Then prepare your audio data with WORLD or mu-law preprocessing, and pad or trim it so that the model input has the correct length. After conversion, postprocess the output to produce an audio file. An example is given in this section.

1. Load configuration

from util.arg_extractor import extract_args_from_json

args = extract_args_from_json('experiment_configs/config_file.json')

2. Create model

VQVAE

from models.vqvae import VQVAE

model = VQVAE(
    input_shape=(1, 1, args.input_len),
    encoder_arch=args.encoder,
    vq_arch=args.vq,
    generator_arch=args.generator,
    num_speakers=args.num_speakers,
    speaker_dim=args.speaker_dim,
    use_gated_convolutions=args.use_gated_convolutions)

VAE

from models.vae import VAE

model = VAE(
    input_shape=(1, 1, args.input_len),
    encoder_arch=args.encoder,
    generator_arch=args.generator,
    latent_dim=args.latent_dim,
    num_speakers=args.num_speakers,
    speaker_dim=args.speaker_dim,
    use_gated_convolutions=args.use_gated_convolutions)

JointVAE

from models.joint_vae import JointVAE

model = JointVAE(
    input_shape=(1, 1, args.input_len),
    encoder_arch=args.encoder,
    generator_arch=args.generator,
    latent_dim=args.latent_dim,
    num_latents=args.num_latents,
    temperature=args.temperature,
    num_speakers=args.num_speakers,
    speaker_dim=args.speaker_dim,
    use_gated_convolutions=args.use_gated_convolutions)

3. Load model weights

To load the model weights we use the same experiment builders as in training.

import torch

from experiment_builders.vqvae_builder import VQVAERawExperimentBuilder

# Depending on the experiment, use VQVAERawExperimentBuilder, VQVAEWORLDExperimentBuilder,
# VAERawExperimentBuilder, VAEWORLDExperimentBuilder, JointVAERawExperimentBuilder,
# or JointVAEWORLDExperimentBuilder.
builder = VQVAERawExperimentBuilder(network_model=model,
                                    experiment_name=args.experiment_name,
                                    num_epochs=args.num_epochs,
                                    weight_decay_coefficient=args.weight_decay_coefficient,
                                    learning_rate=args.learning_rate,
                                    commit_coefficient=args.commit_coefficient, # This argument is only needed in VQVAE experiment builders
                                    device=torch.device('cpu'),
                                    continue_from_epoch=epoch, # Epoch of the model to load (should be your best validation model)
                                    train_data=None,
                                    val_data=None)

4. Perform conversion

Raw data feature experiments

import math
import os

import torch
import torchaudio

import util.torchaudio_transforms as transforms
from datasets.vcc_preprocessor import read_audio # Or import from vctk_preprocessor respectively

# Prepare mu-law encoding transformers
mulaw = transforms.MuLawEncoding(quantization_channels=args.num_input_quantization_channels)
mulaw_expanding = transforms.MuLawExpanding(quantization_channels=args.num_input_quantization_channels)

# Load audio
audio_path = os.path.expanduser(audio_path)
torchaudio.initialize_sox()
audio, sr = read_audio(audio_path, trim_silence=True)
torchaudio.shutdown_sox()

# Prepare an audio piece of appropriate length, e.g. as follows
audio = audio.unsqueeze(0)
audio_len = audio.shape[-1]
padding = transforms.PadTrim(math.ceil(audio.shape[-1] / args.input_len) * args.input_len)
audio = padding(audio.squeeze(0)).unsqueeze(0)
audio_split = audio.view(int(audio.shape[-1] / args.input_len), 1, args.input_len)

# Set target speaker id
target_speaker_id = torch.tensor(target_speaker_id, dtype=torch.long)

# Voice conversion
out_mulaw = builder.convert(x=mulaw(audio_split), y=target_speaker_id)

# Postprocess
out = mulaw_expanding(out_mulaw).detach().view(1, -1)
out = out[:, :audio_len]

# Save as audio file
torchaudio.save(filepath=out_file_path, src=out, sample_rate=sr)

WORLD feature experiments

import os

import numpy as np
import torch
import torchaudio

from data.vcc_world_dataset import VCCWORLDDataset
from datasets.vcc_world_preprocessor import read_audio_and_extract_features, synthesize_from_WORLD_features

# Load audio
audio_path = os.path.expanduser(audio_path)
spectra, aperiodicity, f0, energy = read_audio_and_extract_features(audio_path)

# Set target speaker id
target_speaker_id = torch.tensor(target_speaker_id, dtype=torch.long)

# Voice conversion
dataset = VCCWORLDDataset('data', scale=True)
spectra_scaled = dataset.scale_spectra(torch.tensor(spectra)).unsqueeze(1)
spectra_out = builder.convert(x=spectra_scaled, y=target_speaker_id)
spectra_out = dataset.scale_spectra_back(spectra_out)
# Convert the F0 statistics from the source speaker to the target speaker
f0_converted = dataset.convert_f0(torch.tensor(f0), source_speaker_id, args.eval_speaker_id)
spectra_out = spectra_out.squeeze(1)
# Synthesize audio
audio_out = synthesize_from_WORLD_features(f0_converted.numpy(), spectra_out.numpy(), aperiodicity, energy)
audio_out = np.clip(audio_out, a_min=-0.9, a_max=0.9)

# Save as audio
torchaudio.save(filepath=out_file_path, src=torch.tensor(audio_out.copy()), sample_rate=16000)

Models

In our evaluation we investigated three VAE variants: the standard VAE, the VQ-VAE, and the Gumbel (Joint) VAE.

Software dependencies

  • PyTorch v1.0.0 or later
  • numpy
  • pillow
  • tqdm
  • pyworld (for extracting WORLD features)
  • torchaudio (for preprocessing raw audio)
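
Assuming a standard pip-based setup, the dependencies can be installed along these lines (versions are intentionally not pinned here; the raw-audio examples above rely on the older sox-based torchaudio interface, so a matching torchaudio release may be required):

pip install torch numpy pillow tqdm pyworld torchaudio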

VCTK dataset modifications

The VCTK dataset contains some silent recordings, so the following audio samples were removed:

  • p323_424, p306_151, p351_361, p345_292, p341_101, p306_352.
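
If you prepare the VCTK corpus yourself, a small helper along the following lines can delete these utterances before preprocessing. It assumes the standard wav48/<speaker>/<utterance>.wav layout of the extracted VCTK-Corpus archive; the root path below is an assumption, not something the preprocessing scripts require.

import os

# Utterances that are silent in the VCTK release and were therefore excluded.
SILENT_UTTERANCES = ["p323_424", "p306_151", "p351_361", "p345_292", "p341_101", "p306_352"]

# Assumed extraction path of the VCTK-Corpus archive.
vctk_root = "data/vctk/VCTK-Corpus"

for utt in SILENT_UTTERANCES:
    speaker = utt.split("_")[0]
    wav_path = os.path.join(vctk_root, "wav48", speaker, utt + ".wav")
    if os.path.exists(wav_path):
        os.remove(wav_path)
        print("Removed", wav_path)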

Contributors

  • Vaidotas Simkus
  • Simon Valentin
  • Will Greedy