
kan-bayashi / Pytorchwavenetvocoder

Licence: apache-2.0
WaveNet-Vocoder implementation with pytorch.

Programming Languages

shell

Projects that are alternatives of or similar to Pytorchwavenetvocoder

ttslearn
ttslearn: Library for the book "Text-to-speech with Python" (Pythonで学ぶ音声合成)
Stars: ✭ 158 (-41.26%)
Mutual labels:  speech-synthesis, wavenet
Wavenet vocoder
WaveNet vocoder
Stars: ✭ 1,926 (+615.99%)
Mutual labels:  speech-synthesis, wavenet
Parallelwavegan
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN) with Pytorch
Stars: ✭ 682 (+153.53%)
Mutual labels:  speech-synthesis, wavenet
Tacotron 2
DeepMind's Tacotron-2 Tensorflow implementation
Stars: ✭ 1,968 (+631.6%)
Mutual labels:  speech-synthesis, wavenet
Tf Wavenet vocoder
Wavenet and its applications with Tensorflow
Stars: ✭ 58 (-78.44%)
Mutual labels:  speech-synthesis, wavenet
QPPWG
Quasi-Periodic Parallel WaveGAN Pytorch implementation
Stars: ✭ 41 (-84.76%)
Mutual labels:  speech-synthesis, wavenet
LVCNet
LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation
Stars: ✭ 67 (-75.09%)
Mutual labels:  speech-synthesis
leon
🧠 Leon is your open-source personal assistant.
Stars: ✭ 8,560 (+3082.16%)
Mutual labels:  speech-synthesis
Parallel-Tacotron2
PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling
Stars: ✭ 149 (-44.61%)
Mutual labels:  speech-synthesis
Neural-HMM
Neural HMMs are all you need (for high-quality attention-free TTS)
Stars: ✭ 69 (-74.35%)
Mutual labels:  speech-synthesis
hifigan-denoiser
HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Stars: ✭ 88 (-67.29%)
Mutual labels:  wavenet
EmotionalConversionStarGAN
This repository contains code to replicate results from the ICASSP 2020 paper "StarGAN for Emotional Speech Conversion: Validated by Data Augmentation of End-to-End Emotion Recognition".
Stars: ✭ 92 (-65.8%)
Mutual labels:  speech-synthesis
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-80.67%)
Mutual labels:  speech-synthesis
editts
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech
Stars: ✭ 74 (-72.49%)
Mutual labels:  speech-synthesis
MelNet-SpeechGeneration
Implementation of MelNet in PyTorch to generate high-fidelity audio samples
Stars: ✭ 19 (-92.94%)
Mutual labels:  speech-synthesis
esp32-flite
Speech synthesis running on ESP32 based on Flite engine.
Stars: ✭ 28 (-89.59%)
Mutual labels:  speech-synthesis
mimic2
Text to Speech engine based on the Tacotron architecture, initially implemented by Keith Ito.
Stars: ✭ 537 (+99.63%)
Mutual labels:  speech-synthesis
Meta-TTS
Official repository of https://arxiv.org/abs/2111.04040v1
Stars: ✭ 69 (-74.35%)
Mutual labels:  speech-synthesis
Tacotron pytorch
Tacotron implementation of pytorch
Stars: ✭ 12 (-95.54%)
Mutual labels:  speech-synthesis
porfir
Voice assistant Porfirevich (Порфирьевич)
Stars: ✭ 23 (-91.45%)
Mutual labels:  speech-synthesis

I have released a new implementation, kan-bayashi/ParallelWaveGAN. Please enjoy your hacking!

PYTORCH-WAVENET-VOCODER


This repository provides a WaveNet vocoder implementation in PyTorch.
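
As background, a WaveNet vocoder of this kind typically models 8-bit mu-law-quantized waveform samples with a softmax output. Below is a minimal NumPy sketch of the mu-law companding step (illustrative only; the repository has its own implementation):

```python
import numpy as np

def mulaw_encode(x, mu=255):
    """Map waveform samples in [-1, 1] to integer classes in [0, mu]."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.floor((y + 1) / 2 * mu + 0.5).astype(np.int64)

def mulaw_decode(q, mu=255):
    """Invert mulaw_encode (up to quantization error)."""
    y = 2 * q.astype(np.float64) / mu - 1
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

x = np.array([-0.5, 0.0, 0.5])
q = mulaw_encode(x)      # silence (0.0) maps to class 128
x_hat = mulaw_decode(q)  # close to x, up to small quantization error
```

Mu-law companding allocates more quantization levels near zero, where speech amplitudes concentrate, which is why 8 bits suffice for reasonable fidelity.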

You can now try the demo recipe in Google Colab!


Key features

  • Kaldi-like recipes that make results easy to reproduce

  • Multi-GPU training / decoding support

  • WORLD features / mel-spectrogram as auxiliary features

  • Recipes for three public databases
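
For orientation: the auxiliary features condition a stack of dilated causal convolutions, whose receptive field grows exponentially with depth. A quick sketch of the standard WaveNet receptive-field arithmetic (generic hyperparameters, not necessarily this repository's defaults):

```python
def receptive_field(n_stacks=3, layers_per_stack=10, kernel_size=2):
    # Dilations double per layer (1, 2, 4, ..., 512) and the pattern
    # repeats per stack; each layer adds (kernel_size - 1) * dilation.
    dilations = [2 ** i for _ in range(n_stacks)
                 for i in range(layers_per_stack)]
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# 3 stacks of 10 layers with kernel size 2 -> 3070 samples,
# i.e. roughly 192 ms of context at 16 kHz
print(receptive_field())
```

This is why sample-by-sample autoregressive generation is slow: every output sample depends on thousands of previous samples.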

Requirements

  • python 3.6+
  • virtualenv
  • cuda 9.0+
  • cudnn 7.1+
  • nccl 2.0+ (required for multi-GPU use)

We recommend using a GPU with more than 10 GB of memory.

Setup

A. Make virtualenv

$ git clone https://github.com/kan-bayashi/PytorchWaveNetVocoder.git
$ cd PytorchWaveNetVocoder/tools
$ make

B. Install with pip

$ git clone https://github.com/kan-bayashi/PytorchWaveNetVocoder.git
$ cd PytorchWaveNetVocoder

# we recommend pytorch 1.0.1 because it is the only tested version
$ pip install torch==1.0.1 torchvision==0.2.2
$ pip install -e .

# create a dummy activate file to suppress a warning in the recipes
$ mkdir -p tools/venv/bin && touch tools/venv/bin/activate

How-to-run

$ cd egs/arctic/sd
$ ./run.sh

See egs/README.md for more details on the recipes.

Results

You can listen to samples from kan-bayashi/WaveNetVocoderSamples.

Below are subjective evaluation results obtained with the arctic recipe.

Comparison between model types

Effect of the amount of training data

If you want to listen to more samples, please access our Google Drive here.

Here is the list of samples:

  • arctic_raw_16k: original in arctic database
  • arctic_sd_16k_world: sd model with world aux feats + noise shaping with world mcep
  • arctic_si-open_16k_world: si-open model with world aux feats + noise shaping with world mcep
  • arctic_si-close_16k_world: si-close model with world aux feats + noise shaping with world mcep
  • arctic_si-close_16k_melspc: si-close model with mel-spectrogram aux feats
  • arctic_si-close_16k_melspc_ns: si-close model with mel-spectrogram aux feats + noise shaping with stft mcep
  • ljspeech_raw_22.05k: original in ljspeech database
  • ljspeech_sd_22.05k_world: sd model with world aux feats + noise shaping with world mcep
  • ljspeech_sd_22.05k_melspc: sd model with mel-spectrogram aux feats
  • ljspeech_sd_22.05k_melspc_ns: sd model with mel-spectrogram aux feats + noise shaping with stft mcep
  • m-ailabs_raw_16k: original in m-ailabs speech database
  • m-ailabs_sd_16k_melspc: sd model with mel-spectrogram aux feats
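
The melspc variants above condition the model on a log mel-spectrogram instead of WORLD features. A minimal NumPy sketch of such feature extraction (hypothetical parameter values; the recipes define their own analysis settings):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters centered at points evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fb[i, k] = (k - left) / (center - left)
        for k in range(center, right):
            fb[i, k] = (right - k) / (right - center)
    return fb

def logmelspectrogram(x, sr=16000, n_fft=1024, hop=256, n_mels=80):
    # Frame, window, FFT, mel-warp, then log-compress.
    n_frames = 1 + (len(x) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(np.maximum(mel, 1e-10))

x = np.random.randn(16000)    # one second of noise at 16 kHz
feats = logmelspectrogram(x)  # shape: (n_frames, n_mels)
```

At synthesis time the same features are upsampled to the waveform sample rate and fed to the network as the conditioning signal.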

References

Please cite the following articles.

@inproceedings{tamamori2017speaker,
  title={Speaker-dependent WaveNet vocoder},
  author={Tamamori, Akira and Hayashi, Tomoki and Kobayashi, Kazuhiro and Takeda, Kazuya and Toda, Tomoki},
  booktitle={Proceedings of Interspeech},
  pages={1118--1122},
  year={2017}
}
@inproceedings{hayashi2017multi,
  title={An Investigation of Multi-Speaker Training for WaveNet Vocoder},
  author={Hayashi, Tomoki and Tamamori, Akira and Kobayashi, Kazuhiro and Takeda, Kazuya and Toda, Tomoki},
  booktitle={Proc. ASRU 2017},
  year={2017}
}
@article{hayashi2018sp,
  title={複数話者WaveNetボコーダに関する調査},
  author={林知樹 and 小林和弘 and 玉森聡 and 武田一哉 and 戸田智基},
  journal={電子情報通信学会技術研究報告},
  year={2018}
}

Author

Tomoki Hayashi @ Nagoya University
e-mail:[email protected]
