tuan3w / Cnn_vocoder

License: MIT
A fast CNN-based vocoder


Projects that are alternatives to or similar to Cnn_vocoder

Comprehensive-Tacotron2
PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supports both single-, multi-speaker TTS and several techniques to enforce the robustness and efficiency of the model.
Stars: ✭ 22 (-70.27%)
Mutual labels:  tts, speech-synthesis
Hifi Gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Stars: ✭ 325 (+339.19%)
Mutual labels:  speech-synthesis, tts
Parakeet
PAddle PARAllel text-to-speech toolKIT (supporting WaveFlow, WaveNet, Transformer TTS and Tacotron2)
Stars: ✭ 279 (+277.03%)
Mutual labels:  speech-synthesis, tts
Fre-GAN-pytorch
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
Stars: ✭ 73 (-1.35%)
Mutual labels:  tts, speech-synthesis
Wsay
Windows "say"
Stars: ✭ 36 (-51.35%)
Mutual labels:  speech-synthesis, tts
editts
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech
Stars: ✭ 74 (+0%)
Mutual labels:  tts, speech-synthesis
Cognitive Speech Tts
Microsoft Text-to-Speech API sample code in several languages, part of Cognitive Services.
Stars: ✭ 312 (+321.62%)
Mutual labels:  speech-synthesis, tts
talkie
Text-to-speech browser extension button. Select text on any web page, and have the computer read it out loud for you by simply clicking the Talkie button.
Stars: ✭ 43 (-41.89%)
Mutual labels:  tts, speech-synthesis
Cs224n Gpu That Talks
Attention, I'm Trying to Speak: End-to-end speech synthesis (CS224n '18)
Stars: ✭ 52 (-29.73%)
Mutual labels:  speech-synthesis, tts
Voice Builder
An opensource text-to-speech (TTS) voice building tool
Stars: ✭ 362 (+389.19%)
Mutual labels:  speech-synthesis, tts
Jsut Lab
HTS-style full-context labels for JSUT v1.1
Stars: ✭ 28 (-62.16%)
Mutual labels:  speech-synthesis, tts
Parallelwavegan
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN) with Pytorch
Stars: ✭ 682 (+821.62%)
Mutual labels:  speech-synthesis, tts
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-29.73%)
Mutual labels:  tts, speech-synthesis
esp32-flite
Speech synthesis running on ESP32 based on Flite engine.
Stars: ✭ 28 (-62.16%)
Mutual labels:  tts, speech-synthesis
YourTTS
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Stars: ✭ 217 (+193.24%)
Mutual labels:  tts, speech-synthesis
Glow Tts
A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Stars: ✭ 284 (+283.78%)
Mutual labels:  speech-synthesis, tts
LVCNet
LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation
Stars: ✭ 67 (-9.46%)
Mutual labels:  tts, speech-synthesis
ttslearn
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)
Stars: ✭ 158 (+113.51%)
Mutual labels:  tts, speech-synthesis
Multilingual text to speech
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
Stars: ✭ 324 (+337.84%)
Mutual labels:  speech-synthesis, tts
Athena
an open-source implementation of sequence-to-sequence based speech processing engine
Stars: ✭ 542 (+632.43%)
Mutual labels:  speech-synthesis, tts

CNNVocoder

NOTE: I'm no longer working on this project. See #9.

A CNN-based vocoder.

This work is inspired by the m-cnn model described in Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks. The authors show that even a simple upsampling network is enough to synthesize waveforms from spectrograms/mel-spectrograms.

In this repo, I use the spectrogram feature to train the model because it contains more information than the mel-spectrogram feature. However, because the transformation from spectrogram to mel-spectrogram is just a linear projection, you can basically train a simple network to predict the spectrogram from the mel-spectrogram. You can also change the parameters to train a vocoder directly from mel-spectrogram features.
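To see why a simple network suffices here, consider a minimal NumPy sketch of the linearity argument. The filterbank below is a random stand-in (the real one is a mel filterbank, e.g. 80 mel bins over 513 FFT bins), so the matrix and names are illustrative only: a mel-spectrogram is just a matrix product, and even a least-squares pseudo-inverse already gives an approximate spectrogram back, which is exactly what a small linear layer can learn.

```python
import numpy as np

rng = np.random.default_rng(0)
n_fft_bins, n_mels, n_frames = 513, 80, 100

# Stand-in for a mel filterbank: one non-negative weight matrix.
M = np.abs(rng.standard_normal((n_mels, n_fft_bins)))
spec = np.abs(rng.standard_normal((n_fft_bins, n_frames)))

# Mel-spectrogram is a linear projection of the spectrogram.
mel = M @ spec

# A least-squares "inverse" projection recovers an approximate spectrogram;
# exact recovery is impossible since 80 mel bins < 513 FFT bins.
spec_approx = np.linalg.pinv(M) @ mel
```

Because the map is linear, scaling the input scales the output: `M @ (2 * spec)` equals `2 * mel`.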

Sample Audios

Architecture notes

Compared with m-cnn, my proposed network has some differences:

  • I use Upsampling + Conv layers instead of a TransposedConv layer. This helps prevent checkerboard artifacts.
  • The model uses a number of residual blocks before/after the upsampling module to make the network larger/deeper.
  • I only use an L1 loss between the log-scale STFT magnitudes of the predicted and target waveforms. Evaluating the loss in log space works better than on raw STFT magnitudes because it is closer to human perception of loudness. I also tried computing the loss on spectrogram features, but it didn't help much.
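The log-magnitude STFT L1 loss described above can be sketched in plain NumPy as follows. Function and parameter names here are my own, and the actual training code uses PyTorch tensors with its own STFT settings; this is only an illustration of the idea.

```python
import numpy as np

def stft_mag(x, n_fft=1024, hop=256):
    """Magnitude STFT of a 1-D signal using a Hann window."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def log_stft_l1(pred, target, eps=1e-5):
    """L1 distance between log-scale STFT magnitudes of two waveforms."""
    p = np.log(stft_mag(pred) + eps)
    t = np.log(stft_mag(target) + eps)
    return np.mean(np.abs(p - t))
```

The `eps` floor keeps the logarithm finite on near-silent frames; taking the distance in log space compresses large magnitudes, so quiet regions contribute to the loss closer to how loudness is actually perceived.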

Install requirements

$ pip install -r requirements.txt

Training vocoder

1. Prepare dataset

I use the LJSpeech dataset for my experiments. If you don't have it yet, please download the dataset and put it somewhere.

After that, run the following command to generate the dataset for the experiment:

$ python preprocessing.py --samples_per_audio 20 \
--out_dir ljspeech \
--data_dir path/to/ljspeech/dataset \
--n_workers 4

2. Train vocoder

$ python train.py --out_dir ${output_directory}

For more training options, please run:

$ python train.py --help

Generate audio from spectrogram

  • Generate spectrogram from audio
$ python gen_spec.py -i sample.wav -o out.npz
  • Generate audio from spectrogram
$ python synthesis.py --model_path path/to/checkpoint \
                      --spec_path out.npz \
                      --out_path out.wav

Pretrained model

You can get my pre-trained model here.

Acknowledgements

This implementation uses code from NVIDIA, Ryuichi Yamamoto, and Keith Ito, as noted in the code.

License

MIT
