tuan3w / Cnn_vocoder

License: MIT
A fast CNN-based vocoder


Projects that are alternatives to or similar to Cnn_vocoder

Comprehensive-Tacotron2
PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supports both single-, multi-speaker TTS and several techniques to enforce the robustness and efficiency of the model.
Stars: ✭ 22 (-70.27%)
Mutual labels:  tts, speech-synthesis
Hifi Gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Stars: ✭ 325 (+339.19%)
Mutual labels:  speech-synthesis, tts
Parakeet
PAddle PARAllel text-to-speech toolKIT (supporting WaveFlow, WaveNet, Transformer TTS and Tacotron2)
Stars: ✭ 279 (+277.03%)
Mutual labels:  speech-synthesis, tts
Fre-GAN-pytorch
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
Stars: ✭ 73 (-1.35%)
Mutual labels:  tts, speech-synthesis
Wsay
Windows "say"
Stars: ✭ 36 (-51.35%)
Mutual labels:  speech-synthesis, tts
editts
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech
Stars: ✭ 74 (+0%)
Mutual labels:  tts, speech-synthesis
Cognitive Speech Tts
Microsoft Text-to-Speech API sample code in several languages, part of Cognitive Services.
Stars: ✭ 312 (+321.62%)
Mutual labels:  speech-synthesis, tts
talkie
Text-to-speech browser extension button. Select text on any web page, and have the computer read it out loud for you by simply clicking the Talkie button.
Stars: ✭ 43 (-41.89%)
Mutual labels:  tts, speech-synthesis
Cs224n Gpu That Talks
Attention, I'm Trying to Speak: End-to-end speech synthesis (CS224n '18)
Stars: ✭ 52 (-29.73%)
Mutual labels:  speech-synthesis, tts
Voice Builder
An opensource text-to-speech (TTS) voice building tool
Stars: ✭ 362 (+389.19%)
Mutual labels:  speech-synthesis, tts
Jsut Lab
HTS-style full-context labels for JSUT v1.1
Stars: ✭ 28 (-62.16%)
Mutual labels:  speech-synthesis, tts
Parallelwavegan
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN) with Pytorch
Stars: ✭ 682 (+821.62%)
Mutual labels:  speech-synthesis, tts
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-29.73%)
Mutual labels:  tts, speech-synthesis
esp32-flite
Speech synthesis running on ESP32 based on Flite engine.
Stars: ✭ 28 (-62.16%)
Mutual labels:  tts, speech-synthesis
YourTTS
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Stars: ✭ 217 (+193.24%)
Mutual labels:  tts, speech-synthesis
Glow Tts
A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Stars: ✭ 284 (+283.78%)
Mutual labels:  speech-synthesis, tts
LVCNet
LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation
Stars: ✭ 67 (-9.46%)
Mutual labels:  tts, speech-synthesis
ttslearn
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)
Stars: ✭ 158 (+113.51%)
Mutual labels:  tts, speech-synthesis
Multilingual text to speech
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning.
Stars: ✭ 324 (+337.84%)
Mutual labels:  speech-synthesis, tts
Athena
an open-source implementation of sequence-to-sequence based speech processing engine
Stars: ✭ 542 (+632.43%)
Mutual labels:  speech-synthesis, tts

CNNVocoder

NOTE: I'm no longer working on this project. See #9.

A CNN-based vocoder.

This work is inspired by the m-cnn model described in Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks. The authors show that even a simple upsampling network is enough to synthesize waveforms from spectrograms/mel-spectrograms.

In this repo, I use the spectrogram feature to train the model because it contains more information than the mel-spectrogram feature. However, because the transformation from spectrogram to mel-spectrogram is just a linear projection, you can basically train a simple network to predict the spectrogram from the mel-spectrogram. You can also change the parameters to train a vocoder directly from mel-spectrogram features.
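To see why a simple network suffices here, consider a minimal NumPy sketch of the linearity argument. The filterbank below is a random stand-in (the real one is a mel filterbank, e.g. 80 mel bins over 513 FFT bins), so the matrix and names are illustrative only: a mel-spectrogram is just a matrix product, and even a least-squares pseudo-inverse already gives an approximate spectrogram back, which is exactly what a small linear layer can learn.

```python
import numpy as np

rng = np.random.default_rng(0)
n_fft_bins, n_mels, n_frames = 513, 80, 100

# Stand-in for a mel filterbank: one non-negative weight matrix.
M = np.abs(rng.standard_normal((n_mels, n_fft_bins)))
spec = np.abs(rng.standard_normal((n_fft_bins, n_frames)))

# Mel-spectrogram is a linear projection of the spectrogram.
mel = M @ spec

# A least-squares "inverse" projection recovers an approximate spectrogram;
# exact recovery is impossible since 80 mel bins < 513 FFT bins.
spec_approx = np.linalg.pinv(M) @ mel
```

Because the map is linear, scaling the input scales the output: `M @ (2 * spec)` equals `2 * mel`.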

Sample Audios

Architecture notes

Compared with m-cnn, my proposed network has some differences:

  • I use Upsampling + Conv layers instead of a TransposedConv layer. This helps prevent checkerboard artifacts.
  • The model uses a number of residual blocks before/after the upsampling module to make the network larger/deeper.
  • I only use an L1 loss between the log-scale STFT magnitudes of the predicted and target waveforms. Evaluating the loss in log space works better than on raw STFT magnitudes because it is closer to human perception of loudness. I also tried computing the loss on spectrogram features, but it didn't help much.
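The log-magnitude STFT L1 loss described above can be sketched in plain NumPy as follows. Function and parameter names here are my own, and the actual training code uses PyTorch tensors with its own STFT settings; this is only an illustration of the idea.

```python
import numpy as np

def stft_mag(x, n_fft=1024, hop=256):
    """Magnitude STFT of a 1-D signal using a Hann window."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def log_stft_l1(pred, target, eps=1e-5):
    """L1 distance between log-scale STFT magnitudes of two waveforms."""
    p = np.log(stft_mag(pred) + eps)
    t = np.log(stft_mag(target) + eps)
    return np.mean(np.abs(p - t))
```

The `eps` floor keeps the logarithm finite on near-silent frames; taking the distance in log space compresses large magnitudes, so quiet regions contribute to the loss closer to how loudness is actually perceived.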

Install requirements

$ pip install -r requirements.txt

Training vocoder

1. Prepare dataset

I use the LJSpeech dataset for my experiments. If you don't have it yet, please download the dataset and put it somewhere.

After that, run the following command to generate the dataset for the experiment:

$ python preprocessing.py --samples_per_audio 20 \
--out_dir ljspeech \
--data_dir path/to/ljspeech/dataset \
--n_workers 4

2. Train vocoder

$ python train.py --out_dir ${output_directory}

For more training options, please run:

$ python train.py --help

Generate audio from spectrogram

  • Generate spectrogram from audio
$ python gen_spec.py -i sample.wav -o out.npz
  • Generate audio from spectrogram
$ python synthesis.py --model_path path/to/checkpoint \
                      --spec_path out.npz \
                      --out_path out.wav

Pretrained model

You can get my pre-trained model here.

Acknowledgements

This implementation uses code from NVIDIA, Ryuichi Yamamoto, and Keith Ito, as noted in the code.

License

MIT
