rishikksh20 / melgan

License: BSD-3-Clause
MelGAN implementation with Multi-Band and Full Band supports...

Programming Languages

Jupyter Notebook
Python

Projects that are alternatives to or similar to melgan

Tts
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
Stars: ✭ 5,427 (+9950%)
Mutual labels:  text-to-speech, speech, vocoder, melgan
Fre-GAN-pytorch
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
Stars: ✭ 73 (+35.19%)
Mutual labels:  text-to-speech, speech, speech-synthesis, vocoder
Tensorflowtts
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
Stars: ✭ 2,382 (+4311.11%)
Mutual labels:  text-to-speech, speech-synthesis, vocoder, melgan
IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
Stars: ✭ 295 (+446.3%)
Mutual labels:  text-to-speech, speech, speech-synthesis
AdaSpeech
AdaSpeech: Adaptive Text to Speech for Custom Voice
Stars: ✭ 108 (+100%)
Mutual labels:  text-to-speech, speech, speech-synthesis
editts
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech
Stars: ✭ 74 (+37.04%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Voice Builder
An opensource text-to-speech (TTS) voice building tool
Stars: ✭ 362 (+570.37%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Lightspeech
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Stars: ✭ 31 (-42.59%)
Mutual labels:  text-to-speech, speech, speech-synthesis
StyleSpeech
Official implementation of Meta-StyleSpeech and StyleSpeech
Stars: ✭ 161 (+198.15%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Zero-Shot-TTS
Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Stars: ✭ 33 (-38.89%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Durian
Implementation of "Duration Informed Attention Network for Multimodal Synthesis" (https://arxiv.org/pdf/1909.01700.pdf) paper.
Stars: ✭ 111 (+105.56%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Diffwave
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Stars: ✭ 139 (+157.41%)
Mutual labels:  text-to-speech, speech, speech-synthesis
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-3.7%)
Mutual labels:  text-to-speech, speech, speech-synthesis
ttslearn
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)
Stars: ✭ 158 (+192.59%)
Mutual labels:  text-to-speech, speech, speech-synthesis
LVCNet
LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation
Stars: ✭ 67 (+24.07%)
Mutual labels:  text-to-speech, speech-synthesis, vocoder
Wsay
Windows "say"
Stars: ✭ 36 (-33.33%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Wavegrad
A fast, high-quality neural vocoder.
Stars: ✭ 138 (+155.56%)
Mutual labels:  text-to-speech, speech, speech-synthesis
Wavegrad
Implementation of Google Brain's WaveGrad high-fidelity vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.
Stars: ✭ 245 (+353.7%)
Mutual labels:  text-to-speech, speech, speech-synthesis
GlottDNN
GlottDNN vocoder and tools for training DNN excitation models
Stars: ✭ 30 (-44.44%)
Mutual labels:  speech-synthesis, vocoder
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (-1.85%)
Mutual labels:  text-to-speech, speech-synthesis

Multi-band MelGAN and Full band MelGAN

Unofficial PyTorch implementation of the Multi-Band MelGAN paper. This implementation uses Seungwon Park's MelGAN repo as a base and the PQMF filter implementation from this repo.
[Figure: MelGAN]
[Figure: Multi-band MelGAN]

Prerequisites

Tested on Python 3.6

pip install -r requirements.txt

Prepare Dataset

  • Download a dataset for training. Any set of wav files with a sample rate of 22050 Hz will work (e.g. LJSpeech was used in the paper).
  • Preprocess: python preprocess.py -c config/default.yaml -d [data's root path]
  • Edit the configuration yaml file.
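
Because the dataset is expected to be sampled at 22050 Hz, it can help to check the files before preprocessing. The following is a minimal sketch and not part of this repository: the function name find_wrong_sample_rates is hypothetical, and it relies on Python's standard wave module, which only reads uncompressed PCM wav files.

```python
import wave
from pathlib import Path

EXPECTED_SAMPLE_RATE = 22050  # rate stated in the dataset requirements above


def find_wrong_sample_rates(root, expected=EXPECTED_SAMPLE_RATE):
    """Return (path, rate) pairs for .wav files under root whose rate differs.

    Hypothetical helper, not part of the repository. Assumes PCM wav files,
    since the standard-library wave module cannot read other encodings.
    """
    mismatches = []
    for wav_path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(wav_path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected:
            mismatches.append((wav_path, rate))
    return mismatches
```

Calling find_wrong_sample_rates("[data's root path]") returns an empty list when every file matches. Any file it reports would need resampling before running preprocess.py.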

Train & Tensorboard

  • python trainer.py -c [config yaml file] -n [name of the run]
    • Before the first run, copy the default config (cp config/default.yaml config/config.yaml) and then edit config.yaml.
    • Set the root paths of the training and validation files on the 2nd and 3rd lines of the config.
    • Each path should contain pairs of *.wav files and their corresponding (preprocessed) *.mel files.
    • The data loader searches each path recursively for files.
    • For Multi-Band training, pass the config/mb_melgan config file to -c.
  • tensorboard --logdir logs/
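
The pairing requirement above (each *.wav needs a preprocessed *.mel, found recursively) can be checked before training starts. This is a rough sketch rather than the repository's data loader: the name find_unpaired_wavs is hypothetical, and it assumes each mel file sits next to its wav and shares its base name.

```python
from pathlib import Path


def find_unpaired_wavs(root):
    """Recursively list .wav files under root that lack a matching .mel file.

    Hypothetical helper, not part of the repository. Assumption: the mel
    file is stored beside the wav with the same base name (foo.wav -> foo.mel).
    """
    missing = []
    for wav_path in sorted(Path(root).rglob("*.wav")):
        if not wav_path.with_suffix(".mel").exists():
            missing.append(wav_path)
    return missing
```

An empty result for both the training and validation roots suggests preprocessing covered every file. Otherwise, rerunning preprocess.py on the reported files is the likely fix.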

Pretrained model

Check out here.

Inference

  • python inference.py -p [checkpoint path] -i [input mel path]

Results

Open In Colab

License

BSD 3-Clause License.
