
Deepest-Project / Melnet

License: MIT
Implementation of "MelNet: A Generative Model for Audio in the Frequency Domain"

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Melnet

Cross-Speaker-Emotion-Transfer
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Stars: ✭ 107 (-33.54%)
Mutual labels:  tts, generative-model
Aeneas
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
Stars: ✭ 1,942 (+1106.21%)
Mutual labels:  tts
Marytts
MARY TTS -- an open-source, multilingual text-to-speech synthesis system written in pure java
Stars: ✭ 1,699 (+955.28%)
Mutual labels:  tts
Dla
Deep learning for audio processing
Stars: ✭ 142 (-11.8%)
Mutual labels:  tts
Androidmarytts
Android MARY TTS - an open-source, offline HMM-Based text-to-speech synthesis system based on MaryTTS
Stars: ✭ 134 (-16.77%)
Mutual labels:  tts
Tacotron
A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
Stars: ✭ 1,756 (+990.68%)
Mutual labels:  tts
Cramer Gan
Tensorflow Implementation on "The Cramer Distance as a Solution to Biased Wasserstein Gradients" (https://arxiv.org/pdf/1705.10743.pdf)
Stars: ✭ 123 (-23.6%)
Mutual labels:  generative-model
Disentangled Person Image Generation
Tensorflow implementation of CVPR 2018 paper "Disentangled Person Image Generation"
Stars: ✭ 158 (-1.86%)
Mutual labels:  generative-model
Msg Net
Multi-style Generative Network for Real-time Transfer
Stars: ✭ 152 (-5.59%)
Mutual labels:  generative-model
Conditional Gan
Anime Generation
Stars: ✭ 141 (-12.42%)
Mutual labels:  generative-model
Semantic image inpainting
Semantic Image Inpainting
Stars: ✭ 140 (-13.04%)
Mutual labels:  generative-model
Gesturegan
[ACM MM 2018 Oral] GestureGAN for Hand Gesture-to-Gesture Translation in the Wild
Stars: ✭ 136 (-15.53%)
Mutual labels:  generative-model
Gretel Synthetics
Differentially private learning to create fake, synthetic datasets with enhanced privacy guarantees
Stars: ✭ 147 (-8.7%)
Mutual labels:  generative-model
Talkify
Javascript Text to speech library
Stars: ✭ 132 (-18.01%)
Mutual labels:  tts
Automatic Youtube Reddit Text To Speech Video Generator And Uploader
A series of 3 programs that will automatically receive scripts from Reddit, allow the user to edit them, then be sent off to a video generator where they will be uploaded to YouTube automatically.
Stars: ✭ 152 (-5.59%)
Mutual labels:  tts
First Order Model
This repository contains the source code for the paper First Order Motion Model for Image Animation
Stars: ✭ 11,964 (+7331.06%)
Mutual labels:  generative-model
Ha Tts Bluetooth Speaker
TTS Bluetooth Speaker for Home Assistant
Stars: ✭ 140 (-13.04%)
Mutual labels:  tts
Tensorflowtts
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
Stars: ✭ 2,382 (+1379.5%)
Mutual labels:  tts
Tts Papers
🐸 collection of TTS papers
Stars: ✭ 160 (-0.62%)
Mutual labels:  tts
Stylegan2 Pytorch
Simplest working implementation of Stylegan2, state of the art generative adversarial network, in Pytorch. Enabling everyone to experience disentanglement
Stars: ✭ 2,656 (+1549.69%)
Mutual labels:  generative-model

MelNet

Implementation of MelNet: A Generative Model for Audio in the Frequency Domain

Prerequisites

  • Tested with Python 3.6.8 & 3.7.4, PyTorch 1.2.0 & 1.3.0.
  • pip install -r requirements.txt

How to train

Datasets

  • YAML files for Blizzard, VoxCeleb2, and KSS are provided under config/. For other datasets, write your own YAML file, using the provided ones as templates.
  • Unconditional training is possible for all kinds of datasets, provided that they have a consistent file extension specified by data.extension within the YAML file.
  • Conditional training is currently only implemented for KSS and a subset of the Blizzard dataset.

Running the code

  • python trainer.py -c [config YAML file path] -n [name of run] -t [tier number] -b [batch size] -s [TTS]
    • Each tier can be trained separately. Because each tier is larger than the one before it (with the exception of tier 1), adjust the batch size for each tier so that it fits in GPU memory.
      • Tier 6 of the Blizzard dataset does not fit on a 16GB P100, even with a batch size of 1.
    • The -s flag is a boolean that determines whether to train a TTS tier. Since a TTS tier differs only at tier 1, this flag is ignored when [tier number] != 1. Warning: the flag is set to True no matter what value follows it, so omit it entirely unless you intend to train a TTS tier.
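
The warning about -s matches what argparse does when a flag is declared with type=bool: the value is passed through bool(), and any non-empty string, including "False", is truthy. The sketch below assumes that kind of declaration. It is a stand-in parser, not the project's trainer.py, and the long option names --tier and --tts are made up for illustration. It also shows a hypothetical store_true variant that takes no value.

```python
import argparse

# Stand-in parser, NOT the project's trainer.py. It assumes the flag is
# declared with type=bool, which would explain the warning above.
def build_parser_with_bool_type():
    parser = argparse.ArgumentParser()
    parser.add_argument("-t", "--tier", type=int, required=True)
    parser.add_argument("-s", "--tts", type=bool, default=False)
    return parser

# Hypothetical alternative: a store_true switch takes no value at all.
def build_parser_with_store_true():
    parser = argparse.ArgumentParser()
    parser.add_argument("-t", "--tier", type=int, required=True)
    parser.add_argument("-s", "--tts", action="store_true")
    return parser

bool_parser = build_parser_with_bool_type()
# bool("False") is True because the string is non-empty.
with_false = bool_parser.parse_args(["-t", "1", "-s", "False"])
omitted = bool_parser.parse_args(["-t", "1"])
print(with_false.tts, omitted.tts)  # True False
```

Under this assumption, the only way to get False is to leave the flag off the command line, which is why the flag should be omitted rather than set to a false-looking value.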

How to sample

Preparing the checkpoints

  • The checkpoints must be stored under chkpt/.
  • A YAML file named inference.yaml must be provided under config/.
  • inference.yaml must specify the number of tiers, the names of the checkpoints, and whether the generation is conditional.
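
The layout described above can be checked before launching inference. The following is a minimal sketch, not part of the repository: the function name missing_inference_files is hypothetical, and it assumes that checkpoint names are paths relative to chkpt/. It does not inspect the contents of inference.yaml, since the key names are not given here.

```python
from pathlib import Path

def missing_inference_files(project_root, checkpoint_names):
    """Return the required paths (relative to project_root) that do not exist.

    Assumes checkpoints live at chkpt/<name> and the inference config at
    config/inference.yaml, as described in the list above.
    """
    root = Path(project_root)
    required = [root / "config" / "inference.yaml"]
    required += [root / "chkpt" / name for name in checkpoint_names]
    return [p.relative_to(root).as_posix() for p in required if not p.exists()]

# "tier1.pt" is a placeholder name; use the checkpoint names you list in
# inference.yaml.
print(missing_inference_files(".", ["tier1.pt"]))
```

An empty list means every expected file was found.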

Running the code

  • python inference.py -c [config YAML file path] -p [inference YAML file path] -t [timestep of generated mel spectrogram] -n [name of sample] -i [input sentence for conditional generation]
    • Timestep refers to the length of the mel spectrogram in frames. One second of audio corresponds to roughly [sample rate] / [hop length of FFT] timesteps.
    • The -i flag is optional and only needed for conditional generation. Surround the sentence with double quotes ("") and end it with a period (.).
    • Neither unconditional nor conditional generation currently supports primed generation (extrapolating from provided data).
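
To choose a value for -t, the timestep-to-seconds relation above can be turned into a small helper. This is a sketch with hypothetical function names. The sample rate and hop length used in the example are illustrative values only, so read the real ones from your config YAML.

```python
def timesteps_for_duration(seconds, sample_rate, hop_length):
    """Approximate number of mel frames that cover `seconds` of audio."""
    return round(seconds * sample_rate / hop_length)

def duration_for_timesteps(timesteps, sample_rate, hop_length):
    """Approximate duration in seconds of `timesteps` mel frames."""
    return timesteps * hop_length / sample_rate

# Illustrative values only: 22050 Hz audio with a hop length of 256 samples.
print(timesteps_for_duration(5, 22050, 256))  # 431
print(duration_for_timesteps(431, 22050, 256))
```

With these example values, about five seconds of audio would correspond to -t 431.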

To-do

  • [x] Implement upsampling procedure
  • [x] GMM sampling + loss function
  • [x] Unconditional audio generation
  • [x] TTS synthesis
  • [x] Tensorboard logging
  • [x] Multi-GPU training
  • [ ] Primed generation

Implementation authors

License

MIT License
