Deepest-Project / Melnet

Licence: mit

Implementation of "MelNet: A Generative Model for Audio in the Frequency Domain"

MelNet

Implementation of MelNet: A Generative Model for Audio in the Frequency Domain

Prerequisites

Tested with Python 3.6.8 & 3.7.4, PyTorch 1.2.0 & 1.3.0.
pip install -r requirements.txt

How to train

Datasets

Blizzard, VoxCeleb2, and KSS have YAML files provided under config/. For other datasets, write your own YAML file, using the provided ones as templates.
Unconditional training is possible for all kinds of datasets, provided that they have a consistent file extension specified by data.extension within the YAML file.
Conditional training is currently only implemented for KSS and a subset of the Blizzard dataset.
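
The extension requirement can be pictured with a short sketch. It assumes a recursive search under the dataset root; find_audio_files and the directory layout are hypothetical and are not taken from the repository's own data loader.

```python
from pathlib import Path


def find_audio_files(root, extension):
    """Return sorted paths under root whose suffix matches extension.

    Hypothetical helper: `extension` stands in for the value of
    data.extension in the YAML file (for example "wav" or ".wav").
    """
    # Accept the extension with or without a leading dot.
    suffix = "." + extension.lstrip(".").lower()
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == suffix
    )
```

If a dataset mixes extensions, only files matching the configured one would be picked up by a search like this, which is why a single consistent extension matters.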

Running the code

python trainer.py -c [config YAML file path] -n [name of run] -t [tier number] -b [batch size] -s [TTS]
- Each tier can be trained separately. Since each tier is larger than the one before it (with the exception of tier 1), modify the batch size for each tier.
  - Tier 6 of the Blizzard dataset does not fit on a 16GB P100, even with a batch size of 1.
- The -s flag is a boolean that determines whether to train a TTS tier. Since a TTS tier differs only at tier 1, this flag is ignored when [tier number] != 1. Warning: the flag is set to True no matter what value follows it, so omit it entirely unless you are training a TTS tier.
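
The warning about -s matches a well-known argparse behavior, sketched below. This assumes the flag is declared with type=bool, which the text above does not confirm; the parsers here are illustrative and are not the ones in trainer.py.

```python
import argparse

# Assumed declaration: type=bool calls bool() on the raw string, and any
# non-empty string (including "False") is truthy.
pitfall = argparse.ArgumentParser()
pitfall.add_argument("-s", "--tts", type=bool, default=False)

print(pitfall.parse_args([]).tts)               # False (flag omitted)
print(pitfall.parse_args(["-s", "False"]).tts)  # True
print(pitfall.parse_args(["-s", "True"]).tts)   # True

# A common alternative: a switch that takes no value at all.
switch = argparse.ArgumentParser()
switch.add_argument("-s", "--tts", action="store_true")

print(switch.parse_args([]).tts)      # False
print(switch.parse_args(["-s"]).tts)  # True
```

Under that assumption, the only reliable way to leave the flag off is to not pass it.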

How to sample

Preparing the checkpoints

The checkpoints must be stored under chkpt/.
A YAML file named inference.yaml must be provided under config/.
inference.yaml must specify the number of tiers, the names of the checkpoints, and whether or not it is a conditional generation.
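
A quick pre-flight check along these lines can catch a missing checkpoint before sampling starts. This is only a sketch: the key names num_tiers, checkpoints, and conditional are hypothetical (use the keys the repository's config files actually define), and it assumes one checkpoint name per tier. The config is passed as a plain dict so the example does not depend on a YAML loader.

```python
from pathlib import Path


def check_inference_config(cfg, chkpt_dir="chkpt"):
    """Return a list of problems found in an inference config dict.

    The key names used here are hypothetical placeholders for the three
    pieces of information inference.yaml must carry.
    """
    problems = [
        f"missing key: {key}"
        for key in ("num_tiers", "checkpoints", "conditional")
        if key not in cfg
    ]
    if problems:
        return problems
    # Assumption: exactly one checkpoint name is listed per tier.
    if len(cfg["checkpoints"]) != cfg["num_tiers"]:
        problems.append("expected one checkpoint name per tier")
    # Checkpoints are expected to live under chkpt/.
    for name in cfg["checkpoints"]:
        if not (Path(chkpt_dir) / name).is_file():
            problems.append(f"checkpoint not found: {name}")
    return problems
```

An empty list means the three required pieces of information are present and every named checkpoint exists under the checkpoint directory.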

Running the code

python inference.py -c [config YAML file path] -p [inference YAML file path] -t [timestep of generated mel spectrogram] -n [name of sample] -i [input sentence for conditional generation]
- Timestep refers to the length of the mel spectrogram. One second of audio corresponds to roughly [sample rate] / [hop length of FFT] timesteps.
- The -i flag is optional and is only needed for conditional generation. Surround the sentence with double quotes ("") and end it with a period (.).
- Neither unconditional nor conditional generation currently supports primed generation (extrapolating from provided data).
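
To pick a value for -t, the relation between timesteps and seconds can be worked out directly. The sketch below uses a sample rate of 22050 Hz and a hop length of 256 purely as placeholder values; check the numbers in your own config YAML file. The function names are hypothetical.

```python
def seconds_to_timesteps(seconds, sample_rate, hop_length):
    """Approximate number of mel frames covering `seconds` of audio."""
    return round(seconds * sample_rate / hop_length)


def timesteps_to_seconds(timesteps, sample_rate, hop_length):
    """Approximate duration in seconds of `timesteps` mel frames."""
    return timesteps * hop_length / sample_rate


# Placeholder values, not taken from the repository's configs.
print(seconds_to_timesteps(5.0, sample_rate=22050, hop_length=256))            # 431
print(round(timesteps_to_seconds(431, sample_rate=22050, hop_length=256), 2))  # 5.0
```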

To-do

[x] Implement upsampling procedure
[x] GMM sampling + loss function
[x] Unconditional audio generation
[x] TTS synthesis
[x] Tensorboard logging
[x] Multi-GPU training
[ ] Primed generation

Implementation authors

Seungwon Park, June Young Yi, Yoonhyung Lee, Joowhan Song @ Deepest Season 6

License

MIT License
