
descriptinc / Melgan Neurips

License: MIT
GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis

Programming Languages

Python
139,335 projects - #7 most used programming language

Projects that are alternatives to or similar to Melgan Neurips

Libfaceid
libfaceid is a research framework for prototyping face recognition solutions. It seamlessly integrates multiple detection, recognition, and liveness models with speech synthesis and speech recognition.
Stars: ✭ 354 (-40.2%)
Mutual labels:  speech-synthesis
Gansformer
Generative Adversarial Transformers
Stars: ✭ 421 (-28.89%)
Mutual labels:  gans
Java Speech Api
The J.A.R.V.I.S. Speech API is designed to be simple and efficient, using speech engines created by Google. It is an API written in Java that includes a recognizer, a synthesizer, and a microphone capture utility. Because it relies on Google services for synthesis and recognition, it requires an Internet connection, but it provides a complete, modern, and fully functional speech API in Java.
Stars: ✭ 490 (-17.23%)
Mutual labels:  speech-synthesis
Espnet
End-to-End Speech Processing Toolkit
Stars: ✭ 4,533 (+665.71%)
Mutual labels:  speech-synthesis
Fast Srgan
A Fast Deep Learning Model to Upsample Low Resolution Videos to High Resolution at 30fps
Stars: ✭ 417 (-29.56%)
Mutual labels:  gans
Mimicry
[CVPR 2020 Workshop] A PyTorch GAN library that reproduces research results for popular GANs.
Stars: ✭ 458 (-22.64%)
Mutual labels:  gans
Espeak
eSpeak NG is an open source speech synthesizer that supports 101 languages and accents.
Stars: ✭ 339 (-42.74%)
Mutual labels:  speech-synthesis
Athena
An open-source implementation of a sequence-to-sequence based speech processing engine
Stars: ✭ 542 (-8.45%)
Mutual labels:  speech-synthesis
Sprocket
Voice Conversion Tool Kit
Stars: ✭ 425 (-28.21%)
Mutual labels:  speech-synthesis
Autovc
AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
Stars: ✭ 485 (-18.07%)
Mutual labels:  speech-synthesis
Sdv
Synthetic Data Generation for tabular, relational and time series data.
Stars: ✭ 360 (-39.19%)
Mutual labels:  gans
Anycost Gan
[CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing
Stars: ✭ 367 (-38.01%)
Mutual labels:  gans
Gantts
PyTorch implementation of GAN-based text-to-speech synthesis and voice conversion (VC)
Stars: ✭ 460 (-22.3%)
Mutual labels:  speech-synthesis
Voice Builder
An open-source text-to-speech (TTS) voice-building tool
Stars: ✭ 362 (-38.85%)
Mutual labels:  speech-synthesis
Von
[NeurIPS 2018] Visual Object Networks: Image Generation with Disentangled 3D Representation.
Stars: ✭ 497 (-16.05%)
Mutual labels:  gans
Attentiongan
AttentionGAN for Unpaired Image-to-Image Translation & Multi-Domain Image-to-Image Translation
Stars: ✭ 341 (-42.4%)
Mutual labels:  gans
Rewriting
Rewriting a Deep Generative Model, ECCV 2020 (oral). Interactive tool to directly edit the rules of a GAN to synthesize scenes with objects added, removed, or altered. Change StyleGANv2 to make extravagant eyebrows, or horses wearing hats.
Stars: ✭ 454 (-23.31%)
Mutual labels:  gans
Flowtron
Flowtron is an auto-regressive, flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer.
Stars: ✭ 546 (-7.77%)
Mutual labels:  speech-synthesis
Termit
Translations with speech synthesis in your terminal, as a Ruby gem.
Stars: ✭ 505 (-14.7%)
Mutual labels:  speech-synthesis
Tf.gans Comparison
Implementations of (theoretical) generative adversarial networks and comparison without cherry-picking
Stars: ✭ 477 (-19.43%)
Mutual labels:  gans

Official repository for the paper MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Previous works have found that generating coherent raw audio waveforms with GANs is challenging. In this paper, we show that it is possible to train GANs reliably to generate high-quality coherent waveforms by introducing a set of architectural changes and simple training techniques. Subjective evaluation (Mean Opinion Score, or MOS) shows the effectiveness of the proposed approach for high-quality mel-spectrogram inversion. To establish the generality of the proposed techniques, we show qualitative results of our model in speech synthesis, music domain translation, and unconditional music synthesis. We evaluate the various components of the model through ablation studies and suggest a set of guidelines to design general-purpose discriminators and generators for conditional sequence synthesis tasks. Our model is non-autoregressive and fully convolutional, has significantly fewer parameters than competing models, and generalizes to unseen speakers for mel-spectrogram inversion. Our PyTorch implementation runs more than 100x faster than real time on a GTX 1080Ti GPU and more than 2x faster than real time on CPU, without any hardware-specific optimization tricks. A blog post with samples and accompanying code is coming soon.
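
As a rough illustration of the kind of model the abstract describes, here is a minimal PyTorch sketch of a non-autoregressive, fully convolutional mel-to-waveform generator: stacked transposed convolutions for upsampling, each followed by residual blocks of dilated convolutions. This is not the architecture from the paper or this repository (channel widths, kernel sizes, weight normalization, and the class names are simplified or made up for the example); see mel2wav/modules.py for the real model.

import torch
import torch.nn as nn


class ResidualDilatedBlock(nn.Module):
    # Residual stack of dilated 1-D convolutions, applied after each upsampling layer.
    def __init__(self, channels, dilation):
        super().__init__()
        self.block = nn.Sequential(
            nn.LeakyReLU(0.2),
            nn.Conv1d(channels, channels, kernel_size=3,
                      dilation=dilation, padding=dilation),
            nn.LeakyReLU(0.2),
            nn.Conv1d(channels, channels, kernel_size=1),
        )

    def forward(self, x):
        return x + self.block(x)


class ToyMelToWaveGenerator(nn.Module):
    # Non-autoregressive, fully convolutional mel -> waveform generator.
    # Upsampling factors 8 * 8 * 2 * 2 = 256 correspond to a 256-sample hop size.
    def __init__(self, n_mel_channels=80, ngf=256):
        super().__init__()
        layers = [nn.Conv1d(n_mel_channels, ngf, kernel_size=7, padding=3)]
        channels = ngf
        for factor in (8, 8, 2, 2):
            layers += [
                nn.LeakyReLU(0.2),
                nn.ConvTranspose1d(channels, channels // 2,
                                   kernel_size=2 * factor, stride=factor,
                                   padding=factor // 2),
            ]
            channels //= 2
            layers += [ResidualDilatedBlock(channels, d) for d in (1, 3, 9)]
        layers += [nn.LeakyReLU(0.2),
                   nn.Conv1d(channels, 1, kernel_size=7, padding=3),
                   nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, mel):            # mel: (batch, n_mel_channels, frames)
        return self.net(mel)           # waveform: (batch, 1, frames * 256)


if __name__ == "__main__":
    generator = ToyMelToWaveGenerator()
    mel = torch.randn(1, 80, 50)       # 50 mel frames
    print(generator(mel).shape)        # torch.Size([1, 1, 12800])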

Visit our website for samples. You can try the speech correction application here, which is built on an end-to-end speech synthesis pipeline using MelGAN.

If you aren't attending the NeurIPS 2019 conference to see our poster in person, check out the slides.

Code organization

├── README.md             <- Top-level README.
├── set_env.sh            <- Set PYTHONPATH and CUDA_VISIBLE_DEVICES.
│
├── mel2wav
│   ├── dataset.py           <- Data loader scripts
│   ├── modules.py           <- Model, layers and losses
│   ├── utils.py             <- Utilities to monitor, save, log, schedule etc.
│
├── scripts
│   ├── train.py                    <- Training / validation / etc. scripts
│   ├── generate_from_folder.py

Preparing dataset

Create a raw data folder with all the samples stored in a wavs/ subfolder, then run the following commands from that folder to split the files into training and test lists:

ls wavs/*.wav | tail -n+11 > train_files.txt   # all files after the first 10
ls wavs/*.wav | head -n10 > test_files.txt     # first 10 files, held out for testing
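
Equivalently, the same split can be produced with a few lines of Python (paths assume the wavs/ layout above):

from pathlib import Path

# Build the same train/test split as the shell commands above:
# the first 10 wav files are held out for testing, the rest are used for training.
wavs = sorted(str(p) for p in Path("wavs").glob("*.wav"))
Path("test_files.txt").write_text("\n".join(wavs[:10]) + "\n")
Path("train_files.txt").write_text("\n".join(wavs[10:]) + "\n")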

Training Example

source set_env.sh 0
# Sets PYTHONPATH and CUDA_VISIBLE_DEVICES=0 (first GPU)
python scripts/train.py --save_path logs/baseline --path <root_data_folder>
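
Training outputs (checkpoints and logs) should appear under the directory given by --save_path, logs/baseline in this example; pass a different index to set_env.sh to select another GPU.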

PyTorch Hub Example

import torch
vocoder = torch.hub.load('descriptinc/melgan-neurips', 'load_melgan')
audio = vocoder.inverse(mel)  # mel: torch.Tensor of shape (batch_size, 80, timesteps) -> waveform
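
A slightly fuller, self-contained sketch follows. The random mel tensor is only there to illustrate the documented input shape; in practice it would come from an acoustic model such as a TTS front end.

import torch

# Downloads the pretrained MelGAN weights from the repository on first use.
vocoder = torch.hub.load('descriptinc/melgan-neurips', 'load_melgan')

# Dummy mel spectrogram with the documented shape (batch_size, 80, timesteps);
# replace this with real features from your acoustic model.
mel = torch.randn(1, 80, 200)

with torch.no_grad():
    waveform = vocoder.inverse(mel)  # mel-spectrogram inversion -> raw audio

print(waveform.shape)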