
descriptinc / Melgan Neurips

License: MIT
GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis

Programming Languages

Python
139,335 projects - #7 most used programming language

Projects that are alternatives to or similar to Melgan Neurips

Libfaceid
libfaceid is a research framework for prototyping face recognition solutions. It seamlessly integrates multiple detection, recognition, and liveness models with speech synthesis and speech recognition.
Stars: ✭ 354 (-40.2%)
Mutual labels:  speech-synthesis
Gansformer
Generative Adversarial Transformers
Stars: ✭ 421 (-28.89%)
Mutual labels:  gans
Java Speech Api
The J.A.R.V.I.S. Speech API is designed to be simple and efficient, using speech engines created by Google. It is an API written in Java that includes a recognizer, a synthesizer, and a microphone capture utility. Because it relies on Google services for synthesis and recognition, it requires an Internet connection, but it provides a complete, modern, and fully functional speech API in Java.
Stars: ✭ 490 (-17.23%)
Mutual labels:  speech-synthesis
Espnet
End-to-End Speech Processing Toolkit
Stars: ✭ 4,533 (+665.71%)
Mutual labels:  speech-synthesis
Fast Srgan
A Fast Deep Learning Model to Upsample Low Resolution Videos to High Resolution at 30fps
Stars: ✭ 417 (-29.56%)
Mutual labels:  gans
Mimicry
[CVPR 2020 Workshop] A PyTorch GAN library that reproduces research results for popular GANs.
Stars: ✭ 458 (-22.64%)
Mutual labels:  gans
Espeak
eSpeak NG is an open source speech synthesizer that supports 101 languages and accents.
Stars: ✭ 339 (-42.74%)
Mutual labels:  speech-synthesis
Athena
An open-source implementation of a sequence-to-sequence based speech processing engine
Stars: ✭ 542 (-8.45%)
Mutual labels:  speech-synthesis
Sprocket
Voice Conversion Tool Kit
Stars: ✭ 425 (-28.21%)
Mutual labels:  speech-synthesis
Autovc
AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
Stars: ✭ 485 (-18.07%)
Mutual labels:  speech-synthesis
Sdv
Synthetic Data Generation for tabular, relational and time series data.
Stars: ✭ 360 (-39.19%)
Mutual labels:  gans
Anycost Gan
[CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing
Stars: ✭ 367 (-38.01%)
Mutual labels:  gans
Gantts
PyTorch implementation of GAN-based text-to-speech synthesis and voice conversion (VC)
Stars: ✭ 460 (-22.3%)
Mutual labels:  speech-synthesis
Voice Builder
An open-source text-to-speech (TTS) voice-building tool
Stars: ✭ 362 (-38.85%)
Mutual labels:  speech-synthesis
Von
[NeurIPS 2018] Visual Object Networks: Image Generation with Disentangled 3D Representation.
Stars: ✭ 497 (-16.05%)
Mutual labels:  gans
Attentiongan
AttentionGAN for Unpaired Image-to-Image Translation & Multi-Domain Image-to-Image Translation
Stars: ✭ 341 (-42.4%)
Mutual labels:  gans
Rewriting
Rewriting a Deep Generative Model, ECCV 2020 (oral). Interactive tool to directly edit the rules of a GAN to synthesize scenes with objects added, removed, or altered. Change StyleGANv2 to make extravagant eyebrows, or horses wearing hats.
Stars: ✭ 454 (-23.31%)
Mutual labels:  gans
Flowtron
Flowtron is an auto-regressive, flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer.
Stars: ✭ 546 (-7.77%)
Mutual labels:  speech-synthesis
Termit
Translations with speech synthesis in your terminal, as a Ruby gem.
Stars: ✭ 505 (-14.7%)
Mutual labels:  speech-synthesis
Tf.gans Comparison
Implementations of (theoretical) generative adversarial networks and comparison without cherry-picking
Stars: ✭ 477 (-19.43%)
Mutual labels:  gans

Official repository for the paper MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Previous works have found that generating coherent raw audio waveforms with GANs is challenging. In this paper, we show that it is possible to train GANs reliably to generate high-quality coherent waveforms by introducing a set of architectural changes and simple training techniques. Subjective evaluation (Mean Opinion Score, or MOS) shows the effectiveness of the proposed approach for high-quality mel-spectrogram inversion. To establish the generality of the proposed techniques, we show qualitative results of our model in speech synthesis, music domain translation, and unconditional music synthesis. We evaluate the various components of the model through ablation studies and suggest a set of guidelines to design general-purpose discriminators and generators for conditional sequence synthesis tasks. Our model is non-autoregressive and fully convolutional, has significantly fewer parameters than competing models, and generalizes to unseen speakers for mel-spectrogram inversion. Our PyTorch implementation runs more than 100x faster than real time on a GTX 1080Ti GPU and more than 2x faster than real time on CPU, without any hardware-specific optimization tricks. A blog post with samples and accompanying code is coming soon.
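
As a rough illustration of the kind of model the abstract describes, here is a minimal PyTorch sketch of a non-autoregressive, fully convolutional mel-to-waveform generator: stacked transposed convolutions for upsampling, each followed by residual blocks of dilated convolutions. This is not the architecture from the paper or this repository (channel widths, kernel sizes, weight normalization, and the class names are simplified or made up for the example); see mel2wav/modules.py for the real model.

import torch
import torch.nn as nn


class ResidualDilatedBlock(nn.Module):
    # Residual stack of dilated 1-D convolutions, applied after each upsampling layer.
    def __init__(self, channels, dilation):
        super().__init__()
        self.block = nn.Sequential(
            nn.LeakyReLU(0.2),
            nn.Conv1d(channels, channels, kernel_size=3,
                      dilation=dilation, padding=dilation),
            nn.LeakyReLU(0.2),
            nn.Conv1d(channels, channels, kernel_size=1),
        )

    def forward(self, x):
        return x + self.block(x)


class ToyMelToWaveGenerator(nn.Module):
    # Non-autoregressive, fully convolutional mel -> waveform generator.
    # Upsampling factors 8 * 8 * 2 * 2 = 256 correspond to a 256-sample hop size.
    def __init__(self, n_mel_channels=80, ngf=256):
        super().__init__()
        layers = [nn.Conv1d(n_mel_channels, ngf, kernel_size=7, padding=3)]
        channels = ngf
        for factor in (8, 8, 2, 2):
            layers += [
                nn.LeakyReLU(0.2),
                nn.ConvTranspose1d(channels, channels // 2,
                                   kernel_size=2 * factor, stride=factor,
                                   padding=factor // 2),
            ]
            channels //= 2
            layers += [ResidualDilatedBlock(channels, d) for d in (1, 3, 9)]
        layers += [nn.LeakyReLU(0.2),
                   nn.Conv1d(channels, 1, kernel_size=7, padding=3),
                   nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, mel):            # mel: (batch, n_mel_channels, frames)
        return self.net(mel)           # waveform: (batch, 1, frames * 256)


if __name__ == "__main__":
    generator = ToyMelToWaveGenerator()
    mel = torch.randn(1, 80, 50)       # 50 mel frames
    print(generator(mel).shape)        # torch.Size([1, 1, 12800])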

Visit our website for samples. You can try the speech correction application here, which is built on an end-to-end speech synthesis pipeline using MelGAN.

If you aren't attending the NeurIPS 2019 conference to see our poster in person, check out the slides.

Code organization

├── README.md             <- Top-level README.
├── set_env.sh            <- Set PYTHONPATH and CUDA_VISIBLE_DEVICES.
│
├── mel2wav
│   ├── dataset.py           <- Data loader scripts
│   ├── modules.py           <- Model, layers and losses
│   ├── utils.py             <- Utilities to monitor, save, log, schedule etc.
│
├── scripts
│   ├── train.py                    <- Training / validation / etc. scripts
│   ├── generate_from_folder.py

Preparing dataset

Create a raw data folder with all the samples stored in a wavs/ subfolder, then run the following commands from that folder to split the files into training and test lists:

ls wavs/*.wav | tail -n+11 > train_files.txt   # all files after the first 10
ls wavs/*.wav | head -n10 > test_files.txt     # first 10 files, held out for testing
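
Equivalently, the same split can be produced with a few lines of Python (paths assume the wavs/ layout above):

from pathlib import Path

# Build the same train/test split as the shell commands above:
# the first 10 wav files are held out for testing, the rest are used for training.
wavs = sorted(str(p) for p in Path("wavs").glob("*.wav"))
Path("test_files.txt").write_text("\n".join(wavs[:10]) + "\n")
Path("train_files.txt").write_text("\n".join(wavs[10:]) + "\n")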

Training Example

source set_env.sh 0
# Sets PYTHONPATH and CUDA_VISIBLE_DEVICES=0 (first GPU)
python scripts/train.py --save_path logs/baseline --path <root_data_folder>
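
Training outputs (checkpoints and logs) should appear under the directory given by --save_path, logs/baseline in this example; pass a different index to set_env.sh to select another GPU.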

PyTorch Hub Example

import torch
vocoder = torch.hub.load('descriptinc/melgan-neurips', 'load_melgan')
audio = vocoder.inverse(mel)  # mel: torch.Tensor of shape (batch_size, 80, timesteps) -> waveform
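
A slightly fuller, self-contained sketch follows. The random mel tensor is only there to illustrate the documented input shape; in practice it would come from an acoustic model such as a TTS front end.

import torch

# Downloads the pretrained MelGAN weights from the repository on first use.
vocoder = torch.hub.load('descriptinc/melgan-neurips', 'load_melgan')

# Dummy mel spectrogram with the documented shape (batch_size, 80, timesteps);
# replace this with real features from your acoustic model.
mel = torch.randn(1, 80, 200)

with torch.no_grad():
    waveform = vocoder.inverse(mel)  # mel-spectrogram inversion -> raw audio

print(waveform.shape)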