
yoyololicon / constant-memory-waveglow

Licence: other
PyTorch implementation of NVIDIA WaveGlow with constant memory cost.

Programming Languages

python

Projects that are alternatives to or similar to constant-memory-waveglow

tacotron2
Multispeaker & Emotional TTS based on Tacotron 2 and Waveglow
Stars: ✭ 102 (+183.33%)
Mutual labels:  nvidia, waveglow
cookietts
TTS from Cookie. Messy and experimental!
Stars: ✭ 29 (-19.44%)
Mutual labels:  waveglow, waveflow
gnome-nvidia-extension
A Gnome extension to show NVIDIA GPU information
Stars: ✭ 29 (-19.44%)
Mutual labels:  nvidia
TailCalibX
Pytorch implementation of Feature Generation for Long-Tail Classification by Rahul Vigneswaran, Marc T Law, Vineeth N Balasubramaniam and Makarand Tapaswi
Stars: ✭ 32 (-11.11%)
Mutual labels:  nvidia
continuous-time-flow-process
PyTorch code of "Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows" (NeurIPS 2020)
Stars: ✭ 34 (-5.56%)
Mutual labels:  normalizing-flows
Music-Style-Transfer
Source code for "Transferring the Style of Homophonic Music Using Recurrent Neural Networks and Autoregressive Model"
Stars: ✭ 16 (-55.56%)
Mutual labels:  wavenet
faucon
NVIDIA Falcon Microprocessor Suite
Stars: ✭ 28 (-22.22%)
Mutual labels:  nvidia
unity-fracture
Fracture any mesh at runtime
Stars: ✭ 634 (+1661.11%)
Mutual labels:  nvidia
nvidia-video-codec-rs
Bindings for the NVIDIA Video Codec SDK
Stars: ✭ 24 (-33.33%)
Mutual labels:  nvidia
handbrake-nvenc-docker
Handbrake GUI with Web browser and VNC access. Supports NVENC encoding
Stars: ✭ 32 (-11.11%)
Mutual labels:  nvidia
QPPWG
Quasi-Periodic Parallel WaveGAN Pytorch implementation
Stars: ✭ 41 (+13.89%)
Mutual labels:  wavenet
VAENAR-TTS
PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.
Stars: ✭ 66 (+83.33%)
Mutual labels:  glow
purge-nvda
Optimize external graphics for macs with discrete NVIDIA GPUs.
Stars: ✭ 91 (+152.78%)
Mutual labels:  nvidia
NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
Stars: ✭ 797 (+2113.89%)
Mutual labels:  nvidia
Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
Stars: ✭ 78 (+116.67%)
Mutual labels:  nvidia
grepo
GKISS - A fork of KISS Linux that uses the GNU C library, mirror of https://codeberg.org/kiss-community/grepo
Stars: ✭ 51 (+41.67%)
Mutual labels:  nvidia
installROS
Install ROS Melodic on NVIDIA Jetson Development Kits
Stars: ✭ 75 (+108.33%)
Mutual labels:  nvidia
nvidia-jetson-rt
Real-Time Scheduling with NVIDIA Jetson TX2
Stars: ✭ 38 (+5.56%)
Mutual labels:  nvidia
InvertibleNetworks.jl
A Julia framework for invertible neural networks
Stars: ✭ 86 (+138.89%)
Mutual labels:  normalizing-flows
dofbot-jetson_nano
Yahboom DOFBOT AI Vision Robotic Arm with ROS for Jetson NANO 4GB B01
Stars: ✭ 24 (-33.33%)
Mutual labels:  nvidia

Constant Memory WaveGlow

DOI

A PyTorch implementation of WaveGlow: A Flow-based Generative Network for Speech Synthesis, using the constant-memory method described in Training Glow with Constant Memory Cost.
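
The core idea of the constant-memory method is that an invertible network does not need to cache intermediate activations for backpropagation: each flow step's input can be recomputed from its output via the inverse transform during the backward pass, so activation memory stays constant in network depth. Below is a minimal sketch of this trick for a single additive coupling layer; it is illustrative only, not the repo's actual code:

import torch

class AdditiveCoupling(torch.autograd.Function):
    """One additive coupling step: y1 = x1, y2 = x2 + net(x1).
    Only the outputs are saved; inputs are recovered in backward
    by inverting the layer, so no activation graph is kept."""

    @staticmethod
    def forward(ctx, x1, x2, net):
        with torch.no_grad():             # build no graph in forward
            y2 = x2 + net(x1)
        ctx.net = net
        ctx.save_for_backward(x1, y2)     # x1 is also the output y1
        return x1, y2

    @staticmethod
    def backward(ctx, grad_y1, grad_y2):
        y1, y2 = ctx.saved_tensors
        with torch.no_grad():             # invert to recover the input
            x2 = y2 - ctx.net(y1)
        with torch.enable_grad():         # local re-forward for gradients
            x1 = y1.detach().requires_grad_()
            x2 = x2.detach().requires_grad_()
            y2_local = x2 + ctx.net(x1)
            torch.autograd.backward(y2_local, grad_y2)
        return grad_y1 + x1.grad, x2.grad, None

# usage: y1, y2 = AdditiveCoupling.apply(x1, x2, net)

Stacking many such layers while retaining only the final output is what keeps the memory cost flat; the same idea carries over to WaveGlow's affine couplings and invertible 1x1 convolutions.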

The implementation details differ slightly from the official implementation, based on personal preference, and the project structure is borrowed from pytorch-template.

We also include implementations of Baidu's WaveFlow and of MelGlow, which are easier to train and more memory friendly.

In addition to neural vocoders, we include an implementation of the audio super-resolution model WSRGlow.

Requirements

After installing the requirements from pytorch-template, install the extra dependencies:

pip install nnAudio torch_optimizer

Quick Start

Modify data_dir in the JSON config file to point to a directory of wave files with the same sampling rate, and you're good to go. The mel-spectrograms will be computed on the fly (see the sketch after the commands below).

{
  "data_loader": {
    "type": "RandomWaveFileLoader",
    "args": {
      "data_dir": "/your/data/wave/files",
      "batch_size": 8,
      "num_workers": 2,
      "segment": 16000
    }
  }
}
python train.py -c config.json
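
For intuition, here is a rough sketch of what this on-the-fly pipeline could look like: random fixed-length segments are drawn from the wave files, and conditioning mel-spectrograms are computed on the GPU with nnAudio. The code is illustrative only (it is not the repo's actual RandomWaveFileLoader); it assumes torchaudio for file reading, files longer than one segment, and placeholder spectrogram parameters:

import glob, random
import torch
import torchaudio
from nnAudio import Spectrogram

def random_segments(data_dir, batch_size=8, segment=16000):
    # Draw one random fixed-length segment from a random file, batch_size times.
    files = glob.glob(f"{data_dir}/*.wav")
    batch = []
    for _ in range(batch_size):
        wav, sr = torchaudio.load(random.choice(files))  # (channels, frames)
        start = random.randint(0, wav.shape[1] - segment)
        batch.append(wav[0, start:start + segment])
    return torch.stack(batch)                            # (batch_size, segment)

# Mel-spectrograms are computed on the GPU, not precomputed on disk.
mel_fn = Spectrogram.MelSpectrogram(sr=16000, n_fft=1024,
                                    n_mels=80, hop_length=256).cuda()
x = random_segments("/your/data/wave/files").cuda()
mel = mel_fn(x)   # (batch_size, n_mels, frames)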

Memory consumption of model training in PyTorch

Model                                             | Memory (MB)
WaveGlow, channels=256, batch size=24 (naive)     | N.A.
WaveGlow, channels=256, batch size=24 (efficient) | 4951
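
Numbers like these can be reproduced with PyTorch's built-in peak-memory counters. A minimal sketch, where model and batch stand in for your own training module and input:

import torch

torch.cuda.reset_peak_memory_stats()

loss = model(batch).mean()   # placeholder forward pass and loss
loss.backward()

peak_mb = torch.cuda.max_memory_allocated() / 2**20
print(f"peak training memory: {peak_mb:.0f} MB")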

Result

WaveGlow

I trained the model on some cello pieces from MusicNet using musicnet_config.json. The clips in the samples folder are what I got. Although the audio quality is not very good, this suggests WaveGlow can be used for music generation as well. The generation speed is around 470 kHz on a 1080 Ti.
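
Generation speed in kHz is simply output samples produced per second of wall-clock time. One way to measure it, with model.infer standing in for whatever inference entry point you use:

import time
import torch

torch.cuda.synchronize()                  # make sure prior GPU work is done
start = time.time()
with torch.no_grad():
    audio = model.infer(mel)              # placeholder inference call
torch.cuda.synchronize()                  # wait for generation to finish
speed_khz = audio.numel() / (time.time() - start) / 1000
print(f"generation speed: {speed_khz:.0f} kHz")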

WaveFlow

I trained on the full LJ Speech dataset using waveflow_LJ_speech.json. The settings correspond to the 64 residual channels, h = 64 model in the paper. After training for about 1.25M steps, the audio quality is very similar to their official examples. Samples generated from the training data can be listened to here.

MelGlow

Coming soon.

WSRGlow

Pre-trained models on the VCTK dataset are available here. We follow the settings of NU-Wave to prepare the training data.

Citation

If you use our code in any project or research, please cite:

@misc{memwaveglow,
  doi          = {10.5281/zenodo.3874330},
  author       = {Chin Yun Yu},
  title        = {Constant Memory WaveGlow: A PyTorch implementation of WaveGlow with constant memory cost},
  howpublished = {\url{https://github.com/yoyololicon/constant-memory-waveglow}},
  year         = {2019}
}