
L0SG / NanoFlow

License: BSD-3-Clause
PyTorch implementation of the paper "NanoFlow: Scalable Normalizing Flows with Sublinear Parameter Complexity." (NeurIPS 2020)

Programming Languages

python
139335 projects - #7 most used programming language
Jupyter Notebook
11667 projects

Projects that are alternatives of or similar to NanoFlow

gradient-boosted-normalizing-flows
We got a stew going!
Stars: ✭ 20 (-68.25%)
Mutual labels:  density-estimation, normalizing-flows, deep-generative-model
Gumbel-CRF
Implementation of NeurIPS 20 paper: Latent Template Induction with Gumbel-CRFs
Stars: ✭ 51 (-19.05%)
Mutual labels:  density-estimation, deep-generative-model
deeprob-kit
A Python Library for Deep Probabilistic Modeling
Stars: ✭ 32 (-49.21%)
Mutual labels:  probabilistic-models, normalizing-flows
flowtorch-old
Separating Normalizing Flows code from Pyro and improving API
Stars: ✭ 36 (-42.86%)
Mutual labels:  probabilistic-models, normalizing-flows
naru
Neural Relation Understanding: neural cardinality estimators for tabular data
Stars: ✭ 76 (+20.63%)
Mutual labels:  density-estimation, deep-generative-model
Wavegrad
Implementation of Google Brain's WaveGrad high-fidelity vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.
Stars: ✭ 245 (+288.89%)
Mutual labels:  speech-synthesis
GlottDNN
GlottDNN vocoder and tools for training DNN excitation models
Stars: ✭ 30 (-52.38%)
Mutual labels:  speech-synthesis
Normit
Translations with speech synthesis in your terminal as a node package
Stars: ✭ 219 (+247.62%)
Mutual labels:  speech-synthesis
Neural Voice Cloning With Few Samples
Implementation of Neural Voice Cloning with Few Samples Research Paper by Baidu
Stars: ✭ 211 (+234.92%)
Mutual labels:  speech-synthesis
char-rnn
medium.com/@jctestud/yet-another-text-generation-project-5cfb59b26255
Stars: ✭ 20 (-68.25%)
Mutual labels:  generative-models
benchmark VAE
Unifying Variational Autoencoder (VAE) implementations in Pytorch (NeurIPS 2022)
Stars: ✭ 1,211 (+1822.22%)
Mutual labels:  normalizing-flows
sam
Software Automatic Mouth - Tiny Speech Synthesizer
Stars: ✭ 316 (+401.59%)
Mutual labels:  speech-synthesis
tacotron2
Pytorch implementation of "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions", ICASSP, 2018.
Stars: ✭ 17 (-73.02%)
Mutual labels:  speech-synthesis
sova-tts-engine
Tacotron2 based engine for the SOVA-TTS project
Stars: ✭ 63 (+0%)
Mutual labels:  speech-synthesis
Tacotron pytorch
PyTorch implementation of Tacotron speech synthesis model.
Stars: ✭ 242 (+284.13%)
Mutual labels:  speech-synthesis
EfficientMORL
EfficientMORL (ICML'21)
Stars: ✭ 22 (-65.08%)
Mutual labels:  deep-generative-model
Tacotron
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
Stars: ✭ 2,581 (+3996.83%)
Mutual labels:  speech-synthesis
wiki2ssml
Wiki2SSML provides the WikiVoice markup language used for fine-tuning synthesised voice.
Stars: ✭ 31 (-50.79%)
Mutual labels:  speech-synthesis
idear
🎙️ Handsfree Audio Development Interface
Stars: ✭ 84 (+33.33%)
Mutual labels:  speech-synthesis
IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
Stars: ✭ 295 (+368.25%)
Mutual labels:  speech-synthesis

NanoFlow: Scalable Normalizing Flows with Sublinear Parameter Complexity

Update: Pretrained weights are now available. See links below.

This repository is an official PyTorch implementation of the paper:

Sang-gil Lee, Sungwon Kim, Sungroh Yoon. "NanoFlow: Scalable Normalizing Flows with Sublinear Parameter Complexity." NeurIPS (2020). [arxiv]

[Figure 1]

Flow-based networks are considered inefficient in parameter complexity because the reduced expressiveness of bijective mappings renders the models unfeasibly expensive in terms of parameters. We present an alternative parameterization scheme called NanoFlow, which uses a single neural density estimator to model multiple transformation stages.

The codebase provides two real-world applications of flow-based models with our method:

  1. Waveform synthesis model (i.e., neural vocoder) based on WaveFlow (Ping et al., ICML 2020). See below for a detailed description.
  2. Image density estimation based on Glow (Kingma & Dhariwal, NeurIPS 2018), hosted in the separate image_density_experiments subdirectory.
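
To make the parameter-sharing idea concrete, here is a toy PyTorch sketch of several affine coupling stages that all query a single shared estimator network and are distinguished only by a learned per-stage embedding (the emb_dim default mirrors the "emb512" in the config names). The class name, layer sizes, and coupling layout are illustrative assumptions, not the NanoFlow architecture implemented in this repository.

    import torch
    import torch.nn as nn

    class SharedCouplingFlow(nn.Module):
        """Toy sketch: K affine coupling stages share one estimator network.

        Only the per-stage embedding grows with the number of stages, which is
        where the sublinear parameter complexity comes from. Illustrative only;
        not the architecture implemented in this repository.
        """
        def __init__(self, dim, hidden=128, num_stages=4, emb_dim=512):
            super().__init__()
            assert dim % 2 == 0
            self.half = dim // 2
            # a single shared density estimator, reused by every stage
            self.estimator = nn.Sequential(
                nn.Linear(self.half + emb_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 2 * self.half),  # per-dim log-scale and shift
            )
            # per-stage embedding that tells the shared estimator which stage it serves
            self.stage_emb = nn.Embedding(num_stages, emb_dim)
            self.num_stages = num_stages

        def forward(self, x):
            log_det = x.new_zeros(x.shape[0])
            for k in range(self.num_stages):
                x_a, x_b = x[:, :self.half], x[:, self.half:]
                emb = self.stage_emb.weight[k].expand(x.shape[0], -1)
                log_s, t = self.estimator(torch.cat([x_a, emb], -1)).chunk(2, -1)
                x_b = x_b * torch.exp(log_s) + t   # affine coupling on the second half
                log_det = log_det + log_s.sum(-1)
                x = torch.cat([x_b, x_a], -1)      # swap halves between stages
            return x, log_det

In a conventional flow, each of the num_stages stages would own its own estimator network; here only stage_emb scales with the number of stages.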

Setup

  1. Clone this repo and install requirements

    git clone https://github.com/L0SG/NanoFlow.git
    cd NanoFlow
    pip install -r requirements.txt
  2. Install Apex for mixed-precision training

Train your model

  1. Download LJ Speech Data. In this example it is placed in data/

  2. Make a list of the file names to use for training/testing.

    ls data/*.wav | tail -n+1310 > train_files.txt
    ls data/*.wav | head -n1310 > test_files.txt

    -n+1310 and -n1310 indicate that this example reserves the first 1310 audio clips (10% of the dataset) for model testing.

  3. Edit the configuration file and train the model.

    Below are the example commands using nanoflow-h16-r128-emb512.json

    nano configs/nanoflow-h16-r128-emb512.json
    python train.py -c configs/nanoflow-h16-r128-emb512.json

    Single-node multi-GPU training is automatically enabled with DataParallel (instead of DistributedDataParallel for simplicity).

    For mixed-precision training, set "fp16_run": true in the configuration file.

    You can load trained weights from saved checkpoints by providing the path in the checkpoint_path variable in the config file.

    checkpoint_path accepts either an explicit checkpoint path, or its parent directory when resuming from weights averaged over multiple checkpoints.

    Examples

    Insert checkpoint_path: "experiments/nanoflow-h16-r128-emb512/waveflow_5000" in the config file, then run

    python train.py -c configs/nanoflow-h16-r128-emb512.json

    To load weights averaged over the 10 most recent checkpoints (a conceptual sketch of the averaging appears after these steps), insert checkpoint_path: "experiments/nanoflow-h16-r128-emb512" in the config file, then run

    python train.py -a 10 -c configs/nanoflow-h16-r128-emb512.json

    You can reset the optimizer and training scheduler (while keeping the weights) by providing --warm_start

    python train.py --warm_start -c configs/nanoflow-h16-r128-emb512.json
  4. Synthesize waveforms from the trained model.

    Insert checkpoint_path in the config file and pass --synthesize to train.py. The model generates waveforms by looping over test_files.txt.

    python train.py --synthesize -c configs/nanoflow-h16-r128-emb512.json

    If "fp16_run": true, the model uses FP16 (half-precision) arithmetic for faster performance on GPUs equipped with Tensor Cores.
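
As a side note on the -a option above: "weights averaged over the N most recent checkpoints" can be read as a plain elementwise average of the saved parameter tensors. The sketch below only illustrates that idea; it assumes checkpoints named waveflow_<step> (as in the example path above) saved via torch.save as flat model state dicts, which may not match the repository's actual checkpoint layout or loading code.

    import glob
    import torch

    def average_recent_checkpoints(ckpt_dir, num_avg=10):
        """Illustrative only: elementwise average of the last `num_avg` checkpoints.

        Assumes files named <ckpt_dir>/waveflow_<step> holding flat state dicts;
        the real checkpoint format in this repo may wrap additional state.
        """
        paths = sorted(glob.glob(f"{ckpt_dir}/waveflow_*"),
                       key=lambda p: int(p.rsplit("_", 1)[-1]))[-num_avg:]
        averaged = None
        for path in paths:
            state = torch.load(path, map_location="cpu")
            if averaged is None:
                averaged = {k: v.clone().float() for k, v in state.items()}
            else:
                for k, v in state.items():
                    averaged[k] += v.float()
        return {k: v / len(paths) for k, v in averaged.items()}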

Implementation details

Here, we describe architectural details worth mentioning:

  1. We used a row-wise autoregressive coupling transformation over the entire data point for each flow. This is implemented by shifting the data down by one row and zero-padding the new top row (see shift_1d; a simplified sketch appears after this list). The network uses the shifted input for the transformation.

    For example, with h=16, all 16 rows are transformed, whereas the official implementation splits the input into the top row and the remaining 15 rows and transforms only the latter 15. The difference in performance is marginal. Our separate WaveFlow repo follows the official implementation more faithfully.

  2. We used math.sqrt(0.5) as a constant multiplier for fused_res_skip, similar to other open-source implementations of WaveNet. We later found that the difference is negligible.

  3. A tiny fraction of the network parameters (half of the last res_skip_conv layer) is unused, for simplicity of implementation.

  4. We initialized multgate for NanoFlow with ones (self.multgate = nn.Parameter(torch.ones(num_layer, filter_size))) for WaveFlow-based experiments, instead of using zero-init (self.multgate = nn.Parameter(torch.zeros((6, hidden_channels, 1, 1)))) accompanied by multgate = torch.exp(multgate) from Glow-based experiments.

    We later found no meaningful difference between the two, but the latter guarantees a positive value range that can be interpreted as gating.

  5. For simplicity, reverse_fast implements an edge-case version of the convolution queue mechanism without a proper queue system. It is only correct up to "n_height": 16 with "n_layer_per_cycle": 1.
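
As an illustration of the row-wise shift described in item 1, the snippet below shifts a (batch, channels, height, width) tensor down by one row along the height axis and zero-pads the vacated top row, so the network predicting row h only sees rows 0..h-1. It is a simplified stand-in for the idea, not the repository's shift_1d.

    import torch
    import torch.nn.functional as F

    def shift_height_down(x):
        """Shift rows down by one along the height axis, zero-padding the top row.

        x: (batch, channels, height, width). Simplified stand-in for the idea
        described above, not the repository's shift_1d.
        """
        # F.pad with a 4-tuple pads (W_left, W_right, H_top, H_bottom)
        return F.pad(x, (0, 0, 1, 0))[:, :, :-1, :]

    # quick check: the first output row is zeros, the rest are the input shifted down
    x = torch.arange(2 * 1 * 4 * 3, dtype=torch.float32).reshape(2, 1, 4, 3)
    y = shift_height_down(x)
    assert torch.all(y[:, :, 0] == 0) and torch.equal(y[:, :, 1:], x[:, :, :-1])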

Pretrained Weights

We provide pretrained weights via Google Drive. The models were fine-tuned for an additional 2.5M steps with a constant "learning_rate": 2e-4, starting from the checkpoints used in the paper, and we then averaged weights over the last 20 checkpoints with -a 20.

Please note that these models are not based on the best-performing vocoder configuration of the WaveFlow paper and serve as a comparative study. Specifically,

  1. The models are trained on 90% of the LJSpeech clips, and the remaining 10% of the clips are used only for evaluation.
  2. We have not applied the bipartized permutation method in these models.

Models                          Test set LL (gain)   Params (M)   Download
waveflow-h16-r64                5.1499 (+0.0142)     5.925        Link
waveflow-h16-r128               5.2263 (+0.0204)     22.336       Link
nanoflow-h16-r128-emb512        5.1711 (+0.0125)     2.819        Link
nanoflow-h16-r128-emb1024-f16   5.2024 (+0.0151)     2.845        Link

You can load the pretrained weights by inserting their path into "checkpoint_path" in the config file.

Reference

NVIDIA Tacotron2: https://github.com/NVIDIA/tacotron2

NVIDIA WaveGlow: https://github.com/NVIDIA/waveglow

r9y9 wavenet-vocoder: https://github.com/r9y9/wavenet_vocoder

FloWaveNet: https://github.com/ksw0306/FloWaveNet

Parakeet: https://github.com/PaddlePaddle/Parakeet

WaveFlow (unofficial): https://github.com/L0SG/WaveFlow

Glow-PyTorch: https://github.com/y0ast/Glow-PyTorch

Neural Spline Flows (nsf): https://github.com/bayesiains/nsf
