License: BSD-3-Clause

Mellotron

Rafael Valle*, Jason Li*, Ryan Prenger and Bryan Catanzaro

In our recent paper we propose Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data.

By explicitly conditioning on rhythm and continuous pitch contours from an audio signal or music score, Mellotron is able to generate speech in a variety of styles ranging from read speech to expressive speech, from slow drawls to rap and from monotonous voice to singing voice.
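To make the pitch conditioning concrete, here is a toy, framework-free sketch of extracting a coarse pitch contour from audio via autocorrelation. Mellotron itself uses a Yin-based pitch tracker; the function names and parameters below are illustrative, not the repo's API:

```python
import numpy as np

def frame_f0(frame, sr, fmin=80.0, fmax=400.0):
    """Estimate the fundamental frequency of one audio frame (Hz).

    Toy autocorrelation method: pick the lag with the strongest
    self-similarity within the plausible pitch range.
    """
    frame = frame - frame.mean()
    # Full autocorrelation; keep non-negative lags only.
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)  # shortest period we accept
    lag_max = int(sr / fmin)  # longest period we accept
    best = lag_min + np.argmax(corr[lag_min:lag_max])
    return sr / best

def pitch_contour(signal, sr, frame_len=1024, hop=256):
    """Slide over the signal and return one f0 estimate per frame."""
    n = 1 + (len(signal) - frame_len) // hop
    return np.array([frame_f0(signal[i * hop:i * hop + frame_len], sr)
                     for i in range(n)])
```

For example, running `pitch_contour` on a pure 220 Hz sine sampled at 22050 Hz yields a contour hovering near 220 Hz; a contour like this (one value per frame) is what "continuous pitch contour" conditioning refers to.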

Visit our website for audio samples.

Pre-requisites

  1. NVIDIA GPU + CUDA + cuDNN

Setup

  1. Clone this repo: git clone https://github.com/NVIDIA/mellotron.git
  2. cd into this repo: cd mellotron
  3. Initialize submodule: git submodule init; git submodule update
  4. Install PyTorch
  5. Install Apex
  6. Install Python requirements (pip install -r requirements.txt) or build the Docker image

Training

  1. Update the filelists inside the filelists folder to point to your data
  2. python train.py --output_directory=outdir --log_directory=logdir
  3. (OPTIONAL) tensorboard --logdir=outdir/logdir
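Each filelist is a plain-text file with one utterance per line: a pipe-separated audio path, transcript, and integer speaker ID. The paths below are illustrative, not shipped with the repo:

```
/data/libritts/train/1034_121119_000001.wav|she was a happy woman.|0
/data/libritts/train/2416_152139_000004.wav|the sun rose over the hills.|1
```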

Training using a pre-trained model

Training using a pre-trained model can lead to faster convergence. By default, the speaker embedding layer is ignored.

  1. Download our published Mellotron model trained on LibriTTS or LJS
  2. python train.py --output_directory=outdir --log_directory=logdir -c models/mellotron_libritts.pt --warm_start
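Conceptually, warm starting loads the checkpoint's weights while dropping layers whose shapes depend on the new dataset, such as the speaker embedding. A framework-free sketch of that key filtering (the layer names are illustrative):

```python
def filter_warm_start(state_dict, ignore_layers):
    """Drop checkpoint entries whose names start with an ignored layer.

    The surviving entries can then be loaded into the new model,
    leaving the ignored layers at their fresh random initialization.
    """
    return {name: weights for name, weights in state_dict.items()
            if not any(name.startswith(layer) for layer in ignore_layers)}
```

With `ignore_layers=["speaker_embedding"]`, a checkpoint trained on LibriTTS can warm start a model with a different number of speakers, since the mismatched embedding table is simply never loaded.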

Multi-GPU (distributed) and Automatic Mixed Precision Training

  1. python -m multiproc train.py --output_directory=outdir --log_directory=logdir --hparams=distributed_run=True,fp16_run=True

Inference demo

  1. jupyter notebook --ip=127.0.0.1 --port=31337
  2. Load inference.ipynb
  3. (optional) Download our published WaveGlow model

Related repos

WaveGlow: a faster-than-real-time, flow-based generative network for speech synthesis.

Acknowledgements

This implementation uses code from the following repos: Keith Ito, Prem Seetharaman, Chengqi Deng, Patrice Guyot, as described in our code.
