neosapience / mlp-singer

License: MIT
Official implementation of MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis (IEEE MLSP 2021)

Programming Languages

python
139335 projects - #7 most used programming language
Makefile
30231 projects

Projects that are alternatives of or similar to mlp-singer

Deep-Learning-Models
Deep Learning Models implemented in python.
Stars: ✭ 17 (-83.5%)
Mutual labels:  mlp, multi-layer-perceptron
python-neuron
Neuron class provides LNU, QNU, RBF, MLP, MLP-ELM neurons
Stars: ✭ 38 (-63.11%)
Mutual labels:  mlp, multi-layer-perceptron
Hms Ml Demo
HMS ML Demo provides an example of integrating Huawei ML Kit service into applications. This example demonstrates how to integrate services provided by ML Kit, such as face detection, text recognition, image segmentation, asr, and tts.
Stars: ✭ 187 (+81.55%)
Mutual labels:  text-to-speech
polyssifier
Run a multitude of classifiers on your data and get an AUC report
Stars: ✭ 64 (-37.86%)
Mutual labels:  mlp
Wavegrad
Implementation of Google Brain's WaveGrad high-fidelity vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.
Stars: ✭ 245 (+137.86%)
Mutual labels:  text-to-speech
Vonage Ruby Sdk
Vonage REST API client for Ruby. API support for SMS, Voice, Text-to-Speech, Numbers, Verify (2FA) and more.
Stars: ✭ 203 (+97.09%)
Mutual labels:  text-to-speech
Tacotron Pytorch
Pytorch implementation of Tacotron
Stars: ✭ 189 (+83.5%)
Mutual labels:  text-to-speech
Doc2audiobook
Convert text documents to high fidelity audio(books).
Stars: ✭ 175 (+69.9%)
Mutual labels:  text-to-speech
brasiltts
Brasil TTS is a set of Brazilian Portuguese speech synthesizers that read the screen aloud for visually impaired users. It converts text into audio, giving blind and low-vision people access to the content displayed on screen. Although the primary audience of text-to-speech systems such as Brasil TTS consists of…
Stars: ✭ 34 (-66.99%)
Mutual labels:  text-to-speech
Hantts
Chinese Text-to-Speech web service
Stars: ✭ 241 (+133.98%)
Mutual labels:  text-to-speech
TextNormalizationCoveringGrammars
Covering grammars for English and Russian text normalization
Stars: ✭ 60 (-41.75%)
Mutual labels:  text-to-speech
Go Astibob
Golang framework to build an AI that can understand and speak back to you, and everything else you want
Stars: ✭ 222 (+115.53%)
Mutual labels:  text-to-speech
Waveglow
A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis
Stars: ✭ 205 (+99.03%)
Mutual labels:  text-to-speech
ttsflow
tensorflow speech synthesis c++ inference for voicenet
Stars: ✭ 17 (-83.5%)
Mutual labels:  text-to-speech
hawking
The retro text-to-speech bot for Discord
Stars: ✭ 24 (-76.7%)
Mutual labels:  text-to-speech
Google Tts
Google TTS (Text-To-Speech) for node.js
Stars: ✭ 180 (+74.76%)
Mutual labels:  text-to-speech
Tts Cube
End-2-end speech synthesis with recurrent neural networks
Stars: ✭ 213 (+106.8%)
Mutual labels:  text-to-speech
oddvoices
An indie singing synthesizer
Stars: ✭ 4 (-96.12%)
Mutual labels:  singing-synthesis
myG2P
Myanmar (Burmese) Language Grapheme to Phoneme (myG2P) Conversion Dictionary for speech recognition (ASR) and speech synthesis (TTS).
Stars: ✭ 43 (-58.25%)
Mutual labels:  text-to-speech
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (-48.54%)
Mutual labels:  text-to-speech

MLP Singer

Official implementation of MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis. Audio samples are available on our demo page.

Abstract

Recent developments in deep learning have significantly improved the quality of synthesized singing voice audio. However, prominent neural singing voice synthesis systems suffer from slow inference speed due to their autoregressive design. Inspired by MLP-Mixer, a novel architecture introduced in the vision literature for attention-free image classification, we propose MLP Singer, a parallel Korean singing voice synthesis system. To the best of our knowledge, this is the first work that uses an entirely MLP-based architecture for voice synthesis. Listening tests demonstrate that MLP Singer outperforms a larger autoregressive GAN-based system, both in terms of audio quality and synthesis speed. In particular, MLP Singer achieves a real-time factor of up to 200 and 3400 on CPUs and GPUs, respectively, enabling order-of-magnitude faster generation in both environments.

Citation

If you find this work useful, please cite it as follows.

@article{tae2021mlp,
  title={MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis},
  author={Jaesung Tae and Hyeongju Kim and Younggun Lee},
  journal={2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)},
  year={2021}
}

Quickstart

  1. Clone the repository, including its git submodule.

    git clone --recurse-submodules https://github.com/neosapience/mlp-singer.git

  2. Install package requirements.

    cd mlp-singer
    pip install -r requirements.txt

  3. To generate audio files with the trained model checkpoint, download the HiFi-GAN checkpoint along with its configuration file and place them in hifi-gan.

  4. Run inference using the following command. Generated audio samples are saved in the samples directory by default.

    python inference.py --checkpoint_path checkpoints/default/model.pt

Dataset

We used the Children's Song Dataset, an open-source singing voice dataset comprising 100 annotated Korean and English children's songs sung by a single professional singer. We used only the Korean subset of the dataset to train the model.

You can train the model on any custom dataset of your choice, as long as it consists of triplets of lyrics text, MIDI transcription, and monophonic a cappella audio. The three files in each triplet should share the same title and be placed in the directory layout shown below.

├── data
│   └── raw
│       ├── mid
│       ├── txt
│       └── wav

The directory names correspond to the file extensions. We have included a sample for reference.
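
If you assemble your own dataset, a quick sanity check helps catch missing or misnamed files before preprocessing. Below is a minimal sketch (not part of this repository; it only assumes the data/raw layout shown above):

    from pathlib import Path

    # Collect file stems per extension under data/raw (layout shown above).
    root = Path("data/raw")
    stems = {ext: {p.stem for p in (root / ext).glob(f"*.{ext}")}
             for ext in ("mid", "txt", "wav")}

    # A song is usable only if all three files share the same title.
    complete = stems["mid"] & stems["txt"] & stems["wav"]
    for ext, found in stems.items():
        for stem in sorted(found - complete):
            print(f"incomplete triplet: {stem}.{ext} is missing a counterpart")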

Preprocessing

Once you have prepared the dataset, run

python -m data.serialize

from the root directory. This will create data/bin, which contains the binary files used for training. The repository already includes example binaries created from the sample in data/raw.

Training

To train the model, run

python train.py

This will read the default configuration file located in configs/model.json to initialize the model. Alternatively, you can create a new configuration and train the model via

python train.py --config_path PATH/TO/CONFIG.json

Running this command will create a folder under the checkpoints directory according to the name field specified in the configuration file.
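
For illustration, one way to derive a new configuration is to copy the default one and change its name. The sketch below relies only on the documented name field and treats the rest of the schema as opaque; my_experiment and the output path are hypothetical:

    import json
    from pathlib import Path

    # Start from the default configuration shipped with the repository.
    config = json.loads(Path("configs/model.json").read_text())

    # "name" controls the folder created under checkpoints/; the remaining
    # fields are model-specific, so they are copied over unchanged here.
    config["name"] = "my_experiment"
    Path("configs/my_experiment.json").write_text(json.dumps(config, indent=2))

You could then train with python train.py --config_path configs/my_experiment.json.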

You can also continue training from a checkpoint. For example, to resume training from the provided pretrained model checkpoint, run

python train.py --checkpoint_path checkpoints/default/model.pt

Unless a --config_path flag is explicitly provided, the script will read config.json in the checkpoint directory. In both cases, model checkpoints will be saved regularly according to the interval defined in the configuration file.
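
The configuration fallback is easy to mirror mentally; here is a sketch of the equivalent lookup logic (an illustration, not the repository's actual code):

    from pathlib import Path

    def resolve_config(config_path=None, checkpoint_path=None):
        # An explicit --config_path always wins.
        if config_path is not None:
            return Path(config_path)
        # When resuming, fall back to config.json next to the checkpoint.
        if checkpoint_path is not None:
            return Path(checkpoint_path).parent / "config.json"
        # Otherwise, use the default configuration.
        return Path("configs/model.json")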

Inference

MLP Singer produces mel-spectrograms, which are then fed into a neural vocoder to generate raw waveforms. This repository uses HiFi-GAN as the vocoder backend, but you can also plug in other vocoders, such as WaveGlow. To generate samples, run

python inference.py --checkpoint_path PATH/TO/CHECKPOINT.pt --song little_star

This will create .wav samples in the samples directory, and save mel-spectrogram files as .npy files in hifi-gan/test_mel_dirs.

You can run inference on any song present in data/raw; the argument to the --song flag should match the song's title as it is saved in data/raw.
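
Conceptually, inference is a two-stage pipeline: the acoustic model maps lyrics and MIDI features to a mel-spectrogram, and the vocoder turns that mel-spectrogram into a waveform. The sketch below shows only this hand-off; the models and feature tensor are dummy placeholders, not the repository's actual classes:

    import torch

    # Dummy placeholders standing in for MLP Singer and HiFi-GAN; the real
    # inference.py builds these from the checkpoint and the hifi-gan submodule.
    acoustic_model = torch.nn.Linear(8, 80)   # features -> 80-bin "mel" frame
    vocoder = torch.nn.Linear(80, 256)        # "mel" frame -> waveform chunk
    features = torch.randn(100, 8)            # placeholder lyrics/MIDI features

    with torch.no_grad():
        mel = acoustic_model(features)        # stage 1: mel-spectrogram
        audio = vocoder(mel)                  # stage 2: raw waveform samples

Whatever the exact shapes, the key property is that the acoustic model generates all frames in parallel rather than autoregressively, which is where the speedup described in the abstract comes from.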

Note

For the demo and internal experiments, we used a variant of HiFi-GAN trained with a different mel-spectrogram configuration. As such, the provided MLP Singer checkpoint differs from the one referred to in the paper. Moreover, the vocoder used in the demo was further fine-tuned on the Children's Song Dataset.

Acknowledgements

This implementation was inspired by the following repositories.

License

Released under the MIT License.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner.