
bshall / Universalvocoding

License: MIT
A PyTorch implementation of "Robust Universal Neural Vocoding"


Projects that are alternatives of or similar to Universalvocoding

Pytorch Dc Tts
Text to Speech with PyTorch (English and Mongolian)
Stars: ✭ 122 (-38.07%)
Mutual labels:  speech-synthesis
Wavegrad
A fast, high-quality neural vocoder.
Stars: ✭ 138 (-29.95%)
Mutual labels:  speech-synthesis
Vocgan
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
Stars: ✭ 158 (-19.8%)
Mutual labels:  speech-synthesis
Legacy straight
A vocoder framework which had been widely used in research community since 1999.
Stars: ✭ 130 (-34.01%)
Mutual labels:  speech-synthesis
Zerospeech
VQ-VAE for Acoustic Unit Discovery and Voice Conversion
Stars: ✭ 137 (-30.46%)
Mutual labels:  speech-synthesis
Tensorflowtts
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
Stars: ✭ 2,382 (+1109.14%)
Mutual labels:  speech-synthesis
Durian
Implementation of "Duration Informed Attention Network for Multimodal Synthesis" (https://arxiv.org/pdf/1909.01700.pdf) paper.
Stars: ✭ 111 (-43.65%)
Mutual labels:  speech-synthesis
Expressive tacotron
Tensorflow Implementation of Expressive Tacotron
Stars: ✭ 192 (-2.54%)
Mutual labels:  speech-synthesis
Diffwave
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Stars: ✭ 139 (-29.44%)
Mutual labels:  speech-synthesis
Tacotron 2
DeepMind's Tacotron-2 Tensorflow implementation
Stars: ✭ 1,968 (+898.98%)
Mutual labels:  speech-synthesis
Awesome Ai Services
An overview of the AI-as-a-service landscape
Stars: ✭ 133 (-32.49%)
Mutual labels:  speech-synthesis
Xva Synth
Machine learning based speech synthesis Electron app, with voices from specific characters from video games
Stars: ✭ 136 (-30.96%)
Mutual labels:  speech-synthesis
Wavenet vocoder
WaveNet vocoder
Stars: ✭ 1,926 (+877.66%)
Mutual labels:  speech-synthesis
Marytts
MARY TTS -- an open-source, multilingual text-to-speech synthesis system written in pure java
Stars: ✭ 1,699 (+762.44%)
Mutual labels:  speech-synthesis
Cyclegan Vc2
Voice Conversion by CycleGAN (语音克隆/语音转换): CycleGAN-VC2
Stars: ✭ 158 (-19.8%)
Mutual labels:  speech-synthesis
Deepvoice3 pytorch
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
Stars: ✭ 1,654 (+739.59%)
Mutual labels:  speech-synthesis
Prosody
Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text
Stars: ✭ 139 (-29.44%)
Mutual labels:  speech-synthesis
Lingvo
Lingvo
Stars: ✭ 2,361 (+1098.48%)
Mutual labels:  speech-synthesis
Naomi
The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!
Stars: ✭ 171 (-13.2%)
Mutual labels:  speech-synthesis
Awesome Speech Recognition Speech Synthesis Papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Stars: ✭ 2,085 (+958.38%)
Mutual labels:  speech-synthesis

Open In Colab

Towards Achieving Robust Universal Neural Vocoding

A PyTorch implementation of "Towards Achieving Robust Universal Neural Vocoding". Audio samples can be found here. A Colab demo can be found here. An accompanying Tacotron implementation can be found here.

Fig 1: Architecture of the vocoder.

Quick Start

Ensure you have Python 3.6 and PyTorch 1.7 or greater installed. Then install the package with:

pip install univoc

Example Usage

Open In Colab

import torch
import soundfile as sf
from univoc import Vocoder

# download pretrained weights (and optionally move to GPU)
vocoder = Vocoder.from_pretrained(
    "https://github.com/bshall/UniversalVocoding/releases/download/v0.2/univoc-ljspeech-7mtpaq.pt"
).cuda()

# load log-Mel spectrogram from file or from tts (see https://github.com/bshall/Tacotron for example)
mel = ...

# generate waveform
with torch.no_grad():
    wav, sr = vocoder.generate(mel)

# save output
sf.write("path/to/save.wav", wav, sr)
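The `mel = ...` placeholder above stands for a log-Mel spectrogram tensor. As a rough sketch of what such an input looks like (the Mel-bin count and tensor layout below are illustrative assumptions, not values taken from the univoc package; the real shape must match the settings the pretrained model was trained with, see the repo's preprocessing config):

```python
import torch

# Hypothetical placeholder input: a batch of one "log-Mel spectrogram" with
# 200 frames of 80 Mel bins each. Random values, not real audio features --
# this only demonstrates the kind of tensor the vocoder consumes.
n_mels, frames = 80, 200
mel = torch.randn(1, frames, n_mels)
print(mel.shape)  # torch.Size([1, 200, 80])
```

In practice this tensor would come from the preprocessing script in this repo or from a TTS front end such as the linked Tacotron implementation, not from random noise.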

Train from Scratch

  1. Clone the repo:
git clone https://github.com/bshall/UniversalVocoding
cd ./UniversalVocoding
  2. Install requirements:
pip install -r requirements.txt
  3. Download and extract the LJ-Speech dataset:
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xvjf LJSpeech-1.1.tar.bz2
  4. Download the train split here and extract it in the root directory of the repo.
  5. Extract Mel spectrograms and preprocess audio:
python preprocess.py in_dir=path/to/LJSpeech-1.1 out_dir=datasets/LJSpeech-1.1
  6. Train the model:
python train.py checkpoint_dir=ljspeech dataset_dir=datasets/LJSpeech-1.1

Pretrained Models

Pretrained weights for the 10-bit LJ-Speech model are available here.

Notable Differences from the Paper

  1. Trained on 16kHz audio from a single speaker. For an older version trained on 102 different speakers from the ZeroSpeech 2019: TTS without T English dataset, click here.
  2. Uses an embedding layer instead of one-hot encoding.
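The embedding-layer substitution in point 2 can be illustrated with a small sketch (the sizes below are illustrative, not the model's actual dimensions): indexing an `nn.Embedding` is mathematically equivalent to multiplying a one-hot vector by the same weight matrix, but avoids materialising the large one-hot tensor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
# Illustrative sizes: 256 classes mapped to 8-dim vectors.
emb = nn.Embedding(256, 8)
idx = torch.tensor([3, 17, 200])

# Direct table lookup vs. explicit one-hot followed by a matrix product.
lookup = emb(idx)
onehot = F.one_hot(idx, num_classes=256).float() @ emb.weight

print(torch.allclose(lookup, onehot))  # True: the two are numerically identical
```

The lookup is both cheaper and lets the input representation be learned jointly with the rest of the network, which is the usual motivation for this swap.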

Acknowledgements
