
bshall / Universalvocoding

License: MIT
A PyTorch implementation of "Robust Universal Neural Vocoding"


Projects that are alternatives of or similar to Universalvocoding

Pytorch Dc Tts
Text to Speech with PyTorch (English and Mongolian)
Stars: ✭ 122 (-38.07%)
Mutual labels:  speech-synthesis
Wavegrad
A fast, high-quality neural vocoder.
Stars: ✭ 138 (-29.95%)
Mutual labels:  speech-synthesis
Vocgan
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
Stars: ✭ 158 (-19.8%)
Mutual labels:  speech-synthesis
Legacy straight
A vocoder framework which had been widely used in research community since 1999.
Stars: ✭ 130 (-34.01%)
Mutual labels:  speech-synthesis
Zerospeech
VQ-VAE for Acoustic Unit Discovery and Voice Conversion
Stars: ✭ 137 (-30.46%)
Mutual labels:  speech-synthesis
Tensorflowtts
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
Stars: ✭ 2,382 (+1109.14%)
Mutual labels:  speech-synthesis
Durian
Implementation of "Duration Informed Attention Network for Multimodal Synthesis" (https://arxiv.org/pdf/1909.01700.pdf) paper.
Stars: ✭ 111 (-43.65%)
Mutual labels:  speech-synthesis
Expressive tacotron
Tensorflow Implementation of Expressive Tacotron
Stars: ✭ 192 (-2.54%)
Mutual labels:  speech-synthesis
Diffwave
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Stars: ✭ 139 (-29.44%)
Mutual labels:  speech-synthesis
Tacotron 2
DeepMind's Tacotron-2 Tensorflow implementation
Stars: ✭ 1,968 (+898.98%)
Mutual labels:  speech-synthesis
Awesome Ai Services
An overview of the AI-as-a-service landscape
Stars: ✭ 133 (-32.49%)
Mutual labels:  speech-synthesis
Xva Synth
Machine learning based speech synthesis Electron app, with voices from specific characters from video games
Stars: ✭ 136 (-30.96%)
Mutual labels:  speech-synthesis
Wavenet vocoder
WaveNet vocoder
Stars: ✭ 1,926 (+877.66%)
Mutual labels:  speech-synthesis
Marytts
MARY TTS -- an open-source, multilingual text-to-speech synthesis system written in pure java
Stars: ✭ 1,699 (+762.44%)
Mutual labels:  speech-synthesis
Cyclegan Vc2
Voice Conversion by CycleGAN (语音克隆/语音转换): CycleGAN-VC2
Stars: ✭ 158 (-19.8%)
Mutual labels:  speech-synthesis
Deepvoice3 pytorch
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models
Stars: ✭ 1,654 (+739.59%)
Mutual labels:  speech-synthesis
Prosody
Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text
Stars: ✭ 139 (-29.44%)
Mutual labels:  speech-synthesis
Lingvo
Lingvo
Stars: ✭ 2,361 (+1098.48%)
Mutual labels:  speech-synthesis
Naomi
The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!
Stars: ✭ 171 (-13.2%)
Mutual labels:  speech-synthesis
Awesome Speech Recognition Speech Synthesis Papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Stars: ✭ 2,085 (+958.38%)
Mutual labels:  speech-synthesis

Open In Colab

Towards Achieving Robust Universal Neural Vocoding

A PyTorch implementation of "Towards Achieving Robust Universal Neural Vocoding". Audio samples can be found here. A Colab demo can be found here. An accompanying Tacotron implementation can be found here.

Fig 1: Architecture of the vocoder.

Quick Start

Ensure you have Python 3.6 and PyTorch 1.7 or greater installed. Then install the package with:

pip install univoc

Example Usage

Open In Colab

import torch
import soundfile as sf
from univoc import Vocoder

# download pretrained weights (and optionally move to GPU)
vocoder = Vocoder.from_pretrained(
    "https://github.com/bshall/UniversalVocoding/releases/download/v0.2/univoc-ljspeech-7mtpaq.pt"
).cuda()

# load log-Mel spectrogram from file or from tts (see https://github.com/bshall/Tacotron for example)
mel = ...

# generate waveform
with torch.no_grad():
    wav, sr = vocoder.generate(mel)

# save output
sf.write("path/to/save.wav", wav, sr)
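The `mel = ...` placeholder above stands for a log-Mel spectrogram tensor. As a rough sketch of what such an input looks like (the Mel-bin count and tensor layout below are illustrative assumptions, not values taken from the univoc package; the real shape must match the settings the pretrained model was trained with, see the repo's preprocessing config):

```python
import torch

# Hypothetical placeholder input: a batch of one "log-Mel spectrogram" with
# 200 frames of 80 Mel bins each. Random values, not real audio features --
# this only demonstrates the kind of tensor the vocoder consumes.
n_mels, frames = 80, 200
mel = torch.randn(1, frames, n_mels)
print(mel.shape)  # torch.Size([1, 200, 80])
```

In practice this tensor would come from the preprocessing script in this repo or from a TTS front end such as the linked Tacotron implementation, not from random noise.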

Train from Scratch

  1. Clone the repo:
git clone https://github.com/bshall/UniversalVocoding
cd ./UniversalVocoding
  2. Install requirements:
pip install -r requirements.txt
  3. Download and extract the LJ-Speech dataset:
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar -xvjf LJSpeech-1.1.tar.bz2
  4. Download the train split here and extract it in the root directory of the repo.
  5. Extract Mel spectrograms and preprocess audio:
python preprocess.py in_dir=path/to/LJSpeech-1.1 out_dir=datasets/LJSpeech-1.1
  6. Train the model:
python train.py checkpoint_dir=ljspeech dataset_dir=datasets/LJSpeech-1.1

Pretrained Models

Pretrained weights for the 10-bit LJ-Speech model are available here.

Notable Differences from the Paper

  1. Trained on 16kHz audio from a single speaker. For an older version trained on 102 different speakers from the ZeroSpeech 2019: TTS without T English dataset, click here.
  2. Uses an embedding layer instead of one-hot encoding.
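The embedding-layer substitution in point 2 can be illustrated with a small sketch (the sizes below are illustrative, not the model's actual dimensions): indexing an `nn.Embedding` is mathematically equivalent to multiplying a one-hot vector by the same weight matrix, but avoids materialising the large one-hot tensor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
# Illustrative sizes: 256 classes mapped to 8-dim vectors.
emb = nn.Embedding(256, 8)
idx = torch.tensor([3, 17, 200])

# Direct table lookup vs. explicit one-hot followed by a matrix product.
lookup = emb(idx)
onehot = F.one_hot(idx, num_classes=256).float() @ emb.weight

print(torch.allclose(lookup, onehot))  # True: the two are numerically identical
```

The lookup is both cheaper and lets the input representation be learned jointly with the rest of the network, which is the usual motivation for this swap.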

Acknowledgements
