CSTR-Edinburgh / magphase

Licence: Apache-2.0 license
MagPhase Vocoder: Speech analysis/synthesis system for TTS and related applications.

MagPhase Vocoder

Speech analysis/synthesis system for TTS and related applications.

This software is based on the method described in the paper:

F. Espic, C. Valentini-Botinhao, and S. King, “Direct Modelling of Magnitude and Phase Spectra for Statistical Parametric Speech Synthesis,” in Proc. Interspeech, Stockholm, Sweden, August, 2017.

@ Author: Felipe Espic

More information at http://www.felipeespic.com/magphase/

I. New in Version 2.0 (April 2018)

  • Constant frame-rate support.
  • Improved sound quality.
  • Two types of post-filter available.
  • Selectable number of coefficients for phase features (real and imag).
  • Selectable number of coefficients for the magnitude feature (mag).

II. Description

This is a speech waveform analysis/synthesis system used in Statistical Parametric Speech Synthesis (SPSS).

The analysis module extracts four feature streams describing magnitude spectra, phase spectra, and F0. These features can be used to train a regression model (e.g., a DNN, LSTM, or HMM), which then generates predicted feature values. The synthesis module takes these features as input and generates the final synthesised waveform.

Key points:

  • Avoids estimation steps as far as possible (no aperiodicities, spectral envelope, or harmonics estimation, etc.)
  • Robust extraction and modelling of phase spectra (conventional vocoders just create artificial phase at the output).
  • No phase unwrapping required.
  • Uses fast operations during synthesis (e.g., FFT, PSOLA).
  • Remarkably reduces typical "buzziness" and "phasiness".
  • Many other applications and improvements not explored yet.
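
To illustrate why no phase unwrapping is needed, here is a minimal sketch. It assumes the phase of each spectral bin is described by its normalised real and imaginary parts (i.e. the cosine and sine of the phase) rather than by an angle. The function names are hypothetical and this is not the MagPhase implementation. A naive DFT keeps the example free of dependencies; in practice an FFT routine would be used.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one frame (O(N^2), illustration only)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def analyse(frame, eps=1e-12):
    """Split each spectral bin into a magnitude and normalised real/imag parts.

    Assumption: real = Re(S)/|S| and imag = Im(S)/|S|, so both stay in [-1, 1]
    and never wrap around the way a phase angle does.
    """
    spectrum = dft(frame)
    mag = [abs(s) for s in spectrum]
    real = [s.real / max(m, eps) for s, m in zip(spectrum, mag)]
    imag = [s.imag / max(m, eps) for s, m in zip(spectrum, mag)]
    return mag, real, imag


def rebuild_spectrum(mag, real, imag, eps=1e-12):
    """Recombine the three streams into a complex spectrum.

    (real, imag) is renormalised first, so values predicted by a model that
    drift off the unit circle still yield a valid phase term.
    """
    spectrum = []
    for m, r, i in zip(mag, real, imag):
        norm = max(math.hypot(r, i), eps)
        spectrum.append(m * complex(r / norm, i / norm))
    return spectrum
```

Because the real and imaginary streams are bounded and continuous, they can be averaged or predicted by a regression model without the 2π discontinuities that a raw phase angle would introduce.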

III. License:

See the LICENCE file for details.

IV. Requirements:

  • OS: Linux (macOS support coming soon)
  • Python 2.7
  • Python packages: numpy, scipy, soundfile, matplotlib

V. Install:

  1. Install Python 2.7 and the required packages, using either the package manager of your distro or pip (recommended), e.g.:
pip install numpy scipy soundfile matplotlib
  2. Download MagPhase: git clone https://github.com/CSTR-Edinburgh/magphase.git

  3. Download and compile SPTK and REAPER by:

cd magphase/tools
./download_and_compile_tools.sh

This will compile and configure SPTK and REAPER automatically for you...and that's it!
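
After installing, it can be handy to confirm that the Python packages listed in the requirements are importable. The following is a small sketch of such a check. The helper name missing_packages is hypothetical and not part of MagPhase.

```python
import importlib

# Package names taken from the Requirements section.
REQUIRED = ["numpy", "scipy", "soundfile", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be imported in the current interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except (ImportError, OSError):
            # OSError covers a package that is installed but whose native
            # library cannot be loaded (possible with soundfile).
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages: %s" % (", ".join(missing) or "none"))
```

Run it with the same Python interpreter you intend to use for the demos, since packages are installed per interpreter.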

VI. Usage:

Go to /demos and read the instructions inside the demo scripts, which are very descriptive. They should run out of the box with python <demo_script>.

We recommend that you start with demo_copy_synthesis_lossless.py and then try demo_copy_synthesis_low_dim.py. Both perform analysis/synthesis routines.

Then, you can modify the demo scripts to suit your needs.

NOTE: Remember to run the scripts from the directory they are located in.
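
If you want to launch a demo from another script or from a different directory, one option is to set the working directory explicitly. This is a minimal sketch: the helper run_from_own_dir is hypothetical and not part of MagPhase.

```python
import os
import subprocess
import sys


def run_from_own_dir(script_path):
    """Run a Python script with its own directory as the working directory.

    Returns the exit code of the script.
    """
    script_path = os.path.abspath(script_path)
    script_dir, script_name = os.path.split(script_path)
    # sys.executable is the interpreter running this helper, so start the
    # helper with the Python version the demos require.
    return subprocess.call([sys.executable, script_name], cwd=script_dir)
```

For example, assuming the repository was cloned into ./magphase, run_from_own_dir("magphase/demos/demo_copy_synthesis_lossless.py") starts the first demo as if it had been launched from inside the demos directory.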

VII. Using MagPhase with the Merlin toolkit:

We provide two demos distributed with Merlin's official distribution. They show examples of integrating MagPhase with Merlin.

VIII. Collaboration:

We need help to improve this software. You can collaborate by:

  • Building TTS voices using Merlin and MagPhase and comparing them with other vocoders, e.g., WORLD. Then, please tell us your results. We have tested MagPhase with only a few voices, and a wider range needs to be covered. We have recently fixed some bugs that came to light thanks to people reporting their results with new data.

  • Implementing native variable frame rate support in Merlin. MagPhase works in a variable frame rate fashion (pitch synchronous).
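
To make the difference between the two frame rates concrete, here is a rough sketch. It assumes that each voiced frame advances by one pitch period and each unvoiced frame by a fixed shift, and it maps the resulting frames onto a constant-rate grid by nearest-neighbour selection. The function names, the 5 ms shifts, and the mapping strategy are all illustrative assumptions. This is not how MagPhase or Merlin handle it.

```python
import bisect


def frame_centres_from_f0(f0_values, unvoiced_shift=0.005):
    """Accumulate frame centre times in seconds.

    Voiced frames (f0 > 0) advance by one pitch period, unvoiced frames
    (f0 == 0) by a fixed shift. Both rules are assumptions for illustration.
    """
    centres = []
    t = 0.0
    for f0 in f0_values:
        t += 1.0 / f0 if f0 > 0 else unvoiced_shift
        centres.append(t)
    return centres


def nearest_frame_indices(centres, const_shift=0.005):
    """For each tick of a constant-rate grid, pick the nearest variable-rate frame.

    `centres` must be a non-empty, increasing list of times in seconds.
    """
    indices = []
    n_ticks = int(centres[-1] / const_shift) + 1
    for k in range(n_ticks):
        t = k * const_shift
        i = bisect.bisect_left(centres, t)
        if i == 0:
            idx = 0
        elif i == len(centres):
            idx = len(centres) - 1
        else:
            # Choose whichever neighbouring frame centre is closer to the tick.
            idx = i if centres[i] - t < t - centres[i - 1] else i - 1
        indices.append(idx)
    return indices
```

With pitch-synchronous analysis, the number of frames per second follows the speaker's F0, whereas a constant-rate grid has the same number of frames for every utterance of a given duration. Native variable frame rate support would avoid this kind of resampling step.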
