WARNING: This repository was an experiment and is no longer maintained.

tacorn

TTS framework bridging different 2018/2019 state-of-the-art open source methods. It currently aims to combine the Tacotron-2 implementation by Rayhane-mamah (https://github.com/Rayhane-mamah/Tacotron-2) with a fork of fatchord's alternative WaveRNN implementation (https://github.com/fatchord/WaveRNN). The overall goal is to make it easier to swap out individual components.

Introduction

Speech synthesis systems consist of multiple components which have traditionally been developed manually and are increasingly being replaced by machine learning models.

Here we define three components used in statistical parametric speech synthesis. We do not consider unit selection, hybrid unit selection, or physical modeling based systems.

Data flows through these components, with each producing intermediate representations that are then input to the next component. During training we typically deal with large datasets and intermediate representations are usually stored on disk; at synthesis time we want to avoid this and aim to hold everything in memory.
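As a conceptual sketch of this in-memory pipeline (the function names below are hypothetical placeholders, not the actual tacorn API), synthesis simply chains the three components:

```python
import numpy as np

def synthesize(text, text_analysis, acoustic_model, vocoder):
    """Conceptual synthesis pipeline: intermediate representations stay in memory.

    text_analysis, acoustic_model and vocoder are hypothetical callables standing
    in for the three components described below.
    """
    linguistic_spec = text_analysis(text)                 # e.g. phone sequence with context features
    acoustic_features = acoustic_model(linguistic_spec)   # e.g. mel spectrogram frames
    waveform = vocoder(acoustic_features)                 # raw audio samples
    return np.asarray(waveform)
```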

Text analysis

Component to generate a linguistic specification from text input.

Traditionally this involves hand-coded language-specific rules, a pronunciation dictionary, a letter-to-sound (or grapheme-to-phoneme) model for out-of-dictionary words and potentially additional models, e.g. ToBI endtone prediction, part-of-speech tagging, phrasing prediction etc. The result for a given input sentence is a sequence of linguistic specifications, for example encoded as HTK labels. This specification holds at least a sequence of phones (or phonemes) but typically also includes contextual information like surrounding phones, punctuation, and counts of segments, syllables, words, phrases etc. (see for example https://github.com/MattShannon/HTS-demo_CMU-ARCTIC-SLT-STRAIGHT-AR-decision-tree/blob/master/data/lab_format.pdf). Examples of systems that perform text analysis are Festival, Flite and Ossian (REFs).
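A minimal sketch of the traditional lookup-with-fallback idea (the lexicon entries and the fallback rule are purely illustrative, not a real letter-to-sound model):

```python
# Toy pronunciation lexicon in ARPAbet-style notation (illustrative entries only).
LEXICON = {
    "speech": ["S", "P", "IY1", "CH"],
    "synthesis": ["S", "IH1", "N", "TH", "AH0", "S", "AH0", "S"],
}

def grapheme_to_phoneme(word):
    """Look the word up in the lexicon, falling back to a trivial letter-to-sound rule.

    A real system would use a trained G2P model for out-of-dictionary words.
    """
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    # Naive fallback: one pseudo-phone per letter (placeholder for a real LTS model).
    return [letter.upper() for letter in word if letter.isalpha()]

print(grapheme_to_phoneme("speech"))  # ['S', 'P', 'IY1', 'CH']
print(grapheme_to_phoneme("tacorn"))  # fallback: ['T', 'A', 'C', 'O', 'R', 'N']
```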

Recent systems take a step towards end-to-end synthesis and aim to replace these often complex codebases with machine learning models. Here we focus on Tacotron (REF).

Acoustic feature prediction

Component consuming a linguistic specification to predict an intermediate acoustic representation.

Intermediate acoustic representations are used because of their useful properties for modeling, but also because they typically have a lower time resolution than the raw waveforms. Almost all commonly used representations employ a Fourier transformation, so with a commonly used window shift of 5 ms we end up with only 200 feature vectors per second instead of 48,000 samples for 48 kHz speech. Commonly used features include Mel-Frequency Cepstral Coefficients (MFCCs) and Line Spectral Pairs (LSPs). In addition, features like fundamental frequency (F0) or aperiodicity are commonly used.
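For illustration, a hedged sketch of extracting a mel spectrogram with librosa (the parameter values and the file path are examples, not the settings used by the included Tacotron-2 fork); with a 5 ms hop at 48 kHz the hop length is 240 samples, giving 200 frames per second:

```python
import librosa
import numpy as np

# Example analysis settings; the actual values are defined by the component configurations.
sr = 48000                     # sampling rate in Hz
hop_length = int(0.005 * sr)   # 5 ms window shift -> 240 samples -> 200 frames/second
n_fft = 2048
n_mels = 80

y, _ = librosa.load("example.wav", sr=sr)  # placeholder path
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
)
log_mel = np.log(np.maximum(mel, 1e-5))    # log compression, as commonly used
print(log_mel.shape)                       # (n_mels, ~200 frames per second of audio)
```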

The acoustic feature prediction component traditionally often employed a separate duration model to predict the number of acoustic feature frames to be generated for each segment (i.e. phone), followed by an acoustic model to predict the actual acoustic features. Here we focus on Tacotron, which employs an attention-based sequence-to-sequence model to merge duration and acoustic feature prediction into a single model.

Waveform generation

Component generating waveforms from acoustic features.

The component performing this operation is often called a vocoder and traditionally involves signal processing to encode and decode speech. Examples of vocoders are STRAIGHT, WORLD, hts_engine, GlottHMM, GlottDNN and Vocaine.

Recently, neural vocoders have been employed with good success; examples include WaveNet, WaveRNN, WaveGlow, FFTNet and SampleRNN (REFs). The main disadvantage of neural vocoders is that they are yet another model that has to be trained, typically even per speaker. This not only means additional computing resources and time but also complicates deployment and requires additional hyperparameter tuning for this model. Possibilities to work around this include multi-speaker models or speaker-independent modeling (https://arxiv.org/abs/1811.06292).

Here we focus on WaveRNN, although the currently included Tacotron-2 implementation by Rayhane-mamah also includes WaveNet.
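Neural vocoders such as WaveNet, and many WaveRNN variants, predict quantized sample values rather than raw floats; below is a minimal sketch of the commonly used mu-law companding, independent of any particular implementation:

```python
import numpy as np

def mulaw_encode(x, bits=8):
    """Compress samples in [-1, 1] to 2**bits discrete levels (mu-law companding)."""
    mu = 2 ** bits - 1
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((y + 1.0) / 2.0 * mu + 0.5).astype(np.int64)  # map [-1, 1] -> [0, mu]

def mulaw_decode(q, bits=8):
    """Invert mu-law companding back to samples in [-1, 1]."""
    mu = 2 ** bits - 1
    y = 2.0 * q.astype(np.float64) / mu - 1.0
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

x = np.linspace(-1.0, 1.0, 5)
print(mulaw_encode(x))                 # [  0  16 128 239 255]
print(mulaw_decode(mulaw_encode(x)))   # approximately recovers x
```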

Experiment folder contents

  • config: holds configurations for the experiment and the components.
  • raw: input corpus.
  • raw/wavs: input waveforms.
  • raw/meta: input meta information, typically at least a transcription.
  • features: holds intermediate representations used in training and synthesis.
  • features/acoustic: holds preprocessed features for acoustic model training, e.g. mel spectrum, linguistic specifications.
  • features/acoustic2wavegen: holds output features from acoustic used as input to wavegen.
  • features/acoustic2wavegen/training: holds output features from acoustic used as input to wavegen training (e.g. ground-truth-aligned mel spectra).
  • features/acoustic2wavegen/synthesis: holds output features from acoustic used as input to wavegen synthesis (e.g. mel spectra).
  • features/wavegen: holds input features for the waveform generation model training, e.g. mel spectrum and raw waveforms.
  • models: working directories for models/components.
  • models/acoustic: working directory for the acoustic feature prediction component.
  • models/wavegen: working directory for the waveform generation component.
  • synthesized: synthesized wave files and meta information.
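A hedged sketch of laying out such an experiment directory (the actual layout is created by create.py; the helper below is only illustrative):

```python
import os

# Subdirectories as described in the list above.
SUBDIRS = [
    "config",
    "raw/wavs",
    "raw/meta",
    "features/acoustic",
    "features/acoustic2wavegen/training",
    "features/acoustic2wavegen/synthesis",
    "features/wavegen",
    "models/acoustic",
    "models/wavegen",
    "synthesized",
]

def create_experiment_dir(root):
    """Create the experiment folder skeleton described above (illustrative only)."""
    for sub in SUBDIRS:
        os.makedirs(os.path.join(root, sub), exist_ok=True)

create_experiment_dir("experiments/demo")  # hypothetical experiment path
```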

Process

Create

  • Input: Configuration parameters
  • Output: Configured experiment directory
  • Invocation: create.py

Creates a new experiment directory.

Preprocessing

  • Input: corpus in raw or given by parameter
  • Output: processed features in features or acoustic_model
  • Invocation: preprocess.py

Preprocesses waveforms and orthographic transcriptions.

Training

  • Input: processed features
  • Output: trained models in acoustic_model and wavegen_model
  • Invocation: train.py

Trains the feature prediction and neural vocoder models.

Synthesis

  • Input: text, trained models in acoustic_model and wavegen_model
  • Output: wavefiles in synthesized_wavs
  • Invocation: synthesis.py
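
Synthesizes wave files from the input text using the trained models.

As a hedged sketch of the overall process, the four steps above could be run back to back; the command-line arguments shown are hypothetical placeholders, not the actual interface of the scripts:

```python
import subprocess

# Hypothetical end-to-end run; the arguments below are placeholders, not the
# actual CLI of create.py / preprocess.py / train.py / synthesis.py.
experiment = "experiments/demo"
steps = [
    ["python", "create.py", experiment],
    ["python", "preprocess.py", experiment],
    ["python", "train.py", experiment],
    ["python", "synthesis.py", experiment, "Hello world."],
]
for cmd in steps:
    subprocess.run(cmd, check=True)
```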

Export

TODO
