Jackiexiao / Mtts

Licence: mit
A Demo of Mandarin/Chinese TTS frontend

This project is no longer maintained.

Recommended alternative: https://github.com/thuhcsi/Crystal

Join us

  • Speech synthesis discussion QQ group: 882726654

A Demo of MTTS Mandarin/Chinese Text to Speech FrontEnd

Mandarin/Chinese text to speech based on statistical parametric speech synthesis, using the Merlin toolkit.

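This is only a demo of a Mandarin speech synthesis front end. It lacks components such as text normalization and prosody prediction. Text-to-pinyin conversion uses pypinyin and word segmentation uses jieba, and neither is accurate enough for commercial use. The phone set and question set used in this project have not been fully tested yet.

Other speech synthesis projects are worth a look: end-to-end synthesis is a promising direction and sounds more natural than Merlin.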

A rough draft of the documentation is available, written in Mandarin.

Data

When this project was written, no open-source Mandarin speech synthesis dataset was available on the internet, so it uses the thchs30 dataset to demonstrate speech synthesis.

UPDATE

Open-source Mandarin speech synthesis data is now available from Data Baker (标贝). Thanks to the company for releasing it.

Download: https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar
Description: http://www.data-baker.com/open_source.html

Generated Samples

Listen to https://jackiexiao.github.io/MTTS/

How To Reproduce

  1. Prepare data containing wav files and txt transcripts (prosody marks are optional).
  2. Generate HTS labels using this project.
  3. Use merlin/egs/mandarin_voice to train and generate a Mandarin voice.

Context related annotation & Question Set

Install

Python: 3.6
System: Linux (tested on Ubuntu 16.04)

pip install jieba pypinyin
sudo apt-get install libatlas3-base

Then run bash tools/install_mtts.sh, or download the files yourself.

Run Demo

bash run_demo.sh

Usage

1. Generate HTS labels from wav and text

  • Usage: Run python src/mtts.py txtfile wav_directory_path output_directory_path (absolute or relative paths) to get the HTS labels. If you have your own acoustic model trained with Montreal-Forced-Aligner, add -a your_acoustic_model.zip; otherwise the project uses the thchs30.zip acoustic model by default.
  • Attention: Only Chinese characters are currently supported. The txt file must not contain Arabic numerals or English letters.
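
The character restriction above can be checked before running the tool. The following is a minimal sketch, not part of MTTS: the function name is hypothetical, and it assumes that "Chinese character" means the CJK Unified Ideographs block (U+4E00 to U+9FFF), so punctuation and whitespace are flagged as well.

```python
import re

# Assumption: a supported character is any code point in the CJK Unified
# Ideographs block. The project itself may accept a different set.
_NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]")


def find_unsupported_chars(text):
    """Return the characters in `text` that are not CJK ideographs, in order."""
    return _NON_CHINESE.findall(text)


for sample in ["这是一段文本", "共有3个ABC"]:
    bad = find_unsupported_chars(sample)
    status = "ok" if not bad else "unsupported: " + "".join(bad)
    print(sample, "->", status)
```

Running a check like this over every transcript line makes it easy to find digits or Latin letters that need to be spelled out in Chinese first, since the project has no text normalization.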

txtfile example

A_01 这是一段文本
A_02 这是第二段文本

wav_directory example (the sampling rate should be higher than 16 kHz)

A_01.wav  
A_02.wav  
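
The pairing between the txtfile and the wav directory shown above can be checked with a short script. This is only a sketch under assumptions: the helper names are hypothetical, each txtfile line is taken to be an utterance ID followed by whitespace and the text, each wav is assumed to be named after its utterance ID, and 16 kHz itself is treated as acceptable.

```python
import os
import wave


def read_transcripts(txt_path):
    """Parse lines of the form '<utterance_id> <text>' into a dict."""
    transcripts = {}
    with open(txt_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split(maxsplit=1)
            if len(parts) != 2:
                continue  # skip blank lines and lines without text
            utt_id, text = parts
            transcripts[utt_id] = text.strip()
    return transcripts


def check_wavs(transcripts, wav_dir, min_rate=16000):
    """Return (utterance_id, problem) pairs for missing or low-rate wavs."""
    problems = []
    for utt_id in transcripts:
        path = os.path.join(wav_dir, utt_id + ".wav")
        if not os.path.isfile(path):
            problems.append((utt_id, "missing wav"))
            continue
        with wave.open(path, "rb") as w:
            rate = w.getframerate()
        if rate < min_rate:  # assumption: 16 kHz itself is acceptable
            problems.append((utt_id, "sampling rate %d Hz" % rate))
    return problems
```

A call such as check_wavs(read_transcripts("text.txt"), "wav_dir") returns an empty list when every transcript line has a matching wav file at a sufficient sampling rate. The two path names here are placeholders.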

2. Generate HTS labels from text, with or without an alignment file

  • Usage: Run python src/mandarin_frontend.py txtfile output_directory_path
  • Or import mandarin_frontend from Python:
from mandarin_frontend import txt2label

result = txt2label('向香港特别行政区同胞澳门和台湾同胞海外侨胞')
for line in result:
    print(line)

# With prosody marks and an alignment file (sfs file):
# result = txt2label('向#1香港#2特别#1行政区#1同胞#4澳门#2和#1台湾#1同胞#4海外#1侨胞',
#                    sfsfile='example_file/example.sfs')

See the source code for more information. Pay attention to the format of the alignment file (sfs file): each line is end_time phone_type, not start_time phone_type, which differs from Speech Ocean's data.
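
Because each sfs line carries only an end time, the start of a segment is the end of the previous one. The sketch below illustrates that conversion. It is not the project's own parser: the function name is hypothetical, lines are assumed to be whitespace-separated, the first segment is assumed to start at 0, and the sample values are invented, so check the time unit against example_file/example.sfs.

```python
def sfs_to_intervals(lines, start=0.0):
    """Convert 'end_time phone' lines into (start, end, phone) tuples."""
    intervals = []
    prev_end = start
    for raw in lines:
        parts = raw.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        end, phone = float(parts[0]), parts[1]
        intervals.append((prev_end, end, phone))
        prev_end = end  # the next segment starts where this one ends
    return intervals


# Invented values, for illustration only.
example = ["2400 sil", "3900 x", "5200 iang"]
for segment in sfs_to_intervals(example):
    print(segment)
```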

3. Forced-alignment

This project uses Montreal-Forced-Aligner (MFA) for forced alignment. To get a better alignment, train an alignment model on your own data; see mfa: align-using-only-the-dataset

  1. The acoustic model was trained on the thchs30 dataset (see misc/thchs30.zip) with the dictionary mandarin_mtts.lexicon. A dataset larger than thchs30 may give better alignment.
  2. To use MFA's pre-trained Mandarin model, use the dictionary mandarin-for-montreal-forced-aligner-pre-trained-model.lexicon

Prosody Mark

You can generate HTS labels without prosody marks. The project assumes that a word segment is smaller than a prosodic word (the code adjusts for this).

"#0","#1", "#2","#3" and "#4" are the prosody labeling symbols.

  • #0 stands for word segment
  • #1 stands for prosodic word
  • #2 stands for stressed word (in this project it is treated as #1)
  • #3 stands for prosodic phrase
  • #4 stands for intonational phrase
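
To show how the marks above attach to the text, here is a small sketch that splits a marked string into (segment, boundary level) pairs and folds #2 into #1 as described. It is not the project's implementation: the function name is hypothetical, and giving an unmarked final segment level 4 is an assumption.

```python
import re

_MARK = re.compile(r"#([0-4])")


def split_prosody(text, final_level=4):
    """Split marked text into (segment, boundary_level) pairs.

    The level is the mark that follows the segment. '#2' is folded into '#1'.
    A final segment without a trailing mark gets `final_level` (assumption:
    the end of the utterance closes an intonational phrase).
    """
    # re.split with a capture group yields [seg, level, seg, level, ..., last].
    pieces = _MARK.split(text)
    pairs = []
    for i in range(0, len(pieces) - 1, 2):
        segment, level = pieces[i], int(pieces[i + 1])
        if level == 2:
            level = 1
        if segment:
            pairs.append((segment, level))
    if pieces[-1]:
        pairs.append((pieces[-1], final_level))
    return pairs


for pair in split_prosody("向#1香港#2特别#1行政区#1同胞#4澳门#2和#1台湾#1同胞#4海外#1侨胞"):
    print(pair)
```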

Future improvements

  • Text normalization
  • Better Chinese word segmentation
  • G2P: handling of polyphonic characters
  • Better label format and question set
  • Improved prosody analysis
  • Better alignment

Contributor

  • Jackiexiao
  • willian56