
open-speech / Speech Aligner

Licence: other
speech-aligner is a tool that generates phoneme-level time alignments between human speech and its transcription.

Programming Languages

cpp

Projects that are alternatives of or similar to Speech Aligner

Kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Stars: ✭ 11,151 (+4205.41%)
Mutual labels:  speech, kaldi
Pytorch Asr
ASR with PyTorch
Stars: ✭ 124 (-52.12%)
Mutual labels:  speech, kaldi
Awesome Kaldi
This is a list of features, scripts, blogs and resources for better using Kaldi ( http://kaldi-asr.org/ )
Stars: ✭ 393 (+51.74%)
Mutual labels:  speech, kaldi
kaldi helpers
🙊 A set of scripts to use in preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library.
Stars: ✭ 13 (-94.98%)
Mutual labels:  speech, kaldi
Lhotse
Stars: ✭ 236 (-8.88%)
Mutual labels:  speech, kaldi
Pytorch Kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Stars: ✭ 2,097 (+709.65%)
Mutual labels:  speech, kaldi
Pykaldi
A Python wrapper for Kaldi
Stars: ✭ 756 (+191.89%)
Mutual labels:  speech, kaldi
Setk
Tools for Speech Enhancement integrated with Kaldi
Stars: ✭ 227 (-12.36%)
Mutual labels:  speech, kaldi
kaldi ag training
Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-grammar.
Stars: ✭ 14 (-94.59%)
Mutual labels:  speech, kaldi
opensnips
Open source projects related to Snips https://snips.ai/.
Stars: ✭ 50 (-80.69%)
Mutual labels:  speech, kaldi
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-79.92%)
Mutual labels:  speech
speech-to-text
mixlingual speech recognition system; hybrid (GMM+NNet) model; Kaldi + Keras
Stars: ✭ 61 (-76.45%)
Mutual labels:  kaldi
tt-vae-gan
Timbre transfer with variational autoencoding and cycle-consistent adversarial networks. Able to transfer the timbre of an audio source to that of another.
Stars: ✭ 37 (-85.71%)
Mutual labels:  speech
minutes
🔭 Speaker diarization via transfer learning
Stars: ✭ 25 (-90.35%)
Mutual labels:  speech
SER-datasets
A collection of datasets for the purpose of emotion recognition/detection in speech.
Stars: ✭ 74 (-71.43%)
Mutual labels:  speech
Speech256
An FPGA implementation of a classic 80ies speech synthesizer. Done for the Retro Challenge 2017/10.
Stars: ✭ 51 (-80.31%)
Mutual labels:  speech
LIUM
Scripts for LIUM SpkDiarization tools
Stars: ✭ 28 (-89.19%)
Mutual labels:  speech
speech recognition ctc
Use CTC to do Chinese speech recognition with Keras
Stars: ✭ 40 (-84.56%)
Mutual labels:  speech
jackpair
p2p speech encrypting device with analog audio interface suitable for GSM phones
Stars: ✭ 26 (-89.96%)
Mutual labels:  speech
Noise2Noise-audio denoising without clean training data
Source code for the paper titled "Speech Denoising without Clean Training Data: a Noise2Noise Approach". Paper accepted at the INTERSPEECH 2021 conference. This paper tackles the problem of the heavy dependence of clean speech data required by deep learning based audio denoising methods by showing that it is possible to train deep speech denoisi…
Stars: ✭ 49 (-81.08%)
Mutual labels:  speech

speech-aligner

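Chinese readme (translated):

speech-aligner is a tool that generates phoneme-level time alignments from human speech and its transcription.

Example

# Call the binary with the audio list and the transcript as inputs; the alignment is written to the output file
cd egs/cn_phn
speech-aligner --config=conf/align.conf data/wav.scp data/text data/out.ali
# Inspect the output alignment, which contains: file name, then phoneme start time (s), phoneme end time (s), phoneme
cat data/text data/out.ali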
BAC009S0002W0122 而对楼市成交抑制作用最大的限购
BAC009S0002W0122
0.000 0.535 sil
0.535 0.540 $0
0.540 0.745 er_2
0.745 0.850 d
0.850 0.895 ui_4
0.895 1.305 l
1.305 1.435 ou_2
...
4.955 5.055 x
5.055 5.525 ian_4
5.525 5.745 g
5.745 5.930 ou_4
5.930 5.975 sil
.
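
The output shown above can be read back into a program with a few lines of Python. This is a minimal sketch that assumes the layout visible in the sample: one utterance ID line, then `start end phoneme` lines, then a line containing only `.`. The names `Interval` and `parse_alignment` are hypothetical helpers, not part of speech-aligner, and the sample string is a shortened copy of the listing.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    phone: str


def parse_alignment(text: str) -> Dict[str, List[Interval]]:
    """Parse blocks of: utterance ID, 'start end phone' lines, closing '.' line."""
    result: Dict[str, List[Interval]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == ".":  # end of the current utterance block
            current = None
            continue
        fields = line.split()
        if current is None:  # first line of a block is the utterance ID
            current = fields[0]
            result[current] = []
            continue
        if len(fields) != 3:
            raise ValueError(f"unexpected line in {current}: {line!r}")
        start, end, phone = fields
        result[current].append(Interval(float(start), float(end), phone))
    return result


# Shortened copy of the sample output (only the first three intervals).
sample = """BAC009S0002W0122
0.000 0.535 sil
0.535 0.540 $0
0.540 0.745 er_2
.
"""
alignment = parse_alignment(sample)
for iv in alignment["BAC009S0002W0122"]:
    print(f"{iv.phone}\t{iv.end - iv.start:.3f}")
```

To parse the output of `cat data/text data/out.ali` as shown, pass only the contents of `data/out.ali`; the transcript lines from `data/text` are not in this layout.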

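Compile

  • Prerequisites:

    • cmake >= 3.1

    • One of the following math libraries with a BLAS interface:

      • Recommended: mkl

        • Install conda, then install mkl through it: conda install mkl (mkl is installed with conda by default)
        • At build time, make sure conda is on your PATH (which conda prints a path)
      • atlas

        • Ubuntu: sudo apt-get install libatlas3-base

        • Linux distributions differ in where they install math libraries, and the paths change over time, so you can specify the path explicitly:

        • cmake -DBLAS_VENDORS=ATLAS -DBLAS_ATLAS_LIB_DIRS=[/path/to/atlas/lib] ..
          
      • macOS (Darwin) ships with the Accelerate framework, so this step can be skipped

      • For other math libraries, see cmake/Modules/FindBLAS.cmake for the supported options

  • Build with cmake

    git clone .../speech-aligner.git
    cd speech-aligner
    mkdir build && cd build
    cmake ..
    make -j
    
  • Build output

    • bin/speech-aligner: the binary executable; see egs/cn_phn/run.sh for a typical invocation. It takes three kinds of arguments:
      • Configuration: options can be read from a config file or the command line; a config file is recommended, e.g. --config=egs/cn_phn/conf/align.conf
      • Input: the audio list and the corresponding transcript list
      • Output: the phoneme-level time alignment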
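
When the aligner is driven from a script, the argument order above (config, audio list, transcript list, output file) can be assembled programmatically. The sketch below only builds the argument list, using the `--config` option and the paths from the example invocation; `build_align_command` is a hypothetical helper name, not something the project provides.

```python
import shlex
from typing import List


def build_align_command(binary: str, config: str, wav_scp: str,
                        text: str, out_ali: str) -> List[str]:
    """Return the argv list: binary, --config=..., audio list, transcript list, output."""
    return [binary, f"--config={config}", wav_scp, text, out_ali]


cmd = build_align_command(
    "speech-aligner",
    "conf/align.conf",
    "data/wav.scp",
    "data/text",
    "data/out.ali",
)
print(shlex.join(cmd))

# To actually run it (requires the compiled binary to be on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than one shell string avoids quoting problems when paths contain spaces.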

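Applications and examples

  • Research:
    • Generate phoneme-level time-aligned training data for TTS
      • egs/cn_phn
  • Engineering:
    • Lyric alignment
      • egs/cn_lyric [todo]
    • Subtitle alignment
      • egs/cn_subtitle [todo]
  • For fun:
    • Kichiku (鬼畜) remix videos
      • egs/cn_gc [todo]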

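Updates

  • Added support for Chinese pinyin input (with tones); see egs/cn_phn/data/text

Todo

  • [ ] Chinese text: handling of punctuation and English
  • [ ] Add more examples

About

  • This project is based on the well-known open-source speech project kaldi; copyright follows the original project.
  • The phoneme list used in the egs/cn_phn example comes from DaCiDian, another open-source Chinese dictionary project.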

English readme:

speech-aligner is a tool that generates phoneme-level alignments between human speech and its transcription.

Usage example

# call the binary, with the speech list and transcript as inputs
./bin/speech-aligner --config=egs/cn_phn/conf/align.conf egs/cn_phn/data/wav.scp egs/cn_phn/data/text egs/cn_phn/data/out.ali
# check the output alignment, which includes: file name, then each phoneme's start time, end time, and label
cat egs/cn_phn/data/text egs/cn_phn/data/out.ali
BAC009S0002W0123
0.000 0.025 y
0.025 0.460 e_3
0.460 0.850 sil
0.850 0.985 ch
0.985 1.095 eng_2
...
2.655 2.735 zh
2.735 2.900 ong_1
2.900 2.960 d
2.960 3.665 ing_1
3.665 3.845 sil
.
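
In the sample above, each interval starts exactly where the previous one ends. A quick sanity check on an alignment could verify that property and sum the non-silence time. This is a sketch under assumptions: the intervals are already available as `(start, end, phoneme)` tuples, `sil` is taken to be the silence label because that is how it appears in the sample, and `check_contiguous` and `speech_duration` are hypothetical helper names.

```python
from typing import List, Tuple

Interval = Tuple[float, float, str]  # (start_s, end_s, phone)


def check_contiguous(intervals: List[Interval], tol: float = 1e-6) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    prev_end = None
    for start, end, phone in intervals:
        if end <= start:
            problems.append(f"non-positive duration for {phone} at {start:.3f}")
        if prev_end is not None and abs(start - prev_end) > tol:
            problems.append(f"gap or overlap before {phone} at {start:.3f}")
        prev_end = end
    return problems


def speech_duration(intervals: List[Interval], silence: str = "sil") -> float:
    """Total time in seconds covered by intervals that are not the silence label."""
    return sum(end - start for start, end, phone in intervals if phone != silence)


# First five intervals of the sample output above.
sample = [
    (0.000, 0.025, "y"),
    (0.025, 0.460, "e_3"),
    (0.460, 0.850, "sil"),
    (0.850, 0.985, "ch"),
    (0.985, 1.095, "eng_2"),
]
print(check_contiguous(sample))
print(f"{speech_duration(sample):.3f}")
```

Whether gaps or overlaps can ever occur in real output is not stated in the README, so treat a non-empty result as something to inspect rather than as a definite error.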

Compile

  • requirements

    • cmake >= 3.1

    • one of the following BLAS math libraries:

      • mkl (recommended)

        • install conda, and use it to install mkl: conda install mkl (mkl is installed with conda by default)
        • when running cmake, conda should be on your PATH
      • atlas

        • Ubuntu: sudo apt-get install libatlas3-base

        • cmake may not find your atlas automatically, in which case you need to set the math library path as below:

        • cmake -DBLAS_VENDORS=ATLAS -DBLAS_ATLAS_LIB_DIRS=[/path/to/atlas/lib] ..
          
      • Accelerate framework (nothing to do on macOS/Darwin)

      • ...

  • cmake

    git clone .../speech-aligner.git
    cd speech-aligner
    mkdir build && cd build
    cmake ..
    make -j
    
  • results

    • bin/speech-aligner: a binary executable, with arguments:
      • configuration: through a config file (recommended, e.g. --config=egs/cn_phn/conf/align.conf) or the command line
      • inputs: the wav list and the corresponding transcription list (e.g. egs/cn_phn/data)
      • output: the resulting alignment
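
The two input lists can be generated from a script. The transcript file format (`<utterance-id> <transcript>` per line) is visible in the usage example; the `wav.scp` layout used here (`<utterance-id> <path-to-wav>` per line) is the usual Kaldi convention and is an assumption about what speech-aligner expects, so check egs/cn_phn/data for the authoritative files. `write_inputs` and the wav path are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, Tuple


def write_inputs(utts: Dict[str, Tuple[str, str]], out_dir: Path) -> Tuple[Path, Path]:
    """Write wav.scp and text for a mapping of utt_id -> (wav_path, transcript)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    wav_scp = out_dir / "wav.scp"
    text = out_dir / "text"
    with wav_scp.open("w", encoding="utf-8") as scp_f, \
            text.open("w", encoding="utf-8") as txt_f:
        # Sorted by utterance ID, since Kaldi-style tools commonly expect sorted lists.
        for utt_id in sorted(utts):
            wav_path, transcript = utts[utt_id]
            scp_f.write(f"{utt_id} {wav_path}\n")
            txt_f.write(f"{utt_id} {transcript}\n")
    return wav_scp, text


demo_dir = Path(tempfile.mkdtemp())
wav_scp, text = write_inputs(
    {"BAC009S0002W0122": ("/path/to/BAC009S0002W0122.wav", "而对楼市成交抑制作用最大的限购")},
    demo_dir,
)
print(wav_scp.read_text(encoding="utf-8"), end="")
print(text.read_text(encoding="utf-8"), end="")
```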

Applications

  • for research:
    • generate training data for TTS
      • egs/cn_phn: generate Chinese phoneme alignments
  • for engineering:
    • align lyrics
      • egs/cn_lyric [todo]
    • align subtitles
      • egs/cn_subtitle [todo]
  • for fun:
    • kichiku (きちく) remix videos
      • egs/cn_gc [todo]

About

  • This project is based on kaldi, a great open-source speech project.
  • The phonemes used in the egs/cn_phn example come from DaCiDian, an open-source Chinese dictionary project.