
R1ckShi / AESRC2020

License: Apache-2.0
Data preparation scripts, training pipeline, and baseline experiment results for the Interspeech 2020 Accented English Speech Recognition Challenge (AESRC).


AESRC2020

Introduction

Data preparation scripts, training pipeline code, and baseline experiment results for the Interspeech 2020 Accented English Speech Recognition Challenge (AESRC).

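Dependencies

  1. Install Kaldi (data preparation scripts and Track2 conventional model training). GitHub link
  2. Install ESPnet (Track1 E2E AR model training and Track2 E2E ASR Transformer training). GitHub link
  3. (Optional) Install Google SentencePiece (vocabulary reduction and modeling-unit construction for Track2 E2E ASR). GitHub link
  4. (Optional) Install KenLM (N-gram language model training). GitHub link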
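
Before running any recipe it can help to confirm that the four tools are visible from the current shell. The following is a minimal sketch, not part of the repository. It assumes Kaldi is located through a `KALDI_ROOT` environment variable, that ESPnet and SentencePiece are importable as the Python packages `espnet` and `sentencepiece`, and that KenLM exposes its `lmplz` binary on `PATH`. Adjust these checks if your installation differs.

```python
import importlib.util
import os
import shutil


def check_environment():
    """Return a dict mapping each dependency to True (found) or False."""
    return {
        # Assumption: Kaldi is located through the KALDI_ROOT variable.
        "kaldi": os.path.isdir(os.environ.get("KALDI_ROOT", "")),
        # Assumption: ESPnet is installed as the Python package `espnet`.
        "espnet": importlib.util.find_spec("espnet") is not None,
        # Assumption: SentencePiece is installed as a Python package.
        "sentencepiece (optional)": importlib.util.find_spec("sentencepiece") is not None,
        # Assumption: KenLM's n-gram estimation binary `lmplz` is on PATH.
        "kenlm (optional)": shutil.which("lmplz") is not None,
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:28s} {'found' if found else 'MISSING'}")
```

A missing optional entry only matters for the steps that use it (BPE vocabulary construction and N-gram language model training, respectively).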

Usage

Data Preparation

  1. Download the challenge data.
  2. Prepare the data, split off a development set, extract features, and train the BPE model: ./local/prepare_data.sh

Accent Recognition (AR) Track

Train the Track1 ESPnet AR model: ./local/track1_espnet_transformer_train.sh

Speech Recognition (ASR) Track

  1. Train the Track2 Kaldi GMM alignment model: ./local/track2_kaldi_gmm_train.sh
  2. Generate lattices and the decision tree, then train the Track2 Kaldi chain model: ./local/track2_kaldi_chain_train.sh
  3. Train the Track2 ESPnet Transformer model (and the Track2 ESPnet RNN language model): ./local/track2_espnet_transformer_train.sh
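
The usage steps can be chained with a small driver. The sketch below is illustrative only. The script paths come from the steps in this README, while the stage names (`prepare`, `ar`, `asr`), the `plan` and `run` helpers, and the dry-run default are assumptions introduced here. It also assumes the scripts are launched from the repository root with `bash`.

```python
import subprocess

# Script paths are taken from the usage steps in this README; the stage
# names are labels made up for this sketch.
STAGES = {
    "prepare": ["./local/prepare_data.sh"],
    "ar": ["./local/track1_espnet_transformer_train.sh"],
    "asr": [
        "./local/track2_kaldi_gmm_train.sh",
        "./local/track2_kaldi_chain_train.sh",
        "./local/track2_espnet_transformer_train.sh",
    ],
}


def plan(track):
    """Data preparation first, then the scripts of the requested track."""
    return STAGES["prepare"] + STAGES[track]


def run(track, dry_run=True):
    """Print the planned scripts; execute them only when dry_run is False."""
    for script in plan(track):
        print(script)
        if not dry_run:
            # check=True stops the pipeline at the first failing script.
            subprocess.run(["bash", script], check=True)


if __name__ == "__main__":
    run("asr")
```

With the default `dry_run=True` nothing is executed, so the plan can be inspected before committing to a long training run. Call `run("asr", dry_run=False)` once the Kaldi and ESPnet environments are active.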

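Notes

  1. The organizers do not provide the English pronunciation lexicon required by the Kaldi models.
  2. The training scripts do not include data augmentation or the addition of Librispeech data; participants may add these as needed.
  3. The scripts can only be run after the Kaldi and ESPnet environments have been correctly installed and activated.
  4. For the ASR Track, the baseline provides results for several data combinations and for pre-training on the full Librispeech data.
  5. Participants should train their models strictly according to the challenge's data usage rules, so that results remain fair and comparable.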

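Baseline Results

Track1 Baseline Results

Model            RU    KR    US    PT    JPN   UK    CHN   IND   AVE
Transformer-3L   30.0  45.0  45.7  57.2  48.5  70.0  56.2  83.5  54.1
Transformer-6L   34.0  43.7  30.6  65.7  44.0  74.5  50.9  75.2  52.2
Transformer-12L  49.6  26.0  21.2  51.8  42.7  85.0  38.2  66.1  47.8
+ ASR-init       75.7  55.6  60.2  85.5  73.2  93.9  67.0  97.0  76.1

Transformer-3L, Transformer-6L, and Transformer-12L are all trained with ./local/track1_espnet_transformer_train.sh (with elayers set to 3, 6, and 12, respectively). The ASR-init experiment is initialized from the Joint CTC/Attention model of Track2.

* On the cv set, the accuracy for one of the accents was found to be strongly correlated with the speaker. Because the cv set contains few speakers, the absolute values above are not statistically meaningful; the test set will contain more speakers.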
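
To see where ASR initialization helps most, the per-accent difference between two rows of the Track1 table can be computed directly. This is a small sketch using the numbers from the table. It reads the values as accuracy in percent and takes the "+ ASR-init" row to build on Transformer-12L, which is an interpretation of the table layout rather than something stated explicitly.

```python
# Values copied from the Track1 table (cv set), read here as accuracy in %.
ACCENTS = ["RU", "KR", "US", "PT", "JPN", "UK", "CHN", "IND"]
TRANSFORMER_12L = [49.6, 26.0, 21.2, 51.8, 42.7, 85.0, 38.2, 66.1]
ASR_INIT = [75.7, 55.6, 60.2, 85.5, 73.2, 93.9, 67.0, 97.0]


def accuracy_gain(baseline, improved):
    """Absolute per-accent gain in percentage points, rounded to one decimal."""
    return {
        accent: round(new - old, 1)
        for accent, old, new in zip(ACCENTS, baseline, improved)
    }


if __name__ == "__main__":
    for accent, gain in accuracy_gain(TRANSFORMER_12L, ASR_INIT).items():
        print(f"{accent:4s} +{gain:.1f}")
```

Given the caveat about the small number of cv speakers, these differences are best read as a rough indication rather than a precise measurement.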

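Track2 Baseline Results

Kaldi Hybrid Chain Model: CNN + 18 TDNN
* Based on an internal, non-open-source English pronunciation lexicon
* Results based on the CMU dictionary will be released later

ESPnet Transformer Model: 12 Encoder + 6 Decoder (simple self-attention, CTC joint training used, 1k sub-word BPE)

For detailed hyperparameters, see the model configurations in ./local/files/conf/ and the settings in the related scripts.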
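
The Track2 results use decode labels such as +0.3RNNLM and +0.3RNNLM+0.3CTC. A plausible reading, assumed here rather than confirmed by this README, is that 0.3 is the language-model weight and the CTC weight used in ESPnet-style joint CTC/attention beam search. The sketch below shows that kind of score combination; the function name and the toy log-probabilities are made up for illustration.

```python
import math


def joint_score(att_logp, ctc_logp, lm_logp, ctc_weight=0.3, lm_weight=0.3):
    """Combine the log-probabilities of one hypothesis into a beam-search score.

    Assumed form: the attention and CTC scores are interpolated with
    ctc_weight, and the language-model score is added with lm_weight.
    """
    return (1.0 - ctc_weight) * att_logp + ctc_weight * ctc_logp + lm_weight * lm_logp


if __name__ == "__main__":
    # Toy log-probabilities (attention, CTC, LM) for two competing hypotheses.
    hyps = {
        "hyp_a": (math.log(0.6), math.log(0.2), math.log(0.1)),
        "hyp_b": (math.log(0.5), math.log(0.4), math.log(0.3)),
    }
    best = max(hyps, key=lambda name: joint_score(*hyps[name]))
    print(best)
```

Setting `ctc_weight=0.0` and `lm_weight=0.0` reduces the score to the attention decoder alone, which would correspond to a plain "-" entry under this reading.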

WER on cv set

System  Data                  Decode related    RU    KR     US     PT     JPN    UK     CHN    IND    AVE
Kaldi   Accent160             -                 6.67  11.46  15.95  10.27  9.78   16.88  20.97  17.48  13.68
Kaldi   Libri960 ~ Accent160  -                 6.61  10.95  15.33  9.79   9.75   16.03  19.68  16.93  13.13
Kaldi   Accent160 + Libri160  -                 6.95  11.76  13.05  9.96   10.15  14.21  20.76  18.26  13.14
ESPnet  Accent160             +0.3RNNLM         5.26  7.69   9.96   7.45   6.79   10.06  11.77  10.05  8.63
ESPnet  Libri960 ~ Accent160  +0.3RNNLM         4.6   6.4    7.42   5.9    5.71   7.64   9.87   7.85   6.92
ESPnet  Accent160 + Libri160  -                 5.35  9.07   8.52   7.13   7.29   8.6    12.03  9.05   8.38
ESPnet  Accent160 + Libri160  +0.3RNNLM         4.68  7.59   7.7    6.42   6.37   7.76   10.88  8.41   7.48
ESPnet  Accent160 + Libri160  +0.3RNNLM+0.3CTC  4.76  7.81   7.71   6.36   6.4    7.23   10.77  8.01   7.38

* Data A ~ Data B means that the model trained on Data A is fine-tuned with Data B.
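
For reference, word error rate is conventionally the word-level edit distance between hypothesis and reference, divided by the number of reference words. The following is a minimal sketch of that standard definition, assuming whitespace tokenization and no text normalization. It is not the challenge's scoring tool, whose normalization rules may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (single-row DP)."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(
                row[j] + 1,            # deletion of a reference word
                row[j - 1] + 1,        # insertion of a hypothesis word
                prev_diag + (r != h),  # substitution, or match at no cost
            )
            prev_diag, row[j] = row[j], cur
    return row[-1]


def wer(references, hypotheses):
    """Corpus-level WER in percent: total edits over total reference words."""
    errors = words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        errors += edit_distance(ref_tokens, hyp_tokens)
        words += len(ref_tokens)
    return 100.0 * errors / words


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    hyps = ["the cat sit on mat"]
    print(f"WER: {wer(refs, hyps):.2f}%")
```

Errors are pooled over the whole corpus before dividing, so long utterances weigh more than short ones. Averaging per-utterance WER instead would generally give a different number.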