R1ckShi / AESRC2020

Licence: Apache-2.0 license

Data preparation scripts, training pipeline, and baseline experiment results for the Interspeech 2020 Accented English Speech Recognition Challenge (AESRC).

Programming Languages

python

shell

Labels

asr

AESRC2020

Introduction

Data preparation scripts, training pipeline code, and baseline experiment results for the Interspeech 2020 Accented English Speech Recognition Challenge (AESRC).

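Dependencies

Install Kaldi (data preparation utility scripts; Track 2 conventional model training). GitHub link
Install ESPnet (Track 1 E2E AR model training; Track 2 E2E ASR Transformer training). GitHub link
(Optional) Install Google SentencePiece (vocabulary reduction and modeling-unit construction for Track 2 E2E ASR). GitHub link
(Optional) Install KenLM (N-gram language model training). GitHub link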

Usage

Data Preparation

Download the evaluation data.
Prepare the data, split off a development set, extract features, and train the BPE model: ./local/prepare_data.sh

AR Track (Accent Recognition)

Train the Track 1 ESPnet AR model: ./local/track1_espnet_transformer_train.sh

ASR Track (Speech Recognition)

Train the Track 2 Kaldi GMM alignment model: ./local/track2_kaldi_gmm_train.sh
Generate lattices and the decision tree, then train the Track 2 Kaldi chain model: ./local/track2_kaldi_chain_train.sh
Train the Track 2 ESPnet Transformer model (and the Track 2 ESPnet RNN language model): ./local/track2_espnet_transformer_train.sh
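
The scripts listed above can be laid out as an ordered plan. The following is a minimal sketch, not part of the repository: the stage names, the `plan` helper, and the assumption that the stages run in the order they are listed are all illustrative. It only checks whether each script exists under a given root and does not execute anything.

```python
from pathlib import Path

# Script paths exactly as listed in the usage section; the stage names and
# the ordering (list order) are assumptions made for this illustration.
STAGES = [
    ("data preparation", "./local/prepare_data.sh"),
    ("Track 1 AR (ESPnet)", "./local/track1_espnet_transformer_train.sh"),
    ("Track 2 GMM alignment (Kaldi)", "./local/track2_kaldi_gmm_train.sh"),
    ("Track 2 chain model (Kaldi)", "./local/track2_kaldi_chain_train.sh"),
    ("Track 2 Transformer (ESPnet)", "./local/track2_espnet_transformer_train.sh"),
]


def plan(stages, root="."):
    """Return (name, script, exists) per stage without running anything."""
    root = Path(root)
    return [(name, script, (root / script).is_file()) for name, script in stages]


if __name__ == "__main__":
    for name, script, exists in plan(STAGES):
        status = "found" if exists else "missing"
        print(f"{name:32s} {script} [{status}]")
```

Run from the repository root, this reports which of the listed scripts are present before any long-running training is started.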

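Notes

The organizers do not provide the English pronunciation lexicon required by the Kaldi models.
The training scripts do not include data augmentation or the addition of Librispeech data; participants may add these as needed.
The scripts can only be run after the Kaldi and ESPnet environments have been correctly installed and activated.
For the ASR Track, the baseline provides experimental results for several data combinations and for pre-training on the full Librispeech data.
Participants must strictly follow the challenge's data-usage rules when training models, so that results remain fair and comparable.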

Baseline Results

Track 1 Baseline Results

Model	RU	KR	US	PT	JPN	UK	CHN	IND	AVE
Transformer-3L	30.0	45.0	45.7	57.2	48.5	70.0	56.2	83.5	54.1
Transformer-6L	34.0	43.7	30.6	65.7	44.0	74.5	50.9	75.2	52.2
Transformer-12L	49.6	26.0	21.2	51.8	42.7	85.0	38.2	66.1	47.8
+ ASR-init	75.7	55.6	60.2	85.5	73.2	93.9	67.0	97.0	76.1

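Transformer-3L, Transformer-6L, and Transformer-12L are all trained with ./local/track1_espnet_transformer_train.sh (with elayers set to 3, 6, and 12, respectively). The ASR-init experiment initializes the model from the Joint CTC/Attention model of Track 2.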

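* On the cv set, the accuracy (acc) for one accent was found to be strongly correlated with the speaker. Because the cv set contains few speakers, the absolute values above are not statistically meaningful; the test set will contain more speakers.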
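
The speaker effect noted above can be checked by breaking accuracy down per accent and per speaker. The following is a minimal sketch on made-up labels; the `accuracy_by` helper and the toy data are assumptions for illustration and are not taken from the baseline scripts.

```python
from collections import defaultdict


def accuracy_by(keys, refs, hyps):
    """Accuracy in percent, grouped by an arbitrary key (accent, speaker, ...)."""
    correct, total = defaultdict(int), defaultdict(int)
    for key, ref, hyp in zip(keys, refs, hyps):
        total[key] += 1
        correct[key] += int(ref == hyp)
    return {key: 100.0 * correct[key] / total[key] for key in total}


# Toy data: two US speakers (s1, s2) and one UK speaker (s3).
speakers = ["s1", "s1", "s2", "s2", "s3", "s3"]
refs = ["US", "US", "US", "US", "UK", "UK"]
hyps = ["US", "US", "UK", "UK", "UK", "UK"]

by_accent = accuracy_by(refs, refs, hyps)       # group by reference accent
by_speaker = accuracy_by(speakers, refs, hyps)  # group by speaker

print(by_accent)   # {'US': 50.0, 'UK': 100.0}
print(by_speaker)  # {'s1': 100.0, 's2': 0.0, 's3': 100.0}
```

In this toy case the 50% accuracy for US hides one speaker at 100% and one at 0%, which is the kind of speaker dependence that makes per-accent numbers unstable when only a few speakers are available.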

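Track 2 Baseline Results

Kaldi Hybrid Chain Model: CNN + 18 TDNN
* Based on an internal, non-open-source English pronunciation lexicon.
* Results based on the CMU dictionary will be released later.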

ESPnet Transformer Model: 12 Encoder + 6 Decoder (simple self-attention, CTC joint training used, 1k sub-word BPE)

For detailed hyperparameters, see the model configurations in the ./local/files/conf/ directory and the settings in the related scripts.
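
The "Decode Related" column of the Track 2 results uses labels such as +0.3RNNLM and +0.3RNNLM+0.3CTC. Assuming these denote a language-model weight and a CTC weight applied when scoring hypotheses during decoding (the usual joint CTC/attention formulation; the exact settings are in the configuration files), the combination can be sketched as follows. The function names and all log-probabilities are made up for illustration.

```python
def joint_score(att_logp, ctc_logp=0.0, lm_logp=0.0, ctc_weight=0.0, lm_weight=0.0):
    """Interpolate attention and CTC log-probabilities, then add a weighted LM term."""
    return (1.0 - ctc_weight) * att_logp + ctc_weight * ctc_logp + lm_weight * lm_logp


def best(hyps, **weights):
    """Pick the hypothesis name with the highest combined score."""
    return max(hyps, key=lambda name: joint_score(*hyps[name], **weights))


# Hypothetical log-probabilities per hypothesis: (attention, CTC, language model)
HYPS = {"A": (-4.0, -6.0, -9.0), "B": (-4.5, -4.0, -5.0)}

print(best(HYPS))                                 # A (attention only)
print(best(HYPS, lm_weight=0.3))                  # B (LM term changes the ranking)
print(best(HYPS, lm_weight=0.3, ctc_weight=0.3))  # B
```

The point of the sketch is only that the extra terms can reorder hypotheses; how much they help on real data is what the results table reports.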

WER on cv set

Model	Data	Decode Related	RU	KR	US	PT	JPN	UK	CHN	IND	AVE
Kaldi	Accent160	-	6.67	11.46	15.95	10.27	9.78	16.88	20.97	17.48	13.68
Kaldi	Libri960 ~ Accent160	-	6.61	10.95	15.33	9.79	9.75	16.03	19.68	16.93	13.13
Kaldi	Accent160 + Libri160	-	6.95	11.76	13.05	9.96	10.15	14.21	20.76	18.26	13.14
ESPnet	Accent160	+0.3RNNLM	5.26	7.69	9.96	7.45	6.79	10.06	11.77	10.05	8.63
ESPnet	Libri960 ~ Accent160	+0.3RNNLM	4.6	6.4	7.42	5.9	5.71	7.64	9.87	7.85	6.92
ESPnet	Accent160 + Libri160	-	5.35	9.07	8.52	7.13	7.29	8.6	12.03	9.05	8.38
ESPnet	Accent160 + Libri160	+0.3RNNLM	4.68	7.59	7.7	6.42	6.37	7.76	10.88	8.41	7.48
ESPnet	Accent160 + Libri160	+0.3RNNLM+0.3CTC	4.76	7.81	7.71	6.36	6.4	7.23	10.77	8.01	7.38

* "Data A ~ Data B" means that the model trained on Data A is fine-tuned on Data B.
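
For reading the table, the following sketch shows how a word error rate is conventionally computed and how two AVE values can be compared. It assumes the standard word-level edit-distance definition of WER, which may differ in detail from the scoring used by the Kaldi and ESPnet scripts; the helper names and the example sentences are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(refs, hyps):
    """Corpus-level WER in percent: total word errors / total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return 100.0 * errors / words


def relative_reduction(baseline, improved):
    """Relative WER reduction in percent."""
    return 100.0 * (baseline - improved) / baseline


print(round(wer(["the cat sat on the mat"], ["the cat sit on mat"]), 2))  # 33.33
# ESPnet AVE with +0.3RNNLM: Accent160 (8.63) vs. Libri960 ~ Accent160 (6.92)
print(round(relative_reduction(8.63, 6.92), 1))  # 19.8
```

On the reported AVE values, fine-tuning the Librispeech-pretrained ESPnet model on the accented data corresponds to roughly a 19.8% relative WER reduction over training on Accent160 alone.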
