
Kyubyong / specAugment

License: Apache-2.0
Tensor2tensor experiment with SpecAugment

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to specAugment

Specaugment
An implementation of SpecAugment with TensorFlow & PyTorch, as introduced by Google Brain
Stars: ✭ 408 (+786.96%)
Mutual labels:  speech-recognition, data-augmentation
mixup
speechpro.com/
Stars: ✭ 23 (-50%)
Mutual labels:  speech-recognition, data-augmentation
End-to-End-Mandarin-ASR
End-to-end speech recognition on AISHELL dataset.
Stars: ✭ 20 (-56.52%)
Mutual labels:  speech-recognition, specaugment
Zeroth
Kaldi-based Korean ASR (Korean speech recognition) open-source project
Stars: ✭ 248 (+439.13%)
Mutual labels:  speech-recognition, data-augmentation
CaptionThis
"Caption This" is an iOS app that adds real-time captions to videos for Instagram Stories
Stars: ✭ 12 (-73.91%)
Mutual labels:  speech-recognition
TF-Speech-Recognition-Challenge-Solution
Source code of the model used in Tensorflow Speech Recognition Challenge (https://www.kaggle.com/c/tensorflow-speech-recognition-challenge). The solution ranked in top 5% in private leaderboard.
Stars: ✭ 58 (+26.09%)
Mutual labels:  speech-recognition
megs
A merged version of multiple open-source German speech datasets.
Stars: ✭ 21 (-54.35%)
Mutual labels:  speech-recognition
idear
🎙️ Handsfree Audio Development Interface
Stars: ✭ 84 (+82.61%)
Mutual labels:  speech-recognition
masr
A Chinese speech recognition series; with it, readers can quickly train their own Chinese ASR models or test the pretrained models directly.
Stars: ✭ 179 (+289.13%)
Mutual labels:  speech-recognition
ASR-Audio-Data-Links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (+289.13%)
Mutual labels:  speech-recognition
leopard
On-device speech-to-text engine powered by deep learning
Stars: ✭ 354 (+669.57%)
Mutual labels:  speech-recognition
anycontrol
Voice control for your websites and applications
Stars: ✭ 53 (+15.22%)
Mutual labels:  speech-recognition
mrnet
Building an ACL tear detector to spot knee injuries from MRIs with PyTorch (MRNet)
Stars: ✭ 98 (+113.04%)
Mutual labels:  data-augmentation
TextNormalizationCoveringGrammars
Covering grammars for English and Russian text normalization
Stars: ✭ 60 (+30.43%)
Mutual labels:  speech-recognition
torchsubband
Pytorch implementation of subband decomposition
Stars: ✭ 63 (+36.96%)
Mutual labels:  speech-recognition
UHV-OTS-Speech
A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.
Stars: ✭ 94 (+104.35%)
Mutual labels:  speech-recognition
good-speech-web-client
Practice your speech level in any language using speech recognition
Stars: ✭ 26 (-43.48%)
Mutual labels:  speech-recognition
multilingual kws
Few-shot Keyword Spotting in Any Language and Multilingual Spoken Word Corpus
Stars: ✭ 122 (+165.22%)
Mutual labels:  speech-recognition
revai-python-sdk
Rev AI Python SDK
Stars: ✭ 35 (-23.91%)
Mutual labels:  speech-recognition
ChineseNER
All about Chinese NER
Stars: ✭ 241 (+423.91%)
Mutual labels:  data-augmentation

SpecAugment

Implementation of SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition (Park et al., 2019, arXiv:1904.08779)

Notes

  • The paper introduces three techniques for augmenting speech data in speech recognition.
  • They stem from the observation that spectrograms, which are often used as model input, can be treated as images, so various image augmentation methods can be applied to them.
  • I find the idea interesting.
  • It covers three methods: time warping, frequency masking, and time masking.
  • Details are clearly explained in the paper.
  • While the first one, time warping, may appear to be the most salient, Daniel, the first author, told me that the other two are actually far more important, so time warping can be skipped if necessary. (Thanks for the advice, Daniel!)
  • I found that implementing time warping with TensorFlow is tricky because the relevant functions rely on the static shape of the mel-spectrogram tensor, which is hard to obtain from the pre-defined graph.
  • I tested frequency / time masking (see the sketch after this list) on Tensor2tensor's LibriSpeech Clean Small task.
  • The paper used the LAS model, but I stuck with the Transformer.
  • To measure the effect of specAugment, I also trained a baseline model without augmentation.
  • With 4 GPUs, training for 500K steps takes more than a week.
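
Below is a minimal sketch of frequency and time masking, assuming a mel-spectrogram array of shape [time, num_mel_bins]. The defaults (F=27, T=100) follow the paper's LibriSpeech masking policy; the function names are mine for illustration, not this repo's.

import numpy as np

def freq_mask(mel, F=27):
    # Zero out f consecutive mel channels, f ~ Uniform(0, F),
    # starting at a random channel f0. Modifies mel in place.
    num_bins = mel.shape[1]
    f = np.random.randint(0, F + 1)
    f0 = np.random.randint(0, num_bins - f + 1)
    mel[:, f0:f0 + f] = 0.0
    return mel

def time_mask(mel, T=100):
    # Zero out t consecutive frames, t ~ Uniform(0, T),
    # starting at a random frame t0. Modifies mel in place.
    num_frames = mel.shape[0]
    t = np.random.randint(0, min(T, num_frames) + 1)
    t0 = np.random.randint(0, num_frames - t + 1)
    mel[t0:t0 + t, :] = 0.0
    return mel

# Usage: augmented = time_mask(freq_mask(mel))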

Requirements

  • TensorFlow==1.12.0
  • tensor2tensor==1.12.0

Script

echo "No specAugment"
# Set Paths
MODEL=transformer
HPARAMS=transformer_librispeech_v1

PROBLEM=librispeech_clean_small
DATA_DIR=data/no_spec
TMP_DIR=tmp
TRAIN_DIR=train/$PROBLEM

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

# Generate data
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

# Train
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --train_steps=500000 \
  --eval_steps=3 \
  --local_eval_frequency=5000 \
  --worker_gpu=4

echo "specAugment"
# Set Paths
PROBLEM=librispeech_specaugment
DATA_DIR=data/spec
TMP_DIR=tmp
TRAIN_DIR=train/$PROBLEM
USER_DIR=USER_DIR

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

# Generate data
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

# Train
t2t-trainer \
  --t2t_usr_dir=$USER_DIR \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --train_steps=500000 \
  --eval_steps=3 \
  --local_eval_frequency=5000 \
  --worker_gpu=4
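
For reference, --t2t_usr_dir imports a directory whose __init__.py registers extra problems, which is how the trainer resolves librispeech_specaugment. The sketch below shows that registration plumbing only; the class body and its hook point are my assumptions for illustration, and the actual code in this repo may differ.

# USER_DIR/specaugment_problem.py (imported from USER_DIR/__init__.py)
import tensorflow as tf
from tensor2tensor.data_generators import librispeech
from tensor2tensor.utils import registry

@registry.register_problem
class LibrispeechSpecaugment(librispeech.LibrispeechCleanSmall):
  """LibriSpeech clean-small plus SpecAugment-style masking at training time."""

  def preprocess_example(self, example, mode, hparams):
    example = super(LibrispeechSpecaugment, self).preprocess_example(
        example, mode, hparams)
    if mode == tf.estimator.ModeKeys.TRAIN:
      # Hypothetical hook: apply frequency / time masking here, on whatever
      # feature tensor the pipeline exposes at this stage.
      pass
    return example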

Results

Training loss

  • As expected, augmentation hurts the training loss, since the model is fitting distorted versions of the data.

Word Error Rate (SpecAugment (top) vs. No augmentation (bottom))

  • The base model's curve is noisy; its WER hovers around 26%, which is poor.
  • The specAugment model's curve is much cleaner; its WER reaches 20% after 500K steps of training, though I don't think that is good enough yet.