All Projects → speechpro → mixup

speechpro / mixup

Licence: Apache-2.0 license
speechpro.com/

Programming Languages

C++
36643 projects - #6 most used programming language
CMake
9771 projects
Dockerfile
14818 projects

Projects that are alternatives of or similar to mixup

Specaugment
A Implementation of SpecAugment with Tensorflow & Pytorch, introduced by Google Brain
Stars: ✭ 408 (+1673.91%)
Mutual labels:  speech-recognition, data-augmentation
specAugment
Tensor2tensor experiment with SpecAugment
Stars: ✭ 46 (+100%)
Mutual labels:  speech-recognition, data-augmentation
Zeroth
Kaldi-based Korean ASR (한국어 음성인식) open-source project
Stars: ✭ 248 (+978.26%)
Mutual labels:  speech-recognition, data-augmentation
SnapMix
SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data (AAAI 2021)
Stars: ✭ 127 (+452.17%)
Mutual labels:  data-augmentation, mixup
DataAugmentationTF
Implementation of modern data augmentation techniques in TensorFlow 2.x to be used in your training pipeline.
Stars: ✭ 35 (+52.17%)
Mutual labels:  data-augmentation, mixup
manifold mixup
Tensorflow implementation of the Manifold Mixup machine learning research paper
Stars: ✭ 24 (+4.35%)
Mutual labels:  regularization, data-augmentation
OrientedRepPoints DOTA
Oriented Object Detection: Oriented RepPoints + Swin Transformer/ReResNet
Stars: ✭ 62 (+169.57%)
Mutual labels:  mixup
resnet-cifar10
ResNet for Cifar10
Stars: ✭ 21 (-8.7%)
Mutual labels:  data-augmentation
candock
A time series signal analysis and classification framework
Stars: ✭ 56 (+143.48%)
Mutual labels:  data-augmentation
KoEDA
Korean Easy Data Augmentation
Stars: ✭ 62 (+169.57%)
Mutual labels:  data-augmentation
webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
Stars: ✭ 816 (+3447.83%)
Mutual labels:  data-augmentation
Rus-SpeechRecognition-LSTM-CTC-VoxForge
Распознавание речи русского языка используя Tensorflow, обучаясь на базе Voxforge
Stars: ✭ 50 (+117.39%)
Mutual labels:  speech-recognition
augraphy
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
Stars: ✭ 49 (+113.04%)
Mutual labels:  data-augmentation
bird species classification
Supervised Classification of bird species 🐦 in high resolution images, especially for, Himalayan birds, having diverse species with fairly low amount of labelled data
Stars: ✭ 59 (+156.52%)
Mutual labels:  data-augmentation
quran-align
Word-accurate timestamps for Qur'anic audio.
Stars: ✭ 139 (+504.35%)
Mutual labels:  speech-recognition
Speech-Recognition
End-to-end Automatic Speech Recognition for Madarian and English in Tensorflow
Stars: ✭ 21 (-8.7%)
Mutual labels:  speech-recognition
traj-pred-irl
Official implementation codes of "Regularizing neural networks for future trajectory prediction via IRL framework"
Stars: ✭ 23 (+0%)
Mutual labels:  regularization
UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
Stars: ✭ 224 (+873.91%)
Mutual labels:  speech-recognition
speech to text
how to use the Google Cloud Speech API to transcribe audio/video files.
Stars: ✭ 35 (+52.17%)
Mutual labels:  speech-recognition
lightning-asr
Modular and extensible speech recognition library leveraging pytorch-lightning and hydra.
Stars: ✭ 36 (+56.52%)
Mutual labels:  speech-recognition

Introduction

This repository contains Kaldi-compatible implementation of the mixup technique presented in the Interspeech 2018 paper "An Investigation of Mixup Training Strategies for Acoustic Models in ASR".

If you use this code for your research, please cite our paper:

@inproceedings{Medennikov_mixup2018,
  author={Ivan Medennikov and Yuri Khokhlov and Aleksei Romanenko and Dmitry Popov and Natalia Tomashenko and Ivan Sorokin and Alexander Zatvornitskiy},
  title={An Investigation of Mixup Training Strategies for Acoustic Models in ASR},
  year=2020,
  booktitle={Proc. Interspeech 2018},
  pages={2903--2907},
  doi={10.21437/Interspeech.2018-2191},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2191}
}

If you have any questions on the paper or this implementation, please contact to corresponding author Ivan Medennikov ([email protected]).

Licence

Apache 2.0

How to use

Utilities nnet3-mixup-egs and nnet3-chain-mixup-egs are intended to be used instead of nnet3-copy-egs and nnet3-chain-copy-egs in Kaldi training scripts. In order to use mixup utilities you should replace nnet3-copy-egs and/or nnet3-chain-copy-egs here

common.py, rev. eacf34a85ab7ece6a76bd73b9443bc2fe62ac6f1

method train_new_models(), line ~122

ark,bg:nnet3-copy-egs {frame_opts} {multitask_egs_opts}

with

ark,bg:nnet3-mixup-egs {frame_opts} {multitask_egs_opts}

and here

acoustic_model.py, rev. bba22b58407a3243e3fa847986753266e122d015

method train_new_models(), line ~199

ark,bg:nnet3-chain-copy-egs {multitask_egs_opts}

with

ark,bg:nnet3-chain-mixup-egs {multitask_egs_opts}

respectively.

Installation guide

Prerequisites

Install boost

$ sudo apt-get install libboost-all-dev

Install CMake

$ sudo apt-get install cmake

Install git

$ sudo apt-get install git

Building project

Clone mixup project repository

$ git clone https://github.com/speechpro/mixup.git

$ cd mixup

Clone Kaldi submodule

$ git submodule init

$ git submodule update

Build Kaldi dependencies

$ cd kaldi/tools

$ make

or if you want to speedup the building process run:

$ make -j $(nproc)

In case of errors or if you want to check the prerequisites for Kaldi see INSTALL file.

Build Kaldi

$ cd ../src

$ ./configure --shared

$ make depend -j $(nproc)

$ make -j $(nproc)

In case of errors or for additinal building options see INSTALL file.

Generate mixup project

$ cd ../..

$ mkdir build

$ cd build

$ cmake ..

Build mixup modules

$ make -j $(nproc)

Install mixup modules

$ make install

This operation will place mixup modules in to the corresponding Kaldi binary folders.

You may need to add line

export LD_LIBRARY_PATH=$KALDI_ROOT/src/lib:$KALDI_ROOT/tools/openfst/lib:$LD_LIBRARY_PATH

to your path.sh.

Program options

Mixup utilities have a number of parameters and modes of operation. In order to simplify their embedding all parameters can be passed in two equivalent ways: as command line program options and as environment variables.

You can find detailed explanation of the parameters and investigation of the mixup effectiveness in various operation modes in [1].

nnet3-mixup-egs

Command line Environment variable Allowable values Default Meaning
--mix-mode MIXUP_MIX_MODE local, global, class, shift global Mixup mode
--distrib MIXUP_DISTRIB uniform:min,max, beta:alpha, beta2:alpha uniform:0.0,0.5 Mixup scaling factors distribution
--transform MIXUP_TRANSFORM "", sigmoid:k "" Mixup scaling factor transform function for labels
--min-num MIXUP_MIN_NUM integer > 0 1 Minimum number of admixtures
--max-num MIXUP_MAX_NUM integer >= min-num 1 Maximum number of admixtures
--min-shift MIXUP_MIN_SHIFT integer > 0 1 Minimum sequence shift size (shift mode)
--max-shift MIXUP_MAX_SHIFT integer >= min-shift 3 Maximum sequence shift size (shift mode)
--fixed-egs MIXUP_FIXED_EGS float in the range [0, 1] 0.1 Portion of examples to leave untouched
--fixed-frames MIXUP_FIXED_FRAMES float in the range [0, 1] 0.1 Portion of frames to leave untouched
--left-range MIXUP_LEFT_RANGE integer > 0 3 Left range to pick an admixture frame (local mode)
--right-range MIXUP_RIGHT_RANGE integer > 0 3 Right range to pick an admixture frame (local mode)
--buff-size MIXUP_BUFF_SIZE integer > 0 500 Buffer size for data shuffling (global mode)
--compress MIXUP_COMPRESS 0, 1 0 Compress features and i-vectors

nnet3-chain-mixup-egs

Command line Environment variable Allowable values Default Meaning
--mix-mode MIXUP_MIX_MODE global, shift global Mixup mode
--distrib MIXUP_DISTRIB uniform:min,max, beta:alpha, beta2:alpha uniform:0.0,0.5 Mixup scaling factors distribution*
--scale-fst-algo MIXUP_SCALE_FST_ALGO "", default[:scale[,eps]], balanced[:scale[,eps]] "" Scale supervision FSTs algorithm**
--swap-scales MIXUP_SWAP_SCALES true, false false Swap supervision FST scales
--max-super MIXUP_MAX_SUPER true, false false Get supervision from example with maximum scale
--min-shift MIXUP_MIN_SHIFT integer > 0 1 Minimum sequence shift size (shift mode)
--max-shift MIXUP_MAX_SHIFT integer >= min-shift 3 Maximum sequence shift size (shift mode)
--fixed MIXUP_FIXED float in the range [0, 1] 0.1 The portion of the data to leave untouched
--buff-size MIXUP_BUFF_SIZE integer > 0 500 Buffer size for data shuffling (global mode)
--frame-shift MIXUP_FRAME_SHIFT integer >= 0 0 Allows you to shift time values in the supervision data (excluding iVector data) - useful in augmenting data. Note, the outputs will remain at the closest exact multiples of the frame subsampling
--compress MIXUP_COMPRESS 0, 1 0 Compress features and i-vectors

* Mixup scaling factors distribution. In case of --distrib=beta:alpha we use the standard beta probability distribution with symmetric shape (β=α). But when --distrib=beta2:alpha we use modified beta distribution: if sampled value ρ greater 0.5 we use (1-ρ).

float RandomScaleBeta2::Value() {
    const float value = (*distrib)(rand_gen);
    if (value <= 0.5) {
        return value;
    } else {
        return (1.0 - value);
    }
}

** Scale supervision FSTs algorithm. When merging supervision FSTs we apply epsilon restriction as folows. If scaling factor less eps we leave example FST unchanged. If 1.0 minus scaling factor less eps we use admixture FST instead of fusion. Default value of eps is 0.001.

void ExampleMixer::FuseGraphs(const fst_t& _admixture, float _admx_scale, fst_t& _example) const {
    if (_admx_scale < scale_eps) {
        return;
    } else if ((1.0 - _admx_scale) < scale_eps) {
        _example = _admixture;
        return;
    }
    ...
    ...
}

References

[1] Ivan Medennikov, Yuri Khokhlov, Aleksei Romanenko, Dmitry Popov, Natalia Tomashenko, Ivan Sorokin, Alexander Zatvornitskiy, "An investigation of mixup training strategies for acoustic models in ASR", Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH), 2018

[2] Tomashenko, N., Khokhlov, Y., Estève, Y. (2018) Speaker Adaptive Training and Mixup Regularization for Neural Network Acoustic Models in Automatic Speech Recognition. Proc. Interspeech 2018, ‎2414-2418, DOI: 10.21437/Interspeech.‎2018-2209

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].