
Kaldi AG Training Setup


Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-grammar.

Usage

All commands are run in the Docker container as follows. Training on the CPU should work, just much more slowly; to do so, remove the --runtime=nvidia flag and use the image daanzu/kaldi_ag_training:2020-11-28 instead of the GPU image. You can run Docker directly with the following parameter structure, or, as a shortcut, use the run_docker.sh script (editing it to suit your needs and configuration).

docker run -it --rm -v $(pwd):/mnt/input -w /mnt/input --user "$(id -u):$(id -g)" \
    --runtime=nvidia daanzu/kaldi_ag_training_gpu:2020-11-28 \
    [command and args...]

Example commands:

# Download and prepare base model (needed for either finetuning or personal model training)
wget https://github.com/daanzu/kaldi_ag_training/releases/download/v0.1.0/kaldi_model_daanzu_20200905_1ep-mediumlm-base.zip
unzip kaldi_model_daanzu_20200905_1ep-mediumlm-base.zip

# Prepare training dataset files
python3 convert_tsv_to_scp.py yourdata.tsv [optional output directory]

# Pick only one of the following:
# Run finetune training, with default settings
bash run_docker.sh bash run.finetune.sh kaldi_model_daanzu_20200905_1ep-mediumlm-base dataset
# Run completely personal training, with default settings
bash run_docker.sh bash run.personal.sh kaldi_model_daanzu_20200905_1ep-mediumlm-base dataset

# When training completes, export trained model
python3 export_trained_model.py {finetune,personal} [optional output directory]
# Finally, run the following in your kaldi-active-grammar python environment (this can take up to an hour and several GB of RAM)
python3 -m kaldi_active_grammar compile_agf_dictation_graph -v -m [model_dir]

# Test a new or old model
python3 test_model.py testdata.tsv [model_dir]
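As a rough sketch of what the TSV-to-SCP conversion involves (a hypothetical helper, not the actual convert_tsv_to_scp.py; the utterance-ID scheme and single-speaker handling are assumptions), each TSV row of the form wav_filename ignored ignored ignored text_transcript maps to entries in the standard Kaldi data files wav.scp, text, and utt2spk:

```python
import csv
import io

def tsv_to_scp(tsv_text, speaker="speaker1"):
    """Parse KaldiAG TSV rows (wav_filename, 3 ignored fields, transcript)
    and return the contents of the standard Kaldi data files as line lists.

    Hypothetical sketch only: the real convert_tsv_to_scp.py may differ in
    details (utterance IDs, speaker assignment, extra output files)."""
    wav_scp, text, utt2spk = [], [], []
    for i, row in enumerate(csv.reader(io.StringIO(tsv_text), delimiter="\t")):
        if len(row) < 5:
            continue  # skip malformed rows
        utt_id = f"{speaker}-{i:06d}"        # one unique ID per utterance
        wav_scp.append(f"{utt_id} {row[0]}")  # utterance ID -> wav path
        text.append(f"{utt_id} {row[4]}")     # utterance ID -> transcript
        utt2spk.append(f"{utt_id} {speaker}") # utterance ID -> speaker
    return {"wav.scp": wav_scp, "text": text, "utt2spk": utt2spk}
```

The real script writes these files into the output directory (e.g. dataset/); the sketch just shows how the five TSV columns map onto Kaldi's multi-file layout.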

Notes

  • To run either type of training, you must have a base model to use as a template. (For finetuning, the base model is also the starting point of the model; for personal training, it is only a source of basic info.) You can download a base model from this project's releases page: download the zip file and extract it into the root directory of this repo, so that the directory kaldi_model_daanzu_20200905_1ep-mediumlm-base is here.

  • Kaldi requires the training data metadata to be in the SCP format, which is an annoying multi-file format. To convert the standard KaldiAG TSV format to SCP, run python3 convert_tsv_to_scp.py yourdata.tsv dataset to output SCP-format files in a new directory dataset. These commands can be run within the Docker container, or directly in your own python environment.

    • Even better, run python3 convert_tsv_to_scp.py -l kaldi_model_daanzu_20200905_1ep-mediumlm-base/dict/lexicon.txt yourdata.tsv dataset to filter out utterances containing out-of-vocabulary (OOV) words; OOV words are not currently well supported by these training scripts.
  • The audio data should be 16-bit signed integer PCM, 1-channel (mono), 16 kHz WAV files. Note that the audio must be accessible within the Docker container, so it can't be behind a symlink that points outside this repo directory, which is what is shared with the container.

  • There are some directory names you should avoid using in this repo directory, because the scripts will create & use them during training. Avoid: conf, data, exp, extractor, mfcc, steps, tree_sp, utils.

  • Training may use a lot of storage. You may want to locate this directory somewhere with ample free space.

  • The training commands (run.*.sh) accept many optional parameters (more documentation to come), including:

    • --stage n : Skip to given stage.
    • --num-utts-subset 3000 : You may need this parameter to prevent an error at the beginning of nnet training if your training data contains many short (command-like) utterances. (3000 is a perhaps overly careful suggestion; 300 is the default value.)
  • I decided to treat the Docker image as evergreen, and to keep the things liable to change frequently, like the scripts, in the git repo instead.

  • The training dataset input .tsv file consists of tab-separated fields, one utterance per line, as follows: wav_filename ignored ignored ignored text_transcript
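The audio format requirement above (16-bit PCM, mono, 16 kHz WAV) can be checked with Python's standard wave module. This helper is not part of the repo, just a quick validation sketch:

```python
import wave

def check_wav(path):
    """Return True if the file is a 16-bit PCM, 1-channel, 16 kHz WAV,
    matching the audio format these training scripts expect."""
    with wave.open(path, "rb") as w:
        return (w.getsampwidth() == 2       # 2 bytes/sample = 16-bit
                and w.getnchannels() == 1   # mono
                and w.getframerate() == 16000)  # 16 kHz sample rate
```

Running this over your dataset before training can save a failed run; files that fail the check can be resampled with a tool like sox or ffmpeg.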

Related Repositories

  • daanzu/speech-training-recorder: Simple GUI application to help record audio dictated from given text prompts, for use with training speech recognition or speech synthesis.
  • daanzu/kaldi-active-grammar: Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time.

License

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE file for details. If this license is problematic for you, please contact me.
