
srinivr / kaldi-long-audio-alignment

Licence: Apache-2.0 license
Long audio alignment using Kaldi

Programming Languages: shell, python

Projects that are alternatives to, or similar to, kaldi-long-audio-alignment

leopard
On-device speech-to-text engine powered by deep learning
Stars: ✭ 354 (+1585.71%)
Mutual labels:  speech-recognition, automatic-speech-recognition, speech-to-text, transcription, asr
Speech To Text Russian
A project for speech recognition in Russian, based on pykaldi.
Stars: ✭ 151 (+619.05%)
Mutual labels:  speech-recognition, speech-to-text, kaldi, asr
kaldi helpers
🙊 A set of scripts to use in preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library.
Stars: ✭ 13 (-38.1%)
Mutual labels:  automatic-speech-recognition, speech-to-text, kaldi, transcription
demo vietasr
Vietnamese Speech Recognition
Stars: ✭ 22 (+4.76%)
Mutual labels:  speech-recognition, automatic-speech-recognition, speech-to-text, asr
sova-asr
SOVA ASR (Automatic Speech Recognition)
Stars: ✭ 123 (+485.71%)
Mutual labels:  speech-recognition, automatic-speech-recognition, speech-to-text, asr
Eesen
The official repository of the Eesen project
Stars: ✭ 738 (+3414.29%)
Mutual labels:  speech-recognition, speech-to-text, kaldi, asr
Vosk Api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Stars: ✭ 1,357 (+6361.9%)
Mutual labels:  speech-recognition, speech-to-text, kaldi, asr
Kaldi Active Grammar
Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
Stars: ✭ 196 (+833.33%)
Mutual labels:  speech-recognition, speech-to-text, kaldi
Edgedict
Working online speech recognition based on RNN Transducer (trained models are available in the releases).
Stars: ✭ 205 (+876.19%)
Mutual labels:  speech-recognition, speech-to-text, asr
megs
A merged version of multiple open-source German speech datasets.
Stars: ✭ 21 (+0%)
Mutual labels:  speech-recognition, speech-to-text, asr
wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
Stars: ✭ 205 (+876.19%)
Mutual labels:  speech-recognition, speech-to-text, asr
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Stars: ✭ 2,384 (+11252.38%)
Mutual labels:  speech-recognition, automatic-speech-recognition, asr
PCPM
Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.
Stars: ✭ 21 (+0%)
Mutual labels:  speech-recognition, speech-to-text, asr
Lingvo
Lingvo
Stars: ✭ 2,361 (+11142.86%)
Mutual labels:  speech-recognition, speech-to-text, asr
Pytorch Kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Stars: ✭ 2,097 (+9885.71%)
Mutual labels:  speech-recognition, kaldi, asr
Zeroth
Kaldi-based Korean ASR (한국어 음성인식) open-source project
Stars: ✭ 248 (+1080.95%)
Mutual labels:  speech-recognition, kaldi, asr
Py Kaldi Asr
Some simple wrappers around kaldi-asr intended to make using kaldi's (online) decoders as convenient as possible.
Stars: ✭ 156 (+642.86%)
Mutual labels:  speech-recognition, kaldi, asr
rustfst
Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
Stars: ✭ 104 (+395.24%)
Mutual labels:  speech-recognition, kaldi, asr
ASR-Audio-Data-Links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (+752.38%)
Mutual labels:  speech-recognition, speech-to-text, asr
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (+152.38%)
Mutual labels:  speech-recognition, speech-to-text, asr

kaldi-long-audio-alignment

Long audio alignment using Kaldi. This tool splits a long audio file and its transcript into multiple segments such that the transcript of each smaller segment corresponds to that audio segment. This is useful for ASR training, since training on the short segments takes much less total time than using the entire audio at once.
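
The README does not describe how segment boundaries are chosen. As an illustration only, the sketch below assumes word-level start and end times are already available for the whole recording (for example from a forced alignment) and greedily packs consecutive words into segments no longer than a hypothetical `max_dur`, emitting lines in Kaldi's `segments` (`<utterance-id> <recording-id> <start> <end>`) and `text` formats. `AlignedWord`, `group_into_segments`, `max_dur` and the utterance-id scheme are made up for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AlignedWord:
    word: str
    start: float  # seconds from the start of the recording
    end: float


def group_into_segments(words: List[AlignedWord], recording_id: str,
                        max_dur: float = 15.0) -> Tuple[List[str], List[str]]:
    """Greedily pack consecutive aligned words into segments of at most max_dur seconds.

    Returns (segments_lines, text_lines) in Kaldi 'segments' and 'text' format.
    A single word longer than max_dur still becomes its own segment.
    """
    segments_lines: List[str] = []
    text_lines: List[str] = []
    current: List[AlignedWord] = []

    def flush() -> None:
        if not current:
            return
        # Hypothetical utterance-id scheme: <recording-id>-<4-digit index>.
        utt_id = f"{recording_id}-{len(segments_lines):04d}"
        segments_lines.append(
            f"{utt_id} {recording_id} {current[0].start:.2f} {current[-1].end:.2f}")
        text_lines.append(f"{utt_id} " + " ".join(w.word for w in current))
        current.clear()

    for w in words:
        # Close the current segment if adding this word would exceed max_dur.
        if current and w.end - current[0].start > max_dur:
            flush()
        current.append(w)
    flush()
    return segments_lines, text_lines
```

Greedy packing by duration is only one possible policy; a real aligner would more likely cut at pauses or at confidently aligned words.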

The algorithm is similar to the one in SAILALIGN toolkit (https://github.com/nassosoassos/sail_align).

Refer to "A RECURSIVE ALGORITHM FOR THE FORCED ALIGNMENT OF VERY LONG AUDIO SEGMENTS" and "A SYSTEM FOR AUTOMATIC ALIGNMENT OF BROADCAST MEDIA CAPTIONS USING WEIGHTED FINITE-STATE TRANSDUCERS" to get started.
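
Roughly, in the recursive algorithm of [1] the audio is decoded with a language model built from the transcript, the recognizer output is aligned with the reference text, and runs of several consecutive matching words become anchors; the unaligned stretches between anchors are then processed again, recursively, with a model built from just the corresponding part of the transcript. The sketch below covers only the text-matching step, using Python's `difflib` as a stand-in for whatever alignment this tool actually performs; `find_anchors`, `unaligned_gaps` and the `min_len=3` threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Anchor = Tuple[int, int, int]  # (ref_start, hyp_start, length) in word indices
Gap = Tuple[Tuple[int, int], Tuple[int, int]]  # ((ref_from, ref_to), (hyp_from, hyp_to))


def find_anchors(ref: List[str], hyp: List[str], min_len: int = 3) -> List[Anchor]:
    """Return, in order, runs of at least min_len consecutive words shared by ref and hyp."""
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    # get_matching_blocks() is monotonic and ends with a zero-size sentinel,
    # which the size filter drops.
    return [(m.a, m.b, m.size) for m in matcher.get_matching_blocks() if m.size >= min_len]


def unaligned_gaps(ref_len: int, hyp_len: int, anchors: List[Anchor]) -> List[Gap]:
    """Word-index spans left between anchors; these are the parts to re-process recursively."""
    gaps: List[Gap] = []
    prev_ref, prev_hyp = 0, 0
    for ref_start, hyp_start, size in anchors:
        if ref_start > prev_ref or hyp_start > prev_hyp:
            gaps.append(((prev_ref, ref_start), (prev_hyp, hyp_start)))
        prev_ref, prev_hyp = ref_start + size, hyp_start + size
    if prev_ref < ref_len or prev_hyp < hyp_len:
        gaps.append(((prev_ref, ref_len), (prev_hyp, hyp_len)))
    return gaps
```

For example, with reference "the cat sat on the mat today" and hypothesis "a cat sat on the mat to day", the five words "cat sat on the mat" form one anchor and the two mismatched ends remain as gaps. A complete implementation would also map anchor word indices back to time stamps so the audio can be cut.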

NOTE: Adaptation after each pass has not been implemented yet.

License: Apache License 2.0

Copyright: Speech Lab (of Prof. S Umesh), EE department, IIT Madras

Overview of the tool

Performs long audio alignment and, optionally, appends the segmented data to the training set.

Among other inputs, the tool takes a data directory containing exactly one audio file, i.e. wav.scp, utt2spk, spk2utt and text each have only one entry (with the key "key_1").
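
The README does not show the contents of these files. The sketch below assumes the standard Kaldi data-directory line formats and writes, then checks, a directory with a single entry keyed "key_1". The speaker id (also "key_1" by default) is an assumption, since only the utterance key is specified here; `write_single_entry_data_dir` and `check_single_entry` are hypothetical helpers.

```python
from pathlib import Path
from typing import List

REQUIRED_FILES = ("wav.scp", "utt2spk", "spk2utt", "text")


def write_single_entry_data_dir(data_dir: str, wav_path: str, transcript: str,
                                key: str = "key_1", speaker: str = "key_1") -> Path:
    """Create a Kaldi-style data directory holding exactly one recording/utterance."""
    d = Path(data_dir)
    d.mkdir(parents=True, exist_ok=True)
    # Collapse a possibly multi-line transcript onto a single line.
    words = " ".join(transcript.split())
    (d / "wav.scp").write_text(f"{key} {wav_path}\n")  # <recording-id> <wav path>
    (d / "utt2spk").write_text(f"{key} {speaker}\n")   # <utterance-id> <speaker-id>
    (d / "spk2utt").write_text(f"{speaker} {key}\n")   # <speaker-id> <utterance-id ...>
    (d / "text").write_text(f"{key} {words}\n")        # <utterance-id> <transcript>
    return d


def check_single_entry(data_dir: str, key: str = "key_1") -> List[str]:
    """Return a list of problems; an empty list means the one-entry requirement holds."""
    problems: List[str] = []
    for name in REQUIRED_FILES:
        path = Path(data_dir) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        entries = [ln for ln in path.read_text().splitlines() if ln.strip()]
        if len(entries) != 1:
            problems.append(f"{name}: expected 1 entry, found {len(entries)}")
        elif name != "spk2utt" and entries[0].split()[0] != key:
            # spk2utt is keyed by speaker; the other three by utterance/recording id.
            problems.append(f"{name}: key is not {key}")
    return problems
```

Running the check before starting the alignment catches the most common mistake, a data directory that still lists several recordings.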

There are two top level scripts, longaudio_multi_dir.sh and longaudio_alignment.sh.

longaudio_multi_dir.sh can be used if there are several audio files (and hence several directories) and/or if you want to append the segmented long audio to the training data. However, I do not explain this script here, since I expect this use case to be rare.

Running longaudio_alignment.sh

Step 1: Set up path.sh, cmd.sh, etc., as you would for any Kaldi experiment.

Step 2: Create a file named test_dir_location in the data directory containing the path to your test directory, e.g.: echo "test_may2015" > data/test_dir_location

Step 3: Edit longaudio_vars.sh to set the paths to your directories.

Step 4: longaudio_alignment.sh takes 3 arguments.

--working-dir - the directory where temporary files are placed

--stage - takes one of two values: --stage 1 performs only iteration 0 (iter0), and --stage 2 performs the additional n-1 iterations (n is specified in longaudio_vars.sh).

--create-dir - takes true or false. If true, creates a new data folder containing the segments file.

example: ./longaudio_alignment.sh --stage 1 --working-dir data/working_dir_may2015/ --create-dir true
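
When scripting several runs, the command in the example above can be assembled programmatically. This is a minimal sketch that only uses the three options documented here; `build_alignment_command` is a hypothetical helper, and it does not verify that path.sh, cmd.sh or longaudio_vars.sh are set up.

```python
import shlex
from typing import List


def build_alignment_command(working_dir: str, stage: int = 1, create_dir: bool = True,
                            script: str = "./longaudio_alignment.sh") -> List[str]:
    """Assemble argv for longaudio_alignment.sh from its three documented options."""
    if stage not in (1, 2):
        raise ValueError("--stage must be 1 or 2")
    return [
        script,
        "--stage", str(stage),
        "--working-dir", working_dir,
        # The script takes the literal strings true/false.
        "--create-dir", "true" if create_dir else "false",
    ]


if __name__ == "__main__":
    cmd = build_alignment_command("data/working_dir_may2015/", stage=1, create_dir=True)
    print(shlex.join(cmd))  # shlex.join requires Python 3.8+
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the experiment directory; passing a list rather than a single string avoids shell-quoting problems with unusual directory names.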

Note: Iterations 0 to n-3 use a trigram LM, and iterations n-2 and n-1 are the two passes described in [2], with one difference: the LM is built only on the exact text corresponding to the segment rather than on a longer context, so large deletions are still a problem.
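
Read together with the --stage description, this note implies a fixed schedule of language models per iteration. The sketch below tabulates that reading for a given n; it assumes that --stage 2 covers iterations 1 to n-1 and that n is at least 3, neither of which is spelled out above, and `iteration_schedule` is a hypothetical name.

```python
from typing import List, Tuple


def iteration_schedule(n: int, stage: int) -> List[Tuple[int, str]]:
    """Return (iteration, decoding LM) pairs performed by a given --stage value."""
    if n < 3:
        # Assumption: the split "0..n-3 trigram, then two passes" needs n >= 3.
        raise ValueError("n must be at least 3")
    if stage == 1:
        iterations = [0]                # --stage 1: iter0 only
    elif stage == 2:
        iterations = list(range(1, n))  # --stage 2: the additional n-1 iterations (assumed 1..n-1)
    else:
        raise ValueError("stage must be 1 or 2")
    schedule: List[Tuple[int, str]] = []
    for i in iterations:
        if i <= n - 3:
            lm = "trigram"
        elif i == n - 2:
            lm = "[2] pass 1"           # LM built on the segment's own text
        else:
            lm = "[2] pass 2"           # LM built on the segment's own text
        schedule.append((i, lm))
    return schedule
```

For instance, with n = 4, --stage 1 runs iteration 0 with the trigram LM, and --stage 2 runs iteration 1 with the trigram LM followed by the two passes of [2] in iterations 2 and 3.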

[1]: "A RECURSIVE ALGORITHM FOR THE FORCED ALIGNMENT OF VERY LONG AUDIO SEGMENTS"

[2]: "A SYSTEM FOR AUTOMATIC ALIGNMENT OF BROADCAST MEDIA CAPTIONS USING WEIGHTED FINITE-STATE TRANSDUCERS"
