pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

Stars: ✭ 2,097 (+9885.71%)

Mutual labels: speech-recognition, kaldi, asr

Zeroth

Kaldi-based Korean ASR (한국어 음성인식) open-source project

Stars: ✭ 248 (+1080.95%)

Mutual labels: speech-recognition, kaldi, asr

Py Kaldi Asr

Some simple wrappers around kaldi-asr intended to make using kaldi's (online) decoders as convenient as possible.

Stars: ✭ 156 (+642.86%)

Mutual labels: speech-recognition, kaldi, asr

rustfst

Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.

Stars: ✭ 104 (+395.24%)

Mutual labels: speech-recognition, kaldi, asr

ASR-Audio-Data-Links

A list of publicly available audio data that anyone can download for ASR or other speech activities

Stars: ✭ 179 (+752.38%)

Mutual labels: speech-recognition, speech-to-text, asr

react-native-spokestack

Spokestack: give your React Native app a voice interface!

Stars: ✭ 53 (+152.38%)

Mutual labels: speech-recognition, speech-to-text, asr


kaldi-long-audio-alignment

Long audio alignment using Kaldi. This tool splits a long audio file and its transcript into multiple segments, so that each segment's transcript corresponds to its audio segment. This is useful in ASR training, since processing short segments takes much less total time than processing the entire audio at once.

The algorithm is similar to the one in the SAILALIGN toolkit (https://github.com/nassosoassos/sail_align).

Refer to "A RECURSIVE ALGORITHM FOR THE FORCED ALIGNMENT OF VERY LONG AUDIO SEGMENTS" and "A SYSTEM FOR AUTOMATIC ALIGNMENT OF BROADCAST MEDIA CAPTIONS USING WEIGHTED FINITE-STATE TRANSDUCERS" to get started.

NOTE: Adaptation after each pass has not been implemented yet.

License: Apache License 2.0

Copyright: Speech Lab (of Prof. S Umesh), EE department, IIT Madras

Overview of the tool

Performs long audio alignment and optionally appends the segmented data to the training set.

The input includes, among other things, a data directory containing exactly one audio file; that is, wav.scp, utt2spk, spk2utt, and text each have a single entry, with "key_1" as the key.
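The single-entry layout described above can be sketched as follows. This is only an illustration under stated assumptions: the audio path, the transcript, and the use of "key_1" as the speaker ID are placeholders that are not taken from the tool itself, and the directory is created in a temporary location so the snippet can be run anywhere.

```shell
# Sketch: build a Kaldi-style data directory with exactly one entry per file.
# Assumptions (not from the original README): the wav path, the transcript,
# and using "key_1" as the speaker ID as well as the utterance key.
data_dir="$(mktemp -d)/test_may2015"
mkdir -p "$data_dir"

wav_path="/path/to/long_audio.wav"   # placeholder path to the long recording

# wav.scp: <key> <path-to-audio>
echo "key_1 $wav_path" > "$data_dir/wav.scp"
# utt2spk: <utterance-key> <speaker-id>
echo "key_1 key_1" > "$data_dir/utt2spk"
# spk2utt: <speaker-id> <utterance-key>
echo "key_1 key_1" > "$data_dir/spk2utt"
# text: <key> <full transcript of the long audio>
echo "key_1 FULL TRANSCRIPT OF THE LONG AUDIO GOES HERE" > "$data_dir/text"

# Show what was written; each file should contain a single line.
for f in wav.scp utt2spk spk2utt text; do
  echo "$f: $(cat "$data_dir/$f")"
done
```

In a real setup the directory would live under data/ in your Kaldi experiment, and the text file would hold the complete transcript on one line.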

There are two top-level scripts, longaudio_multi_dir.sh and longaudio_alignment.sh.

longaudio_multi_dir.sh can be used if there are several audio files (and hence several directories) and/or if you want to append the segmented long audio to the training data. I am not going to explain this script here, since I expect this use case to be rare.

Running longaudio_alignment.sh

Step 1: Set up path.sh, cmd.sh, etc., as you would for any Kaldi experiment.

Step 2: Create a file named test_dir_location in the data directory and write the path to your test directory into it, e.g.: echo "test_may2015" > data/test_dir_location

Step 3: Change longaudio_vars.sh to set the path of your directories.

Step 4: longaudio_alignment.sh takes three arguments:

--working-dir - the directory where temporary files are placed

--stage - takes one of two values. --stage 1 runs only iter0; --stage 2 runs the additional n-1 iterations (n is specified in longaudio_vars.sh).

--create-dir - takes true or false. If true, creates a new data folder containing the segments file.

example: ./longaudio_alignment.sh --stage 1 --working-dir data/working_dir_may2015/ --create-dir true

Note: Iterations 0 to n-3 use a trigram LM. Iterations n-2 and n-1 are the two passes described in [2], with one difference: the LM is built only on the exact text corresponding to the segment rather than on a longer context, so larger deletions are still a problem.
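To make the iteration schedule in the note concrete, here is a small sketch that prints which kind of LM each iteration would use for a given n. The function describe_iterations is a hypothetical helper written for illustration; it is not part of the tool, and it assumes n is at least 3.

```shell
# Hypothetical helper (not part of the tool): print the LM type per iteration.
# Iterations 0..n-3 use a trigram LM; iterations n-2 and n-1 are the two
# passes from [2], with the LM built only on the segment's own text.
describe_iterations() {
  n=$1
  i=0
  while [ "$i" -lt "$n" ]; do
    if [ "$i" -le $(( n - 3 )) ]; then
      echo "iter${i}: trigram LM"
    else
      # Pass number within the final two iterations (1 or 2).
      pass=$(( i - (n - 3) ))
      echo "iter${i}: segment-text LM (pass ${pass} of 2, as in [2])"
    fi
    i=$(( i + 1 ))
  done
}

# Example: n = 5 gives three trigram iterations followed by the two passes.
describe_iterations 5
```

With --stage 1 only iter0 from this schedule is run; --stage 2 covers the remaining n-1 iterations.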

[1]: "A RECURSIVE ALGORITHM FOR THE FORCED ALIGNMENT OF VERY LONG AUDIO SEGMENTS"

[2]: "A SYSTEM FOR AUTOMATIC ALIGNMENT OF BROADCAST MEDIA CAPTIONS USING WEIGHTED FINITE-STATE TRANSDUCERS"



srinivr / kaldi-long-audio-alignment
