daanzu / Kaldi Active Grammar

Licence: agpl-3.0
Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time

Python package developed to enable context-based command & control of computer applications, as in the Dragonfly speech recognition framework, using the Kaldi automatic speech recognition engine.

Normally, Kaldi decoding graphs are monolithic, require expensive up-front offline compilation, and are static during decoding. Kaldi's new grammar framework allows multiple independent grammars with nonterminals to be compiled separately and stitched together dynamically at decode-time. However, all of those grammars are always active and capable of being recognized.

This project extends that to allow each grammar/rule to be independently marked as active/inactive dynamically on a per-utterance basis (set at the beginning of each utterance). Dragonfly is then capable of activating only the appropriate grammars for the current environment, resulting in increased accuracy due to fewer possible recognitions. Furthermore, the dictation grammar can be shared between all the command grammars, which can be compiled quickly without needing to include large-vocabulary dictation directly.

Features

  • Binaries: The Python package includes all necessary binaries for decoding on Windows/Linux/MacOS. Available on PyPI.
    • Binaries are generated from my fork of Kaldi, which is only intended to be used by kaldi-active-grammar directly, and not as a stand-alone library.
  • Pre-trained model: A compatible general English Kaldi nnet3 chain model is trained on ~3000 hours of open audio. Available under project releases.
  • Plain dictation: Do you just want to recognize plain dictation? Seems kind of boring, but okay! There is an interface for plain dictation (see below), using either your specified HCLG.fst file, or KaldiAG's included pre-trained dictation model.
  • Dragonfly/Caster: A compatible backend for Dragonfly is under development in the kaldi branch of my fork, and has been merged as of Dragonfly v0.15.0.
    • See its documentation, try out a demo, or use the loader to run all normal dragonfly scripts.
    • You can try it out easily on Windows using a simple no-install package: see Getting Started below.
    • Caster is supported as of KaldiAG v0.6.0 and Dragonfly v0.16.1.
  • Bootstrapped since v0.2: development of KaldiAG is done entirely using KaldiAG.

Demo Video

Donations are appreciated to encourage development.

[GitHub is currently matching all my donations $-for-$.]

Related Repositories

Getting Started

Want to get started quickly & easily on Windows? Available under project releases:

  • kaldi-dragonfly-winpython: A self-contained, portable, batteries-included (python & libraries & model) distribution of kaldi-active-grammar + dragonfly2. Just unzip and run!
  • kaldi-dragonfly-winpython-dev: [more recent development version] A self-contained, portable, batteries-included (python & libraries & model) distribution of kaldi-active-grammar + dragonfly2. Just unzip and run!
  • kaldi-caster-winpython-dev: [more recent development version] A self-contained, portable, batteries-included (python & libraries & model) distribution of kaldi-active-grammar + dragonfly2 + caster. Just unzip and run!

Otherwise...

Setup

Requirements:

  • Python 2.7 or 3.6+; 64-bit required!
  • OS: Windows/Linux/MacOS all supported
  • Only supports Kaldi left-biphone models, specifically nnet3 chain models, with specific modifications
  • ~1GB+ disk space for model plus temporary storage and cache, depending on your grammar complexity
  • ~1GB+ RAM for model and grammars, depending on your model and grammar complexity
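
The interpreter requirements above can be checked before installing. This is a minimal sketch using only the standard library; the function name check_python_requirements is made up for illustration and is not part of kaldi-active-grammar.

```python
import struct
import sys


def check_python_requirements(version_info=sys.version_info, pointer_bits=None):
    """Return a list of human-readable problems; an empty list means OK."""
    if pointer_bits is None:
        # Size of a C pointer in this interpreter: 8 bytes on a 64-bit build.
        pointer_bits = struct.calcsize("P") * 8
    problems = []
    major, minor = version_info[0], version_info[1]
    # The requirements list Python 2.7 or 3.6+ as supported.
    if not ((major == 2 and minor == 7) or (major == 3 and minor >= 6)):
        problems.append("unsupported Python version %d.%d" % (major, minor))
    if pointer_bits != 64:
        problems.append("%d-bit Python found; 64-bit is required" % pointer_bits)
    return problems


# Check the interpreter that is running this script.
print(check_python_requirements() or "OK")
```

The disk space and RAM figures depend on the model and grammars, so they are not checked here.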

Installation:

  1. Download a compatible generic English Kaldi nnet3 chain model from project releases. Unzip the model and pass the directory path to the kaldi-active-grammar constructor.
    • Or use your own model. Standard Kaldi models must be converted to be usable. Conversion can be performed automatically, but this hasn't been fully implemented yet.
  2. Install Python package, which includes necessary Kaldi binaries:
    • The easy way to use kaldi-active-grammar is as a backend to dragonfly, which makes it easy to define grammars and resultant actions.
    • Alternatively, if you only want to use it directly (via a more low level interface), you can just run pip install kaldi-active-grammar

Troubleshooting

  • Errors installing
    • Make sure you're using a 64-bit Python.
    • You should install via pip install kaldi-active-grammar (directly or indirectly), not python setup.py install, in order to get the required binaries.
    • Update your pip (to at least 19.0+) by executing python -m pip install --upgrade pip, to support the required python binary wheel package.
  • Errors running
    • Windows: The code execution cannot proceed because VCRUNTIME140.dll was not found. (or similar)
      • You must install the VC2017+ redistributable from Microsoft: download page, direct link. (This is usually already installed globally by other programs.)
    • Try deleting the Kaldi model .tmp directory, and re-running.
    • Try deleting the Kaldi model directory itself, re-downloading and/or re-extracting it, and re-running. (Note: You may want to make a copy of your user_lexicon.txt file before deleting, to put in the new model directory.)
  • For reporting issues, try running with import logging; logging.basicConfig(level=1) at the top of your main/loader file to enable full debugging logging.
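
The "delete the .tmp directory" step, combined with the advice to keep a copy of user_lexicon.txt, could be scripted roughly as follows. This is a sketch under stated assumptions: the function reset_model_tmp is hypothetical, and the three directory paths are parameters because their actual locations depend on your setup.

```python
import shutil
from pathlib import Path


def reset_model_tmp(model_dir, tmp_dir, backup_dir):
    """Back up user_lexicon.txt (if present), then delete the tmp directory.

    Returns the path of the backup copy, or None if there was no lexicon file.
    The model directory itself is left untouched.
    """
    model_dir, tmp_dir, backup_dir = Path(model_dir), Path(tmp_dir), Path(backup_dir)
    backup = None
    lexicon = model_dir / "user_lexicon.txt"
    if lexicon.is_file():
        backup_dir.mkdir(parents=True, exist_ok=True)
        # copy2 returns the path of the new file inside backup_dir.
        backup = Path(shutil.copy2(str(lexicon), str(backup_dir)))
    if tmp_dir.is_dir():
        shutil.rmtree(str(tmp_dir))
    return backup


# Example call with placeholder paths (adjust to your own layout):
#   reset_model_tmp("path/to/model", "path/to/model.tmp", "path/to/backup")
```

If you go on to delete and re-extract the model directory itself, copy the backed-up lexicon into the new model directory afterwards.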

Documentation

Formal documentation is somewhat lacking currently. To see example usage, examine:

  • Plain dictation interface: Set up recognizer for plain dictation; perform decoding on given wav file.
  • Full example: Set up grammar compiler & decoder; set up a rule; perform decoding on live, real-time audio from microphone.
  • Backend for Dragonfly: Many advanced features and complex interactions.
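
As a rough illustration of the plain dictation flow (read a wav file, decode it as one utterance), here is a minimal sketch. It takes the recognizer as a parameter, so nothing here depends on the package being installed. Two things are assumptions to verify against the project's own plain dictation example: that the recognizer exposes a decode_utterance(data) method returning the text plus extra info, and that the model expects 16 kHz, mono, 16-bit audio.

```python
import wave


def read_wav_frames(path):
    """Return the raw PCM bytes of a WAV file, checking the assumed format."""
    wav = wave.open(path, "rb")
    try:
        # Assumed input format: 16 kHz sample rate, 1 channel, 2 bytes per sample.
        fmt = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
        if fmt != (16000, 1, 2):
            raise ValueError("expected 16 kHz mono 16-bit audio, got %r" % (fmt,))
        return wav.readframes(wav.getnframes())
    finally:
        wav.close()


def decode_wav(recognizer, path):
    """Decode one whole file as a single utterance."""
    # decode_utterance is assumed to exist on the recognizer object.
    return recognizer.decode_utterance(read_wav_frames(path))


# Assumed usage, to be checked against the project's example:
#   from kaldi_active_grammar import PlainDictationRecognizer
#   text, info = decode_wav(PlainDictationRecognizer(), "test.wav")
```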

The KaldiAG API is fairly low level, but basically: you define a set of grammar rules, then send in audio data, along with a bit mask of which rules are active at the beginning of each utterance, and receive back the recognized rule and text. The easy way is to go through Dragonfly, which makes it easy to define the rules, contexts, and actions.
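
The per-utterance mask idea can be illustrated without any speech code at all. The sketch below is purely conceptual and does not use the KaldiAG API: the rule names, the context dictionary, and the active_rule_mask helper are all hypothetical.

```python
def active_rule_mask(rules, context):
    """Return one boolean per rule, in rule order, for the given context."""
    return tuple(bool(is_active(context)) for _name, is_active in rules)


# Hypothetical rules: (name, predicate deciding whether the rule applies).
RULES = [
    ("global_commands", lambda ctx: True),
    ("editor_commands", lambda ctx: ctx.get("app") == "editor"),
    ("browser_commands", lambda ctx: ctx.get("app") == "browser"),
]

# Computed once at the start of each utterance, from the current environment.
mask = active_rule_mask(RULES, {"app": "editor"})
print(mask)  # (True, True, False)
```

Only the rules whose mask entry is true can be recognized for that utterance, which is where the accuracy gain from fewer possible recognitions comes from. In practice, Dragonfly's contexts play the role of the predicates here.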

Building

  • Recommendation: use the binary wheels distributed for all major platforms.
    • Significant work has gone into allowing you to avoid the many repo/dependency downloads, GBs of disk space, and vCPU-hours needed for building from scratch.
    • They are built in public by automated Continuous Integration run on GitHub Actions: see manifest.
  • Alternatively, to build for use locally:
    • Linux/MacOS:
      1. Install Intel Math Kernel Library
      2. python -m pip install -r requirements-build.txt
      3. python setup.py bdist_wheel (see CMakeLists.txt for details)
    • Windows:
      • Harder to automate in general
      • You can follow the steps for Continuous Integration run on GitHub Actions: see the build-windows section of the manifest.
  • Note: the project (and python wheel) is built from a duorepo (2 separate repos used together):
    1. This repo, containing the external interface and higher-level logic, written in Python.
    2. My fork of Kaldi, containing the lower-level code, written in C++.

Contributing

Issues, suggestions, and feature requests are welcome & encouraged. Pull requests are considered, but project structure is in flux.

Author

License

This project is licensed under the GNU Affero General Public License v3 (AGPL-3.0-or-later). See the LICENSE.txt file for details. If this license is problematic for you, please contact me.

Acknowledgments
