sovaai / sova-asr

Licence: Apache-2.0 License
SOVA ASR (Automatic Speech Recognition)

Programming Languages

Python, JavaScript, CSS, HTML, Dockerfile

Projects that are alternatives of or similar to sova-asr

demo vietasr
Vietnamese Speech Recognition
Stars: ✭ 22 (-82.11%)
Mutual labels:  speech-recognition, automatic-speech-recognition, speech-to-text, stt, asr
leopard
On-device speech-to-text engine powered by deep learning
Stars: ✭ 354 (+187.8%)
Mutual labels:  speech-recognition, automatic-speech-recognition, speech-to-text, stt, asr
Openasr
A PyTorch-based end-to-end speech recognition system.
Stars: ✭ 69 (-43.9%)
Mutual labels:  speech, speech-recognition, speech-to-text, asr
Syn Speech
Syn.Speech is a flexible speaker independent continuous speech recognition engine for Mono and .NET framework
Stars: ✭ 57 (-53.66%)
Mutual labels:  speech, speech-recognition, speech-to-text, asr
opensource-voice-tools
A repo listing known open source voice tools, ordered by where they sit in the voice stack
Stars: ✭ 21 (-82.93%)
Mutual labels:  speech, speech-recognition, stt, asr
simple-obs-stt
Speech-to-text and keyboard input captions for OBS.
Stars: ✭ 89 (-27.64%)
Mutual labels:  speech, speech-recognition, speech-to-text, stt
kaldi-long-audio-alignment
Long audio alignment using Kaldi
Stars: ✭ 21 (-82.93%)
Mutual labels:  speech-recognition, automatic-speech-recognition, speech-to-text, asr
speech-recognition-evaluation
Evaluate results from ASR/Speech-to-Text quickly
Stars: ✭ 25 (-79.67%)
Mutual labels:  speech-recognition, speech-to-text, stt, asr
Lingvo
Lingvo
Stars: ✭ 2,361 (+1819.51%)
Mutual labels:  speech, speech-recognition, speech-to-text, asr
Edgedict
Working online speech recognition based on RNN Transducer. ( Trained model release available in release )
Stars: ✭ 205 (+66.67%)
Mutual labels:  speech, speech-recognition, speech-to-text, asr
Asr audio data links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 128 (+4.07%)
Mutual labels:  speech, speech-recognition, speech-to-text, asr
ASR-Audio-Data-Links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (+45.53%)
Mutual labels:  speech, speech-recognition, speech-to-text, asr
wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
Stars: ✭ 205 (+66.67%)
Mutual labels:  speech, speech-recognition, speech-to-text, asr
deepspeech.mxnet
A MXNet implementation of Baidu's DeepSpeech architecture
Stars: ✭ 82 (-33.33%)
Mutual labels:  speech, speech-recognition, speech-to-text, stt
PCPM
Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.
Stars: ✭ 21 (-82.93%)
Mutual labels:  speech-recognition, speech-to-text, asr
wave2vec-recognize-docker
Wave2vec 2.0 Recognize pipeline
Stars: ✭ 30 (-75.61%)
Mutual labels:  automatic-speech-recognition, asr, wav2letter
KeenASR-Android-PoC
A proof-of-concept app using KeenASR SDK on Android. WE ARE HIRING: https://keenresearch.com/careers.html
Stars: ✭ 21 (-82.93%)
Mutual labels:  speech, speech-recognition, speech-to-text
kaldi ag training
Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-grammar.
Stars: ✭ 14 (-88.62%)
Mutual labels:  speech, speech-recognition, speech-to-text
scripty
Speech to text bot for Discord using Mozilla's DeepSpeech
Stars: ✭ 14 (-88.62%)
Mutual labels:  speech-recognition, speech-to-text, stt
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+583.74%)
Mutual labels:  speech-recognition, speech-to-text, stt

SOVA ASR

SOVA ASR is a fast speech recognition solution based on the Wav2Letter architecture. It is designed as a REST API service, and both the code and the models can be customized for your needs.

Installation

The easiest way to deploy the service is via docker-compose, so you need to install Docker and docker-compose first. Here are brief instructions for Ubuntu:

Docker installation

  • Install Docker:
$ sudo apt-get update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo apt-key fingerprint 0EBFCD88
$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
$ sudo usermod -aG docker $(whoami)

To run docker commands without sudo, you may need to log out and log back in.

  • Install docker-compose:
$ sudo curl -L "https://github.com/docker/compose/releases/download/1.25.5/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
$ sudo chmod +x /usr/local/bin/docker-compose
  • (Optional) If you plan to use CUDA, run these commands:
$ curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
$ sudo apt-get update
$ sudo apt-get install nvidia-container-runtime

Add the following content to the file /etc/docker/daemon.json:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}

Restart the service:

$ sudo systemctl restart docker.service
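
If /etc/docker/daemon.json already holds other settings, the runtime entry above has to be merged into them rather than pasted over the file. The following is a minimal sketch of such a merge, not part of the project: the function name `merge_nvidia_runtime` and the sample `log-driver` setting are assumptions made for illustration.

```python
import json

# Runtime settings copied from the daemon.json snippet above.
NVIDIA_RUNTIME = {
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": [],
        }
    },
    "default-runtime": "nvidia",
}


def merge_nvidia_runtime(existing):
    """Return a copy of `existing` with the nvidia runtime added.

    Other top-level keys and other registered runtimes are preserved.
    """
    merged = dict(existing)
    runtimes = dict(merged.get("runtimes", {}))
    runtimes["nvidia"] = dict(NVIDIA_RUNTIME["runtimes"]["nvidia"])
    merged["runtimes"] = runtimes
    merged["default-runtime"] = NVIDIA_RUNTIME["default-runtime"]
    return merged


if __name__ == "__main__":
    # Hypothetical pre-existing configuration, for illustration only.
    current = {"log-driver": "json-file"}
    print(json.dumps(merge_nvidia_runtime(current), indent=4))
```

The printed JSON can then be written to /etc/docker/daemon.json (with sudo) before restarting the service.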

Build and deploy

To run the service with pretrained models, you need to download http://dataset.sova.ai/SOVA-ASR/data.tar.gz.

  • Clone the repository, download the pretrained models archive and extract the contents into the project folder:
$ git clone --recursive https://github.com/sovaai/sova-asr.git
$ cd sova-asr/
$ wget http://dataset.sova.ai/SOVA-ASR/data.tar.gz
$ tar -xvf data.tar.gz && rm data.tar.gz
  • Build docker image

    • If you're planning on using GPU (it is required for training and can be used for inference): build sova-asr image using the following command:
    $ sudo docker-compose build
    • If you're planning on using CPU only: modify Dockerfile, docker-compose.yml (remove the runtime and environment sections) and config.ini (cpu should be set to 0) and build sova-asr image:
    $ sudo docker-compose build
  • Run web service in a docker container

    $ sudo docker-compose up -d sova-asr

Testing

To test the service you can send a POST request:

$ curl --request POST 'http://localhost:8888/asr' --form 'audio_blob=@"data/test.wav"'
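
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above (endpoint `http://localhost:8888/asr`, form field `audio_blob`). The helper names `build_multipart` and `recognize` are made up for illustration, and because the response format is not described here, the function simply returns the raw response text.

```python
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def recognize(wav_path: str, url: str = "http://localhost:8888/asr") -> str:
    """POST a WAV file to the running service and return the raw response text."""
    with open(wav_path, "rb") as f:
        body, content_type = build_multipart("audio_blob", "test.wav", f.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the container running, `print(recognize("data/test.wav"))` should show whatever the service returns for the bundled test file.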

Finetuning acoustic model

If you want to finetune the acoustic model you can set hyperparameters and paths to your own train and validation manifest files and run the training service.

  • Set the training options in the Train section of config.ini. Each line of the train and validation CSV manifest files should contain an audio file path and its reference text, separated by a comma. For instance:
    data/audio/000000.wav,добрый день
    data/audio/000001.wav,как ваши дела
    ...
  • Run training in docker container:
    $ sudo docker-compose up -d sova-asr-train
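
A manifest in the format described above can be generated and checked with a short script. This is a sketch under assumptions: the helper names are hypothetical, and it relies on Python's `csv` module, which quotes any text that itself contains a comma. Whether the training code accepts quoted fields is not stated here, so keeping commas out of reference texts is the safer choice.

```python
import csv
import os
import tempfile


def write_manifest(path, rows):
    """Write (audio_path, reference_text) pairs, one comma-separated pair per line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)


def read_manifest(path):
    """Read a manifest back, rejecting lines that are not exactly two non-empty fields."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    for number, row in enumerate(rows, start=1):
        if len(row) != 2 or not row[0] or not row[1]:
            raise ValueError(f"line {number}: expected 'audio_path,text', got {row!r}")
    return [tuple(row) for row in rows]


if __name__ == "__main__":
    # Sample rows taken from the manifest example above; written to a temp directory.
    sample = [
        ("data/audio/000000.wav", "добрый день"),
        ("data/audio/000001.wav", "как ваши дела"),
    ]
    manifest_path = os.path.join(tempfile.mkdtemp(), "train.csv")
    write_manifest(manifest_path, sample)
    print(read_manifest(manifest_path))
```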

Customizations

To train your own acoustic model, refer to the PuzzleLib tutorials. To build your own language model, see the KenLM documentation. This repository was tested on Ubuntu 18.04 and ships pre-built .so Trie decoder files for Python 3.6, which runs inside the Docker container. If you modify that environment, you can build your own .so files from the Wav2Letter++ code for Python bindings. Otherwise, you can use the standard Greedy decoder (set in config.ini).
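
For orientation, greedy decoding generally means taking the best-scoring label at every time frame, with no language model involved. The sketch below illustrates that idea under the assumption of a CTC-style output with a blank symbol. It is not the project's decoder, and the function name, label string, and `blank_index` default are all hypothetical.

```python
def greedy_decode(frame_scores, labels, blank_index=0):
    """Pick the best label per frame, collapse consecutive repeats, drop blanks."""
    decoded = []
    previous = None
    for scores in frame_scores:
        # Index of the highest-scoring label for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != previous and best != blank_index:
            decoded.append(labels[best])
        previous = best
    return "".join(decoded)


if __name__ == "__main__":
    # Toy alphabet: index 0 is the assumed blank symbol.
    toy_labels = "_ab"
    toy_scores = [
        [0.1, 0.8, 0.1],
        [0.1, 0.7, 0.2],
        [0.9, 0.05, 0.05],
        [0.2, 0.6, 0.2],
        [0.1, 0.2, 0.7],
        [0.1, 0.1, 0.8],
    ]
    print(greedy_decode(toy_scores, toy_labels))
```

A Trie decoder backed by a KenLM language model can instead weigh whole word sequences, which is the usual reason to prefer it when the pre-built .so files are usable.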
