snakers4 / Silero Models

Licence: agpl-3.0
Silero Models: pre-trained STT models and benchmarks made embarrassingly simple

Projects that are alternatives of or similar to Silero Models

PCPM
Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.
Stars: ✭ 21 (-95.98%)
Mutual labels:  speech-recognition, speech-to-text, pretrained-models, asr
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (-89.85%)
Mutual labels:  speech-recognition, speech-to-text, asr
Tensorflow end2end speech recognition
End-to-end speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)
Stars: ✭ 305 (-41.57%)
Mutual labels:  speech-recognition, speech-to-text, asr
Cheetah
On-device streaming speech-to-text engine powered by deep learning
Stars: ✭ 383 (-26.63%)
Mutual labels:  speech-recognition, speech-to-text, asr
leopard
On-device speech-to-text engine powered by deep learning
Stars: ✭ 354 (-32.18%)
Mutual labels:  speech-recognition, speech-to-text, asr
wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
Stars: ✭ 205 (-60.73%)
Mutual labels:  speech-recognition, speech-to-text, asr
Nmtpytorch
Sequence-to-Sequence Framework in PyTorch
Stars: ✭ 392 (-24.9%)
Mutual labels:  jupyter-notebook, speech-recognition, asr
Hey Jetson
Deep Learning based Automatic Speech Recognition with attention for the Nvidia Jetson.
Stars: ✭ 161 (-69.16%)
Mutual labels:  jupyter-notebook, speech-recognition, speech-to-text
kaldi-long-audio-alignment
Long audio alignment using Kaldi
Stars: ✭ 21 (-95.98%)
Mutual labels:  speech-recognition, speech-to-text, asr
spokestack-ios
Spokestack: give your iOS app a voice interface!
Stars: ✭ 27 (-94.83%)
Mutual labels:  speech-recognition, speech-to-text, asr
vosk-asterisk
Speech Recognition in Asterisk with Vosk Server
Stars: ✭ 52 (-90.04%)
Mutual labels:  speech-recognition, speech-to-text, asr
megs
A merged version of multiple open-source German speech datasets.
Stars: ✭ 21 (-95.98%)
Mutual labels:  speech-recognition, speech-to-text, asr
Speech recognition with tensorflow
Implementation of a seq2seq model for Speech Recognition using the latest version of TensorFlow. Architecture similar to Listen, Attend and Spell.
Stars: ✭ 253 (-51.53%)
Mutual labels:  jupyter-notebook, speech-recognition, speech-to-text
ASR-Audio-Data-Links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (-65.71%)
Mutual labels:  speech-recognition, speech-to-text, asr
Nemo
NeMo: a toolkit for conversational AI
Stars: ✭ 3,685 (+605.94%)
Mutual labels:  jupyter-notebook, speech-recognition, speech-to-text
speech-recognition-evaluation
Evaluate results from ASR/Speech-to-Text quickly
Stars: ✭ 25 (-95.21%)
Mutual labels:  speech-recognition, speech-to-text, asr
Lingvo
Lingvo
Stars: ✭ 2,361 (+352.3%)
Mutual labels:  speech-recognition, speech-to-text, asr
Edgedict
Working online speech recognition based on RNN Transducer. (Trained model available in the releases.)
Stars: ✭ 205 (-60.73%)
Mutual labels:  speech-recognition, speech-to-text, asr
sova-asr
SOVA ASR (Automatic Speech Recognition)
Stars: ✭ 123 (-76.44%)
Mutual labels:  speech-recognition, speech-to-text, asr
speech-recognition
SDKs and docs for Skit's speech to text service
Stars: ✭ 20 (-96.17%)
Mutual labels:  speech-recognition, speech-to-text, asr

License: CC BY-NC 4.0

Silero Models

Silero Models: pre-trained enterprise-grade STT models and benchmarks. Enterprise-grade STT made refreshingly simple (seriously, see benchmarks). We provide quality comparable to Google's STT (and sometimes even better) and we are not Google.

As a bonus:

  • No Kaldi;
  • No compilation;
  • No 20-step instructions.

Getting Started

All of the provided models are listed in the models.yml file. Metadata and newer versions will be added there.
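
The ONNX and TensorFlow examples further down read download URLs from models.yml through paths such as stt_models.en.latest.onnx. As a minimal sketch, assuming that nesting (stt_models, then language, then latest, then format) holds for every entry, a lookup helper can fail with a readable message instead of a bare attribute error. The helper name resolve_model_url and the example.com URLs are hypothetical, and a plain dict stands in for the parsed YAML.

```python
from typing import Any, Mapping


def resolve_model_url(models: Mapping[str, Any], language: str, fmt: str) -> str:
    """Return the download URL for `language` in format `fmt` (hypothetical helper)."""
    stt_models = models["stt_models"]
    if language not in stt_models:
        raise KeyError(f"unknown language {language!r}; available: {sorted(stt_models)}")
    latest = stt_models[language]["latest"]
    if fmt not in latest:
        raise KeyError(f"no {fmt!r} checkpoint for {language!r}; available: {sorted(latest)}")
    return latest[fmt]


# Stand-in for the parsed models.yml; the URLs are placeholders, not real links.
example_models = {
    "stt_models": {
        "en": {"latest": {"onnx": "https://example.com/en.onnx",
                          "tf": "https://example.com/en_tf.tar.gz"}},
        "de": {"latest": {"onnx": "https://example.com/de.onnx"}},
    }
}

print(resolve_model_url(example_models, "en", "onnx"))
```

With a config loaded through OmegaConf.load, OmegaConf.to_container can convert it to plain dicts before passing it to a helper like this one.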

Currently we provide the following checkpoints:

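| Model | PyTorch | ONNX | TensorFlow | Quantization | Quality | Colab |
|-------|---------|------|------------|--------------|---------|-------|
| English (en_v2) | ✔️ | ✔️ | ✔️ | ⌛️ | link | Open In Colab |
| German (de_v1) | ✔️ | ✔️ | ✔️ | ⌛️ | link | Open In Colab |
| Spanish (es_v1) | ✔️ | ✔️ | ✔️ | ⌛️ | link | Open In Colab |
| Ukrainian (ua_v3) | ✔️ | ✔️ | ⌛️ | ✔️ | N/A | Open In Colab |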

Dependencies

  • All examples:
    • torch (used to clone the repo in tf and onnx examples)
    • torchaudio
    • soundfile
    • omegaconf
  • Additional for ONNX examples:
    • onnx
    • onnxruntime
  • Additional for TensorFlow examples:
    • tensorflow
    • tensorflow_hub
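
Before running an example, it can help to check which of the dependency groups above are importable. The following is a rough sketch using only the standard library; the group labels and the function name missing_packages are illustrative choices, and the module names are taken from the list above.

```python
from importlib.util import find_spec
from typing import List

# Dependency groups as listed above (import names).
DEPENDENCIES = {
    "all examples": ["torch", "torchaudio", "soundfile", "omegaconf"],
    "onnx": ["onnx", "onnxruntime"],
    "tensorflow": ["tensorflow", "tensorflow_hub"],
}


def missing_packages(group: str) -> List[str]:
    """Return the modules from `group` that cannot be found in this environment."""
    return [name for name in DEPENDENCIES[group] if find_spec(name) is None]


for group in DEPENDENCIES:
    missing = missing_packages(group)
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"{group}: {status}")
```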

Please see the provided Colab for details for each example below.

PyTorch

import torch
import torchaudio
from glob import glob

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en', # also available 'de', 'es'
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see function signature for details

# download a single file, any format compatible with TorchAudio (soundfile backend)
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst='speech_orig.wav', progress=True)
test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]),
                            device=device)

output = model(input)
for example in output:
    print(decoder(example.cpu()))
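
The example above only runs batches[0], which is enough for a single file. With more files than batch_size, the remaining batches need a loop. The sketch below shows how a file list falls into batches of at most batch_size items; the function chunk is a hypothetical stand-in written for illustration, not the library's split_into_batches, and the file names are made up.

```python
from typing import List, Sequence


def chunk(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split `items` into consecutive lists of at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


files = [f"clip_{i}.wav" for i in range(23)]  # hypothetical file names
batches = chunk(files, batch_size=10)
print([len(b) for b in batches])  # → [10, 10, 3]
```

In the PyTorch example this corresponds to iterating over batches and calling prepare_model_input(read_batch(batch), device=device) for each one, rather than indexing only the first batch.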

ONNX

You can run our models anywhere you can import an ONNX model or run ONNX Runtime.

import onnx
import torch
import onnxruntime
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual ONNX model
# index by `language` so the download matches the language selected above
torch.hub.download_url_to_file(models.stt_models[language].latest.onnx, 'model.onnx', progress=True)
onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)
ort_session = onnxruntime.InferenceSession('model.onnx')

# download a single file, any format compatible with TorchAudio (soundfile backend)
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))

# actual onnx inference and decoding
onnx_input = input.detach().cpu().numpy()
ort_inputs = {'input': onnx_input}
ort_outs = ort_session.run(None, ort_inputs)
decoded = decoder(torch.Tensor(ort_outs[0])[0])
print(decoded)

TensorFlow

SavedModel example

import torch
import subprocess
import tensorflow as tf
import tensorflow_hub as tf_hub
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils using torch.hub for brevity
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual tf model
# index by `language` so the download matches the language selected above
torch.hub.download_url_to_file(models.stt_models[language].latest.tf, 'tf_model.tar.gz')
subprocess.run('rm -rf tf_model && mkdir tf_model && tar xzfv tf_model.tar.gz -C tf_model',  shell=True, check=True)
tf_model = tf.saved_model.load('tf_model')

# download a single file, any format compatible with TorchAudio (soundfile backend)
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))

# tf inference
res = tf_model.signatures["serving_default"](tf.constant(input.numpy()))['output_0']
print(decoder(torch.Tensor(res.numpy())[0]))
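
The shell command in the example above (rm -rf, mkdir, tar) assumes a Unix-like environment. A possible standard-library equivalent is sketched below for systems without those tools. The function name extract_fresh is hypothetical, and whether the "data" extraction filter suits the published archive is an assumption that has not been checked here.

```python
import shutil
import tarfile
from pathlib import Path


def extract_fresh(archive: str, target_dir: str) -> Path:
    """Delete `target_dir` if present, recreate it, and unpack `archive` into it."""
    target = Path(target_dir)
    shutil.rmtree(target, ignore_errors=True)   # rm -rf tf_model
    target.mkdir(parents=True)                  # mkdir tf_model
    with tarfile.open(archive, "r:gz") as tar:  # tar xzf tf_model.tar.gz -C tf_model
        if hasattr(tarfile, "data_filter"):
            # Use the safer extraction filter on Python versions that provide it.
            tar.extractall(target, filter="data")
        else:
            tar.extractall(target)
    return target
```

A call such as extract_fresh('tf_model.tar.gz', 'tf_model') would then stand in for the subprocess.run line before tf.saved_model.load('tf_model').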

FAQ

Wiki

Also check out our wiki.

Performance and Quality

Please refer to the relevant wiki sections.

Adding new Languages

Please refer here.

Contact

Get in Touch

Try our models, create an issue, join our chat, email us, read our news.

Commercial Inquiries

Please see our wiki and tiers for relevant information and email us.
