
alphacep / Vosk

Licence: apache-2.0
VOSK Speech Recognition Toolkit

Programming Languages

python
139335 projects - #7 most used programming language
c
50402 projects - #5 most used programming language

Projects that are alternatives to or similar to Vosk

Spokestack Python
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.
Stars: ✭ 103 (-43.41%)
Mutual labels:  speech-recognition, speech-to-text, voice-recognition
voce-browser
Voice Controlled Chromium Web Browser
Stars: ✭ 34 (-81.32%)
Mutual labels:  voice-recognition, speech-recognition, speech-to-text
AmazonSpeechTranslator
End-to-end Solution for Speech Recognition, Text Translation, and Text-to-Speech for iOS using Amazon Translate and Amazon Polly as AWS Machine Learning managed services.
Stars: ✭ 50 (-72.53%)
Mutual labels:  voice-recognition, speech-recognition, speech-to-text
Self Supervised Speech Recognition
speech to text with self-supervised learning based on wav2vec 2.0 framework
Stars: ✭ 106 (-41.76%)
Mutual labels:  speech-recognition, speech-to-text, semi-supervised-learning
Speech To Text Benchmark
speech to text benchmark framework
Stars: ✭ 481 (+164.29%)
Mutual labels:  speech-recognition, speech-to-text, voice-recognition
octopus
On-device speech-to-index engine powered by deep learning.
Stars: ✭ 30 (-83.52%)
Mutual labels:  voice-recognition, speech-recognition, speech-to-text
spokestack-ios
Spokestack: give your iOS app a voice interface!
Stars: ✭ 27 (-85.16%)
Mutual labels:  voice-recognition, speech-recognition, speech-to-text
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+362.09%)
Mutual labels:  voice-recognition, speech-recognition, speech-to-text
Voice Overlay Ios
🗣 An overlay that gets your user’s voice permission and input as text in a customizable UI
Stars: ✭ 440 (+141.76%)
Mutual labels:  speech-recognition, speech-to-text, voice-recognition
Rhino
On-device speech-to-intent engine powered by deep learning
Stars: ✭ 406 (+123.08%)
Mutual labels:  speech-recognition, speech-to-text, voice-recognition
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (-70.88%)
Mutual labels:  voice-recognition, speech-recognition, speech-to-text
Nativescript Speech Recognition
💬 Speech to text, using the awesome engines readily available on the device.
Stars: ✭ 72 (-60.44%)
Mutual labels:  speech-recognition, speech-to-text, voice-recognition
leopard
On-device speech-to-text engine powered by deep learning
Stars: ✭ 354 (+94.51%)
Mutual labels:  voice-recognition, speech-recognition, speech-to-text
KeenASR-Android-PoC
A proof-of-concept app using KeenASR SDK on Android. WE ARE HIRING: https://keenresearch.com/careers.html
Stars: ✭ 21 (-88.46%)
Mutual labels:  voice-recognition, speech-recognition, speech-to-text
Voice Overlay Android
🗣 An overlay that gets your user’s voice permission and input as text in a customizable UI
Stars: ✭ 189 (+3.85%)
Mutual labels:  speech-recognition, speech-to-text, voice-recognition
Cheetah
On-device streaming speech-to-text engine powered by deep learning
Stars: ✭ 383 (+110.44%)
Mutual labels:  speech-recognition, speech-to-text, voice-recognition
Sonus
💬 /so.nus/ STT (speech to text) for Node with offline hotword detection
Stars: ✭ 532 (+192.31%)
Mutual labels:  speech-recognition, speech-to-text, voice-recognition
Vosk Api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Stars: ✭ 1,357 (+645.6%)
Mutual labels:  speech-recognition, speech-to-text, voice-recognition
Wav2letter.pytorch
A fully convolution-network for speech-to-text, built on pytorch.
Stars: ✭ 104 (-42.86%)
Mutual labels:  speech-recognition, speech-to-text
Hey Jetson
Deep Learning based Automatic Speech Recognition with attention for the Nvidia Jetson.
Stars: ✭ 161 (-11.54%)
Mutual labels:  speech-recognition, speech-to-text

For the Kaldi API for Android and Linux, please see Vosk API. This is a server project.

This is Vosk, the lifelong speech recognition system.

Concepts

As of 2019, neural network based speech recognizers are quite limited in the amount of speech data they can use for training, and they require enormous computing power and time to train and to optimize their parameters. Neural networks struggle with human-like one-shot learning, their decisions are not very robust to unseen conditions, and they are hard to understand and correct.

That is why we decided to build a system based on a large signal database. We apply an audio fingerprinting scheme: the audio is segmented into chunks, the chunks are stored in the database keyed by an LSH hash value, and during decoding we simply look up the chunks in the database to get an idea of the possible phones. That helps us make a proper decision about the decoding results.
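
As a rough illustration of that chunk-and-lookup idea, here is a minimal sketch in Python. The chunk size, the sign-based LSH, and the in-memory dictionary index are assumptions made for the example; they are not the actual implementation used in this project.

import numpy as np
from collections import defaultdict

CHUNK = 400                   # samples per chunk (assumed value)
N_PLANES = 16                 # number of random hyperplanes for the hash (assumed)

rng = np.random.default_rng(0)
planes = rng.standard_normal((N_PLANES, CHUNK))
index = defaultdict(list)     # LSH hash -> phone labels seen under that hash

def lsh_hash(chunk):
    # Sign-based locality-sensitive hash: project the chunk onto random
    # hyperplanes and keep only the sign bits.
    return ((planes @ chunk) > 0).tobytes()

def add(audio, phone):
    # Store every chunk of a labelled audio segment under its hash.
    for start in range(0, len(audio) - CHUNK + 1, CHUNK):
        index[lsh_hash(audio[start:start + CHUNK])].append(phone)

def lookup(chunk):
    # Candidate phones for a chunk are whatever was stored in its hash bucket.
    return index.get(lsh_hash(chunk), [])

For example, indexing one second of labelled audio and then querying its first chunk would return the stored label:

audio = rng.standard_normal(16000)
add(audio, "ae")
print(lookup(audio[:CHUNK]))  # e.g. ['ae']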

The advantages of this approach are:

  • We can quickly train on 100,000 hours of speech data on very simple hardware
  • We can easily correct the recognizer's behavior just by adding samples
  • We can make sure that a recognition result is correct because it is sufficiently represented in the training dataset
  • We can parallelize training across thousands of nodes
  • We support the lifelong learning paradigm
  • We can use this method together with more common neural network training to improve recognition accuracy
  • The system is robust against noise

The disadvantages are:

  • The index is really huge; it is not expected to fit in the memory of a single server
  • The generalization capabilities of the model are quite questionable; then again, the generalization capabilities of neural networks are also questionable
  • For now the segmentation requires a conventional ASR system, but in the future we might do the segmentation ourselves

Nice-to-have things for the future would be:

  • Multilingual training
  • Our own segmentation
  • A tool to shrink the model so that it fits on mobile devices
  • Specialized hardware to implement this AI paradigm

Usage

To install the requirements run

pip3 install -r requirements.txt

To prepare the training/verification data, create the following two files:

  • wav.scp, a list that maps utterances to wav files in the filesystem
  • phones.txt, a CTM file with phonemes and timings. It can be a CTM file from alignment or a CTM file from decoding

You can create both files with the Kaldi ASR toolkit.
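
For illustration, entries in these files might look like the lines below. The utterance IDs and paths are made-up examples, and the phones.txt columns are assumed to follow the usual Kaldi CTM layout (utterance ID, channel, start time, duration, label):

wav.scp:

utt0001 /data/train/utt0001.wav
utt0002 /data/train/utt0002.wav

phones.txt:

utt0001 1 0.00 0.12 SIL
utt0001 1 0.12 0.08 k
utt0001 1 0.20 0.10 ae
utt0001 1 0.30 0.09 t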

Indexing

To add the data to the database run

python3 index.py wavs-train.txt phones-train.txt data.idx

That will add the data to the data.idx database, or create it if it does not exist yet.

Verification

To verify decoding results run

python3 verify.py wavs-test.txt phones-test.txt data.idx

The tool searches for the test segments in the index and reports suspicious segments, which you can check manually and later add to the database to improve recognition accuracy.
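
A typical iteration of this loop might therefore look like the following: run the verification, review and correct the segments it flags, and then index the corrected data. Here wavs-fixed.txt and phones-fixed.txt are hypothetical placeholders for whatever corrected files you produce:

python3 verify.py wavs-test.txt phones-test.txt data.idx
python3 index.py wavs-fixed.txt phones-fixed.txt data.idx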

Related papers and links
