Franck-Dernoncourt / Asr_benchmark

Program to benchmark various speech recognition APIs

Programming Languages

python

Projects that are alternatives to or similar to Asr_benchmark

Vosk Api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Stars: ✭ 1,357 (+1811.27%)
Mutual labels:  speech-recognition, asr, voice-recognition
spokestack-ios
Spokestack: give your iOS app a voice interface!
Stars: ✭ 27 (-61.97%)
Mutual labels:  voice-recognition, speech-recognition, asr
leopard
On-device speech-to-text engine powered by deep learning
Stars: ✭ 354 (+398.59%)
Mutual labels:  voice-recognition, speech-recognition, asr
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-26.76%)
Mutual labels:  voice-recognition, speech-recognition, asr
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (-25.35%)
Mutual labels:  voice-recognition, speech-recognition, asr
Cheetah
On-device streaming speech-to-text engine powered by deep learning
Stars: ✭ 383 (+439.44%)
Mutual labels:  speech-recognition, asr, voice-recognition
Speech To Text Benchmark
speech to text benchmark framework
Stars: ✭ 481 (+577.46%)
Mutual labels:  speech-recognition, voice-recognition
Mycroft Precise
A lightweight, simple-to-use, RNN wake word listener
Stars: ✭ 481 (+577.46%)
Mutual labels:  speech-recognition, voice-recognition
Sonus
💬 /so.nus/ STT (speech to text) for Node with offline hotword detection
Stars: ✭ 532 (+649.3%)
Mutual labels:  speech-recognition, voice-recognition
Libreasr
💬 An On-Premises, Streaming Speech Recognition System
Stars: ✭ 633 (+791.55%)
Mutual labels:  speech-recognition, asr
Nmtpytorch
Sequence-to-Sequence Framework in PyTorch
Stars: ✭ 392 (+452.11%)
Mutual labels:  speech-recognition, asr
Athena
an open-source implementation of sequence-to-sequence based speech processing engine
Stars: ✭ 542 (+663.38%)
Mutual labels:  speech-recognition, asr
Eesen
The official repository of the Eesen project
Stars: ✭ 738 (+939.44%)
Mutual labels:  speech-recognition, asr
Voice Overlay Ios
🗣 An overlay that gets your user’s voice permission and input as text in a customizable UI
Stars: ✭ 440 (+519.72%)
Mutual labels:  speech-recognition, voice-recognition
Rhino
On-device speech-to-intent engine powered by deep learning
Stars: ✭ 406 (+471.83%)
Mutual labels:  speech-recognition, voice-recognition
Silero Models
Silero Models: pre-trained STT models and benchmarks made embarrassingly simple
Stars: ✭ 522 (+635.21%)
Mutual labels:  speech-recognition, asr
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+474.65%)
Mutual labels:  speech-recognition, asr
Wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Stars: ✭ 617 (+769.01%)
Mutual labels:  speech-recognition, asr
Espresso
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit
Stars: ✭ 808 (+1038.03%)
Mutual labels:  speech-recognition, asr
Sincnet
SincNet is a neural architecture for efficiently processing raw audio samples.
Stars: ✭ 764 (+976.06%)
Mutual labels:  speech-recognition, asr

Speech Recognition Benchmark

The Speech Recognition Benchmark is a program that assesses and compares the performance of automatic speech recognition (ASR) APIs. It runs on Mac OS X, Microsoft Windows, and Ubuntu. It currently supports the following ASR APIs: Amazon Lex, Google, Google Cloud, Houndify, IBM Watson, Microsoft (a.k.a. Bing), Speechmatics, and Wit.
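
To give a sense of what a single transcription with one of these APIs looks like, here is a minimal sketch using the SpeechRecognition Python package; it shows one call to one service and is not necessarily how benchmark.py wraps each API (the file name sample.wav is a placeholder):

# Minimal sketch: transcribe one audio file with one ASR API via the
# SpeechRecognition package (pip install SpeechRecognition). Illustration only;
# not necessarily how benchmark.py wraps each service.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("sample.wav") as source:  # placeholder path to a WAV file
    audio = recognizer.record(source)       # load the whole file into memory

try:
    print(recognizer.recognize_google(audio))  # Google Web Speech API
except sr.UnknownValueError:
    print("The API could not understand the audio.")
except sr.RequestError as error:
    print(f"API request failed: {error}")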

Installation

The Speech Recognition Benchmark requires Python 3, as well as a few Python packages that you can install by running pip install -r requirements.txt

The configuration file src/settings.ini contains all the parameters that you may wish to change.
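
Since settings.ini is a standard INI file, you can also list its sections and parameters programmatically with Python's built-in configparser before editing them. A minimal sketch (it makes no assumption about the actual parameter names):

# Minimal sketch: list every section and option in src/settings.ini using the
# standard library, so you can see which parameters can be changed. Run from
# the repository root; nothing here assumes specific parameter names.
import configparser

config = configparser.ConfigParser()
config.read("src/settings.ini")

for section in config.sections():
    for option, value in config[section].items():
        print(f"[{section}] {option} = {value}")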

Usage

Run cd src; python benchmark.py

Benchmark results

Below are some benchmark results presenting the word error rates, expressed as percentages, for several ASR APIs on the following 5 corpora: CV = Common Voice (total length: 4:58:32, divided into 3995 speech files); F = Fotolia (4:28:05, 3184); IER = Image Edit Requests (2:29:09, 1289); LS-c = LibriSpeech clean (1:53:37, 870); LS-o = LibriSpeech other (5:20:29, 2939). All 5 corpora are in English. For each of these corpora, we only use the official test set.

Important note: CV, LS-c, and LS-o are public corpora, so it is quite possible that some ASR systems have been trained on them, making the word error rates lower than they should be. By contrast, F and IER are private corpora. Also, different APIs may differ in how well they handle languages other than English, speaker accents, background noise, etc. Consequently, you may want to perform the benchmark on a corpus that reflects your use case (in which case you are very welcome to share your results here).

ASR API       Date        CV    F     IER   LS-c  LS-o
Human         -           -     -     -     5.8   12.7
Google        2018-03-30  23.2  24.2  16.6  12.1  28.8
Google Cloud  2018-03-30  23.3  26.3  18.3  12.3  27.3
IBM           2018-03-30  21.8  47.6  24.0  9.8   25.3
Microsoft     2018-03-30  29.1  28.1  23.1  18.8  35.9
Speechmatics  2018-03-30  19.1  38.4  21.4  7.3   19.4
Wit.ai        2018-03-30  35.6  54.2  37.4  19.2  41.7
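
The word error rate is the word-level edit distance (substitutions + deletions + insertions) between an API's transcript and the gold transcript, divided by the number of words in the gold transcript. Below is a minimal sketch of that computation, for illustration only; it is not the benchmark's own implementation, which may also normalize the text first:

# Minimal word error rate (WER) sketch: WER = (S + D + I) / number of reference words.
# Illustration only; not the benchmark's own implementation.
def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Example: 1 deletion + 1 substitution over 5 reference words -> 0.4, i.e. 40% WER.
print(wer("please edit the red car", "please edit red cat"))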

Corpora

For convenience, we provide two scripts to format the Common Voice and LibriSpeech corpora so that the ASR benchmark can be run on them.

Bash script to format Common Voice (requires ~25 GB disk space):

cd data
# cv_corpus_v1.tar.gz is 12 GB. Mirror for the S3 link below: https://archive.org/details/cv_corpus_v1.tar
wget https://common-voice-data-download.s3.amazonaws.com/cv_corpus_v1.tar.gz
tar -xvf cv_corpus_v1.tar.gz
mkdir cv-valid-test
mv cv_corpus_v1/cv-valid-test cv-valid-test
mv cv_corpus_v1/cv-valid-test.csv cv-valid-test
rm -Rf cv_corpus_v1
rm cv_corpus_v1.tar.gz
cd ../src
# format_common_voice_gold_transcriptions.py requires the pandas package, which can be installed with: pip install pandas
python format_common_voice_gold_transcriptions.py
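
If you want to sanity-check the corpus before or after formatting it, the gold transcriptions can be inspected directly with pandas. A small sketch, run from the repository root and assuming the cv_corpus_v1 column names filename and text:

# Illustrative only: peek at the gold transcriptions shipped with Common Voice.
# 'filename' and 'text' are the column names used in cv_corpus_v1; the actual
# formatting is done by format_common_voice_gold_transcriptions.py.
import pandas as pd

df = pd.read_csv("data/cv-valid-test/cv-valid-test.csv")
print(df[["filename", "text"]].head())  # audio clip path and its gold transcription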

Bash script to format LibriSpeech (requires ~1.5 GB disk space):

cd data
mkdir librispeech-test-clean
mkdir librispeech-test-other
mkdir librispeech-temp
cd librispeech-temp
# test-clean.tar.gz is 346 MB, test-other.tar.gz is 328 MB
wget http://www.openslr.org/resources/12/test-clean.tar.gz
wget http://www.openslr.org/resources/12/test-other.tar.gz
tar -xvf test-clean.tar.gz
tar -xvf test-other.tar.gz
# The archives extract into LibriSpeech/test-clean and LibriSpeech/test-other
mv LibriSpeech/test-clean/* ../librispeech-test-clean
mv LibriSpeech/test-other/* ../librispeech-test-other
cd ..
rm -Rf librispeech-temp
cd ../src
python format_librispeech_gold_transcriptions.py

License

Some code snippets were taken from external sources. The rest of the code is made available under the CC BY-NC 4.0 license.

Citation

If you use this code in your publications, please cite this paper (mirror):

@inproceedings{ASRbenchmark2018,
  author = {Franck Dernoncourt and Trung Bui and Walter Chang},
  title = {A Framework for Speech Recognition Benchmarking},
  year = {2018},
  booktitle = {Interspeech}
}