symblai / speech-recognition-evaluation

License: Apache-2.0
Evaluate results from ASR/Speech-to-Text quickly

Automatic Speech Recognition (ASR) Evaluation

If you use a Speech-to-Text or Speech Recognition system to generate transcriptions from your audio or video content, you can use this tool to measure how well it performs against a human-generated transcription. If you're not sure how to generate a transcription, the project links to a list of tutorials to help you get started.

What can this utility do?

This is a simple utility for quickly evaluating the results generated by any Speech-to-Text (STT) or Automatic Speech Recognition (ASR) system.

The utility can calculate the following metrics:

  • Word Error Rate (WER), the most common metric for measuring the performance of a speech recognition or machine translation system.
  • Word Information Loss (WIL), a simple approximation of the proportion of word information lost. Refer to this paper for more info.
  • Levenshtein distance, calculated at the word level.
  • Number of word-level insertions, deletions and mismatches between the original file and the generated file.
  • Number of phrase-level insertions, deletions and mismatches between the original file and the generated file.
  • Color-highlighted text comparison to visualize the differences.
  • General statistics about the original and generated files (bytes, characters, words, new lines etc.).
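The relationship between the word-level Levenshtein distance, WER and WIL can be sketched as follows. This is a minimal illustration, not the utility's actual implementation: the function name `wordErrors` and its return shape are hypothetical, WER is taken as the word-level edit distance divided by the number of reference words, and WIL uses the common definition 1 - H² / (N_ref × N_hyp), where H is the number of matching words.

```javascript
// Illustrative sketch only; names are hypothetical, not from the asr-eval source.
function wordErrors(reference, hypothesis) {
  const ref = reference.split(/\s+/).filter(Boolean);
  const hyp = hypothesis.split(/\s+/).filter(Boolean);

  // dp[i][j] = minimum number of word edits turning ref[0..i) into hyp[0..j)
  const dp = Array.from({ length: ref.length + 1 }, () =>
    new Array(hyp.length + 1).fill(0));
  for (let i = 0; i <= ref.length; i++) dp[i][0] = i;
  for (let j = 0; j <= hyp.length; j++) dp[0][j] = j;

  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j - 1] + cost, // match or mismatch (substitution)
        dp[i - 1][j] + 1,        // deletion (word missing from hypothesis)
        dp[i][j - 1] + 1         // insertion (extra word in hypothesis)
      );
    }
  }

  // Walk back through the table to count each kind of difference.
  let i = ref.length;
  let j = hyp.length;
  let hits = 0, mismatches = 0, deletions = 0, insertions = 0;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && ref[i - 1] === hyp[j - 1]) {
      hits++; i--; j--;
    } else if (i > 0 && j > 0 && dp[i][j] === dp[i - 1][j - 1] + 1) {
      mismatches++; i--; j--;
    } else if (i > 0 && dp[i][j] === dp[i - 1][j] + 1) {
      deletions++; i--;
    } else {
      insertions++; j--;
    }
  }

  const distance = dp[ref.length][hyp.length];
  // Both values are fractions; multiply by 100 for a percentage.
  const wer = ref.length === 0 ? 0 : distance / ref.length;
  const wil = ref.length === 0 || hyp.length === 0
    ? (ref.length === hyp.length ? 0 : 1)
    : 1 - (hits * hits) / (ref.length * hyp.length);

  return { distance, hits, mismatches, deletions, insertions, wer, wil };
}
```

For example, comparing the reference "the quick brown fox jumps" with the hypothesis "the quick brown dog jumps high" gives one mismatch and one insertion, so the distance is 2 and the WER is 2 / 5 = 0.4.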

The utility also pre-processes (normalizes) the text in the provided files using the following operations:

  • Remove Speaker Name: Remove the speaker name at the beginning of each line.
  • Remove Annotations: Remove any custom annotations added during transcription.
  • Remove Whitespaces: Remove any extra white spaces.
  • Remove Quotes: Remove any double quotes.
  • Remove Dashes: Remove any dashes.
  • Remove Punctuations: Remove any punctuation marks (.,?!).
  • Lower Case: Convert the contents to lower case.
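The normalization steps above can be approximated in a few lines. The sketch below is an assumption about how such a pipeline might look, not the utility's own code: `normalizeTranscript` is a hypothetical name, the order of the steps is chosen here so that a timestamp inside an annotation is not mistaken for a speaker separator, and dashes are deleted outright rather than replaced with spaces.

```javascript
// Hypothetical normalization pipeline; the real utility may differ in details.
function normalizeTranscript(text) {
  return text
    // Remove annotations first: anything in square brackets, e.g. [inaudible 00:12].
    .replace(/\[[^\]]*\]/g, '')
    // Remove a leading "speaker_name:" from each line (naive: any text up to the first colon).
    .split('\n')
    .map((line) => line.replace(/^[^:]+:\s*/, ''))
    .join('\n')
    // Remove double quotes, dashes and the punctuation marks . , ? !
    .replace(/"/g, '')
    .replace(/-/g, '')
    .replace(/[.,?!]/g, '')
    // Collapse runs of whitespace (including new lines) into single spaces.
    .replace(/\s+/g, ' ')
    .trim()
    // Make the comparison case-insensitive.
    .toLowerCase();
}
```

With this sketch, "John Doe: Hello, I am John." becomes "hello i am john". Note that the naive speaker rule would also strip ordinary text before a colon in the middle of a sentence, which a production implementation would need to guard against.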

Pre-requisites

Make sure that you have Node.js v8 or later installed on your system.

Installation

npm install -g speech-recognition-evaluation

Verify the installation by running:

asr-eval

Usage

The simplest way to run your first evaluation is to pass the original and generated options to the asr-eval command. Here, original is a plain text file containing the reference transcript, usually produced by a human, and generated is a plain text file containing the transcript produced by the STT/ASR system.

asr-eval --original ./original-file.txt --generated ./generated-file.txt

This prints the Word Error Rate (WER) between the provided files. The output should look like this:

Word Error Rate (WER): 13.61350109561817%
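If you want to use the result in a script, the WER value can be extracted from that output line. The following is a minimal sketch that assumes the output format shown above stays stable; `parseWer` and `runAsrEval` are hypothetical helper names, and `runAsrEval` assumes that `asr-eval` is installed and available on your PATH.

```javascript
const { execFileSync } = require('child_process');

// Extract the WER percentage from asr-eval's output, or return null if absent.
// Assumes a line of the form "Word Error Rate (WER): <number>%".
function parseWer(output) {
  const match = /Word Error Rate \(WER\):\s*([\d.]+)%/.exec(output);
  return match ? Number.parseFloat(match[1]) : null;
}

// Run the CLI on two files and return the parsed WER percentage.
// Not called here, because it needs asr-eval and real transcript files.
function runAsrEval(originalPath, generatedPath) {
  const output = execFileSync(
    'asr-eval',
    ['--original', originalPath, '--generated', generatedPath],
    { encoding: 'utf8' }
  );
  return parseWer(output);
}
```

A script could then call `runAsrEval('./original-file.txt', './generated-file.txt')` and compare the returned percentage against a threshold of its own choosing.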

To see all the available options, run:

asr-eval --help

This prints the full usage guide:

Synopsis

  $ asr-eval --original file --generated file           
  $ asr-eval [options] --original file --generated file 
  $ asr-eval --help                                     

Options

  -o, --original file                 Original File to be used as reference. Usually, this should be the            
                                      transcribed file by a Human being.                                            
  -g, --generated file                File with the output generated by Speech Recognition System.                  
  -e, --wer [true|false]              Default: true. Print Word Error Rate (WER).                                   
  -i, --wil [true|false]              Default: true. Print Word Information Loss (WIL).                             
  --distance [true|false]             Default: false. Print total word distance after comparison.                   
  --stats [true|false]                Default: false. Print statistics about original and generate files, before    
                                      and after pre-processing. Also prints statistics about word level and phrase  
                                      level differences.                                                            
  --pairs [true|false]                Default: false. Print all the difference pairs with type of difference.       
  --textcomparison [true|false]       Default: false. Print the text comparison between two files with              
                                      highlighting.                                                                 
  --removespeakers [true|false]       Default: true. Remove the speaker at the start of each line in files before   
                                      calculations. The speaker should be separated by colon ":" i.e. speaker_name: 
                                      text For e.g. "John Doe: Hello, I am John." would get converted to simply     
                                      "Hello, I am John."                                                           
  --removeannotations [true|false]    Default: true. Remove any custom annotations in the transcript before         
                                      calculations. This is useful when removing custom annotations done by human   
                                      transcribers.  Anything in square brackets [] are detected as annotations.    
                                      For e.g. "Hello, I am [inaudible 00:12] because of few reasons." would get    
                                      converted to "Hello, I am because of few reasons."                            
  --removewhitespaces [true|false]    Default: true. Remove any extra white spaces before calculations.             
  --removequotes [true|false]         Default: true. Remove any double quotes '"' from the files before             
                                      calculations.                                                                 
  --removedashes [true|false]         Default: true. Remove any dashes (hyphens) "-" from the files before          
                                      calculations.                                                                 
  --removepunctuations [true|false]   Default: true. Remove any punctuations ".,?!" from the files before           
                                      calculations.                                                                 
  --lowercase [true|false]            Default: true. Convert both files to lower case before calculations. This is  
                                      useful if evaluation needs to be done in case-insensitive way.                
  --help [true|false]                 Print this usage guide.                                                                                   
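When driving the utility from a script, it can help to build the argument list from an options object and reject flag names that are not in the usage guide. The sketch below is an illustration under assumptions: `buildAsrEvalArgs` is a hypothetical helper, the list of names is copied from the options above, and passing each value as a separate `true` or `false` argument is inferred from the `[true|false]` notation rather than confirmed.

```javascript
// Option names taken from the usage guide above (long forms only).
const KNOWN_OPTIONS = [
  'wer', 'wil', 'distance', 'stats', 'pairs', 'textcomparison',
  'removespeakers', 'removeannotations', 'removewhitespaces',
  'removequotes', 'removedashes', 'removepunctuations', 'lowercase',
];

// Build an argv array such as ['--original', 'a.txt', '--generated', 'b.txt', '--stats', 'true'].
function buildAsrEvalArgs(original, generated, options = {}) {
  const args = ['--original', original, '--generated', generated];
  for (const [name, value] of Object.entries(options)) {
    if (!KNOWN_OPTIONS.includes(name)) {
      throw new Error(`Unknown asr-eval option: ${name}`);
    }
    // Assumption: boolean options are passed as "--name true" or "--name false".
    args.push(`--${name}`, String(Boolean(value)));
  }
  return args;
}
```

The resulting array can be passed to Node's `child_process.execFileSync('asr-eval', args)`, for example to enable `stats` and `textcomparison` while disabling `lowercase`.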

Getting help

If you need help installing or using the utility, please reach out in our Slack channel.

If you've found a bug or would like a new feature added, go ahead and open an issue or a pull request against this repo!
