
MyrtleSoftware / deepspeech

Licence: other
A PyTorch implementation of DeepSpeech and DeepSpeech2.

Programming Languages

python
Dockerfile
Makefile

Projects that are alternatives to or similar to deepspeech

Deepspeech
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Stars: ✭ 18,680 (+41411.11%)
Mutual labels:  speech-recognition, speech-to-text, deepspeech
deepspeech.mxnet
An MXNet implementation of Baidu's DeepSpeech architecture
Stars: ✭ 82 (+82.22%)
Mutual labels:  speech-recognition, speech-to-text, deepspeech
leon
🧠 Leon is your open-source personal assistant.
Stars: ✭ 8,560 (+18922.22%)
Mutual labels:  speech-recognition, speech-to-text, deepspeech
vosk-asterisk
Speech Recognition in Asterisk with Vosk Server
Stars: ✭ 52 (+15.56%)
Mutual labels:  speech-recognition, speech-to-text
scription
An editor for speech-to-text transcripts such as AWS Transcribe and Mozilla DeepSpeech
Stars: ✭ 46 (+2.22%)
Mutual labels:  speech-to-text, deepspeech
scripty
Speech to text bot for Discord using Mozilla's DeepSpeech
Stars: ✭ 14 (-68.89%)
Mutual labels:  speech-recognition, speech-to-text
DeepSpeech-API
The code enables users to use Mozilla's DeepSpeech model from the web browser.
Stars: ✭ 31 (-31.11%)
Mutual labels:  speech-recognition, speech-to-text
Unity live caption
Use the Google Speech-to-Text API to do real-time live-stream captioning in Unity! Best when combined with your virtual character!
Stars: ✭ 26 (-42.22%)
Mutual labels:  speech-recognition, speech-to-text
vspeech
📢 Complete V bindings for Mozilla's DeepSpeech TensorFlow-based speech-to-text library. 📜
Stars: ✭ 38 (-15.56%)
Mutual labels:  speech-to-text, deepspeech
kaldi-long-audio-alignment
Long audio alignment using Kaldi
Stars: ✭ 21 (-53.33%)
Mutual labels:  speech-recognition, speech-to-text
deep avsr
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
Stars: ✭ 104 (+131.11%)
Mutual labels:  speech-recognition, speech-to-text
speech-to-text-code-pattern
React app using the Watson Speech to Text service to transform voice audio into written text.
Stars: ✭ 37 (-17.78%)
Mutual labels:  speech-recognition, speech-to-text
kaldi ag training
Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-grammar.
Stars: ✭ 14 (-68.89%)
Mutual labels:  speech-recognition, speech-to-text
PCPM
Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.
Stars: ✭ 21 (-53.33%)
Mutual labels:  speech-recognition, speech-to-text
Chinese-automatic-speech-recognition
Chinese speech recognition
Stars: ✭ 147 (+226.67%)
Mutual labels:  speech-recognition, speech-to-text
Inimesed
An Android app that lets you search your contacts by voice. Internet not required. Based on Pocketsphinx. Uses Estonian acoustic models.
Stars: ✭ 65 (+44.44%)
Mutual labels:  speech-recognition, speech-to-text
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+1768.89%)
Mutual labels:  speech-recognition, speech-to-text
Deep-learning-And-Paper
[For learning and exchange only] Machine intelligence: related books and classic papers, including experiment code for AutoML, sentiment classification, speech recognition, voiceprint recognition, speech synthesis, etc.
Stars: ✭ 62 (+37.78%)
Mutual labels:  speech-recognition, speech-to-text
rnnt decoder cuda
An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.
Stars: ✭ 60 (+33.33%)
Mutual labels:  speech-recognition, speech-to-text
AmazonSpeechTranslator
End-to-end Solution for Speech Recognition, Text Translation, and Text-to-Speech for iOS using Amazon Translate and Amazon Polly as AWS Machine Learning managed services.
Stars: ✭ 50 (+11.11%)
Mutual labels:  speech-recognition, speech-to-text

Myrtle Deep Speech

A PyTorch implementation of DeepSpeech and DeepSpeech2.

This repository is intended as an evolving baseline for other implementations to compare their training performance against.

Current roadmap:

  1. Pre-trained weights for both networks and full performance statistics.
  2. Mixed-precision training.

Running

Build the Docker image:

make build
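The build target is assumed here to be a thin wrapper around docker build; a minimal sketch of the equivalent command, tagging the image deepspeech to match the run command below:

docker build -t deepspeech .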

Run the Docker container (here using nvidia-docker), making sure to publish the JupyterLab session's port to the host:

sudo docker run --runtime=nvidia --shm-size 512M -p 9999:9999 deepspeech

The JupyterLab session can be accessed via localhost:9999.
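
On Docker 19.03 and newer with the NVIDIA Container Toolkit installed, the --gpus flag can be used in place of the nvidia runtime:

sudo docker run --gpus all --shm-size 512M -p 9999:9999 deepspeech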

The deepspeech Python package is available in the running Docker container and can be used either through the command-line interface:

deepspeech --help

or as a Python package:

import deepspeech

Examples

deepspeech --help prints the configurable parameters (batch size, learning rate, log location, number of epochs, ...); the defaults aim to be reasonably sensible.
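
For instance, a hypothetical invocation that overrides two of those defaults might look as follows; the flag names here are illustrative only, so check deepspeech --help for the real ones:

deepspeech ds1 --learning_rate 0.0003 --n_epochs 5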

Training

A Deep Speech training run can be started with the following command, adding flags as necessary:

deepspeech ds1

By default, experimental data and logs are written to /tmp/experiments/year_month_date-hour_minute_second_microsecond.
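
For example, a run launched at 09:30:05.123456 on 14 February 2020 would be written to /tmp/experiments/2020_02_14-09_30_05_123456 (an illustrative expansion of the template; the exact directory name is generated by the package).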

Inference

A Deep Speech evaluation run can be started with the following command, adding flags as necessary:

deepspeech ds1 \
           --state_dict_path $MODEL_PATH \
           --log_file \
           --decoder greedy \
           --train_subsets \
           --dev_log wer \
           --dev_subsets dev-clean \
           --dev_batch_size 1

Note that passing --log_file without an argument causes the WER results to be written to stderr.
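
The greedy decoder performs best-path CTC decoding: take the most probable label at each frame, collapse consecutive repeats, and drop the blank symbol. A minimal, self-contained sketch of this standard procedure (the function name, array shapes, and alphabet argument below are illustrative, not this package's internal API):

import numpy as np

def greedy_ctc_decode(log_probs, alphabet, blank=0):
    # log_probs: array of shape (time, n_labels) with per-frame label scores.
    # alphabet: maps a label index to its character; index `blank` is the CTC blank.
    best_path = np.argmax(log_probs, axis=1)  # most likely label at each frame
    chars = []
    prev = blank
    for label in best_path:
        # Collapse consecutive repeats, then drop blank frames.
        if label != blank and label != prev:
            chars.append(alphabet[label])
        prev = label
    return "".join(chars)

Beam-search decoding, typically combined with a language model, generally lowers the WER further at the cost of extra compute.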

Dataset

The package contains code to download and use the LibriSpeech ASR corpus.
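
LibriSpeech ships in standard subsets (train-clean-100, train-clean-360, train-other-500, dev-clean, dev-other, test-clean and test-other); the dev-clean subset passed to --dev_subsets in the inference example above is one of these.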

WER

The word error rate (WER) is computed using the formula that is widely used in many open-source speech-to-text systems (Kaldi, PaddlePaddle, Mozilla DeepSpeech). In Python-flavoured pseudocode, where N is the number of validation or test samples and each target or prediction is a string of space-separated words:

import editdistance  # word-level Levenshtein distance; one possible choice of library

sum_edits = sum(editdistance.eval(target.split(), predict.split())
                for target, predict in zip(targets, predictions))
sum_lens = sum(len(target.split()) for target in targets)
WER = (1.0 / N) * (sum_edits / sum_lens)

This reduces the impact of errors in short sentences on the overall WER. Toy example:

Target                          Prediction                     Edit Distance  Label Length
lectures                        lectured                       1              1
i'm afraid he said              i am afraid he said            2              4
nice to see you mister meeking  nice to see your mister makin  2              6

The mean of the per-sample WERs, with each sample considered individually, is:

>>> (1.0/3) * ((1.0/1) + (2.0/4) + (2.0/6))
0.611111111111111

Compared to the pseudocode version given above:

>>> (1.0/3) * ((1.0 + 2 + 2) / (1.0 + 4 + 6))
0.1515151515151515

Maintainer

Please contact sam at myrtle dot ai.
