
MyrtleSoftware / deepspeech

Licence: other
A PyTorch implementation of DeepSpeech and DeepSpeech2.

Programming Languages

python
Dockerfile
Makefile

Projects that are alternatives to or similar to deepspeech

Deepspeech
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Stars: ✭ 18,680 (+41411.11%)
Mutual labels:  speech-recognition, speech-to-text, deepspeech
deepspeech.mxnet
An MXNet implementation of Baidu's DeepSpeech architecture
Stars: ✭ 82 (+82.22%)
Mutual labels:  speech-recognition, speech-to-text, deepspeech
leon
🧠 Leon is your open-source personal assistant.
Stars: ✭ 8,560 (+18922.22%)
Mutual labels:  speech-recognition, speech-to-text, deepspeech
vosk-asterisk
Speech Recognition in Asterisk with Vosk Server
Stars: ✭ 52 (+15.56%)
Mutual labels:  speech-recognition, speech-to-text
scription
An editor for speech-to-text transcripts such as AWS Transcribe and Mozilla DeepSpeech
Stars: ✭ 46 (+2.22%)
Mutual labels:  speech-to-text, deepspeech
scripty
Speech to text bot for Discord using Mozilla's DeepSpeech
Stars: ✭ 14 (-68.89%)
Mutual labels:  speech-recognition, speech-to-text
DeepSpeech-API
The code enables users to use Mozilla's DeepSpeech model from the web browser.
Stars: ✭ 31 (-31.11%)
Mutual labels:  speech-recognition, speech-to-text
Unity live caption
Use the Google Speech-to-Text API to do real-time live-stream captioning in Unity! Best when combined with your virtual character!
Stars: ✭ 26 (-42.22%)
Mutual labels:  speech-recognition, speech-to-text
vspeech
📢 Complete V bindings for Mozilla's DeepSpeech TensorFlow-based speech-to-text library. 📜
Stars: ✭ 38 (-15.56%)
Mutual labels:  speech-to-text, deepspeech
kaldi-long-audio-alignment
Long audio alignment using Kaldi
Stars: ✭ 21 (-53.33%)
Mutual labels:  speech-recognition, speech-to-text
deep avsr
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
Stars: ✭ 104 (+131.11%)
Mutual labels:  speech-recognition, speech-to-text
speech-to-text-code-pattern
React app using the Watson Speech to Text service to transform voice audio into written text.
Stars: ✭ 37 (-17.78%)
Mutual labels:  speech-recognition, speech-to-text
kaldi ag training
Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-grammar.
Stars: ✭ 14 (-68.89%)
Mutual labels:  speech-recognition, speech-to-text
PCPM
Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.
Stars: ✭ 21 (-53.33%)
Mutual labels:  speech-recognition, speech-to-text
Chinese-automatic-speech-recognition
Chinese speech recognition
Stars: ✭ 147 (+226.67%)
Mutual labels:  speech-recognition, speech-to-text
Inimesed
An Android app that lets you search your contacts by voice. Internet not required. Based on Pocketsphinx. Uses Estonian acoustic models.
Stars: ✭ 65 (+44.44%)
Mutual labels:  speech-recognition, speech-to-text
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+1768.89%)
Mutual labels:  speech-recognition, speech-to-text
Deep-learning-And-Paper
[For learning and exchange only] Machine intelligence: related books and classic papers, including experiment code for AutoML, sentiment classification, speech recognition, voiceprint recognition, speech synthesis, etc.
Stars: ✭ 62 (+37.78%)
Mutual labels:  speech-recognition, speech-to-text
rnnt decoder cuda
An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.
Stars: ✭ 60 (+33.33%)
Mutual labels:  speech-recognition, speech-to-text
AmazonSpeechTranslator
End-to-end Solution for Speech Recognition, Text Translation, and Text-to-Speech for iOS using Amazon Translate and Amazon Polly as AWS Machine Learning managed services.
Stars: ✭ 50 (+11.11%)
Mutual labels:  speech-recognition, speech-to-text

Myrtle Deep Speech

A PyTorch implementation of DeepSpeech and DeepSpeech2.

This repository is intended as an evolving baseline for other implementations to compare their training performance against.

Current roadmap:

  1. Pre-trained weights for both networks and full performance statistics.
  2. Mixed-precision training.

Running

Build the Docker image:

make build
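The build target is assumed here to be a thin wrapper around docker build; a minimal sketch of the equivalent command, tagging the image deepspeech to match the run command below:

docker build -t deepspeech .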

Run the Docker container (here using nvidia-docker), making sure to publish the JupyterLab session's port to the host:

sudo docker run --runtime=nvidia --shm-size 512M -p 9999:9999 deepspeech

The JupyterLab session can be accessed via localhost:9999.
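
On Docker 19.03 and newer with the NVIDIA Container Toolkit installed, the --gpus flag can be used in place of the nvidia runtime:

sudo docker run --gpus all --shm-size 512M -p 9999:9999 deepspeech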

The deepspeech Python package is available in the running Docker container and can be used either through the command-line interface:

deepspeech --help

or as a Python package:

import deepspeech

Examples

deepspeech --help prints the configurable parameters (batch size, learning rate, log location, number of epochs, ...); the defaults aim to be reasonably sensible.
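
For instance, a hypothetical invocation that overrides two of those defaults might look as follows; the flag names here are illustrative only, so check deepspeech --help for the real ones:

deepspeech ds1 --learning_rate 0.0003 --n_epochs 5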

Training

A Deep Speech training run can be started with the following command, adding flags as necessary:

deepspeech ds1

By default, experimental data and logs are written to /tmp/experiments/year_month_date-hour_minute_second_microsecond.
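
For example, a run launched at 09:30:05.123456 on 14 February 2020 would be written to /tmp/experiments/2020_02_14-09_30_05_123456 (an illustrative expansion of the template; the exact directory name is generated by the package).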

Inference

A Deep Speech evaluation run can be started with the following command, adding flags as necessary:

deepspeech ds1 \
           --state_dict_path $MODEL_PATH \
           --log_file \
           --decoder greedy \
           --train_subsets \
           --dev_log wer \
           --dev_subsets dev-clean \
           --dev_batch_size 1

Note that passing --log_file without an argument causes the WER results to be written to stderr.
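
The greedy decoder performs best-path CTC decoding: take the most probable label at each frame, collapse consecutive repeats, and drop the blank symbol. A minimal, self-contained sketch of this standard procedure (the function name, array shapes, and alphabet argument below are illustrative, not this package's internal API):

import numpy as np

def greedy_ctc_decode(log_probs, alphabet, blank=0):
    # log_probs: array of shape (time, n_labels) with per-frame label scores.
    # alphabet: maps a label index to its character; index `blank` is the CTC blank.
    best_path = np.argmax(log_probs, axis=1)  # most likely label at each frame
    chars = []
    prev = blank
    for label in best_path:
        # Collapse consecutive repeats, then drop blank frames.
        if label != blank and label != prev:
            chars.append(alphabet[label])
        prev = label
    return "".join(chars)

Beam-search decoding, typically combined with a language model, generally lowers the WER further at the cost of extra compute.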

Dataset

The package contains code to download and use the LibriSpeech ASR corpus.
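
LibriSpeech ships in standard subsets (train-clean-100, train-clean-360, train-other-500, dev-clean, dev-other, test-clean and test-other); the dev-clean subset passed to --dev_subsets in the inference example above is one of these.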

WER

The word error rate (WER) is computed using the formula that is widely used in many open-source speech-to-text systems (Kaldi, PaddlePaddle, Mozilla DeepSpeech). In Python-flavoured pseudocode, where N is the number of validation or test samples and each target or prediction is a string of space-separated words:

import editdistance  # word-level Levenshtein distance; one possible choice of library

sum_edits = sum(editdistance.eval(target.split(), predict.split())
                for target, predict in zip(targets, predictions))
sum_lens = sum(len(target.split()) for target in targets)
WER = (1.0 / N) * (sum_edits / sum_lens)

This reduces the impact of errors in short sentences on the overall WER. Toy example:

Target                          Prediction                     Edit Distance  Label Length
lectures                        lectured                       1              1
i'm afraid he said              i am afraid he said            2              4
nice to see you mister meeking  nice to see your mister makin  2              6

The mean of the per-sample WERs, with each sample considered individually, is:

>>> (1.0/3) * ((1.0/1) + (2.0/4) + (2.0/6))
0.611111111111111

Compared to the pseudocode version given above:

>>> (1.0/3) * ((1.0 + 2 + 2) / (1.0 + 4 + 6))
0.1515151515151515

Maintainer

Please contact sam at myrtle dot ai.
