rolczynski / Automatic Speech Recognition

Licence: agpl-3.0

🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)

Programming Languages

python

139335 projects - #7 most used programming language

Labels

deep-learning machine-learning tensorflow keras neural-networks speech-recognition language-model speech-to-text tensorflow-models

Projects that are alternatives of or similar to Automatic Speech Recognition

Openseq2seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP

Stars: ✭ 1,378 (+617.71%)

Mutual labels: speech-recognition, language-model, speech-to-text

Speechbrain.github.io

The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.

Stars: ✭ 242 (+26.04%)

Mutual labels: neural-networks, speech-recognition, speech-to-text

Lingvo

Stars: ✭ 2,361 (+1129.69%)

Mutual labels: speech-recognition, language-model, speech-to-text

Deepspeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

Stars: ✭ 18,680 (+9629.17%)

Mutual labels: neural-networks, speech-recognition, speech-to-text

Audio Pretrained Model

A collection of Audio and Speech pre-trained models.

Stars: ✭ 61 (-68.23%)

Mutual labels: speech-recognition, speech-to-text, tensorflow-models

Kur

Descriptive Deep Learning

Stars: ✭ 811 (+322.4%)

Mutual labels: neural-networks, speech-recognition, speech-to-text

PCPM

Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.

Stars: ✭ 21 (-89.06%)

Mutual labels: speech-recognition, speech-to-text, language-model

Wav2letter

Speech Recognition model based off of FAIR research paper built using Pytorch.

Stars: ✭ 78 (-59.37%)

Mutual labels: neural-networks, speech-recognition, speech-to-text

Spokestack Python

Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.

Stars: ✭ 103 (-46.35%)

Mutual labels: neural-networks, speech-recognition, speech-to-text

Vosk

VOSK Speech Recognition Toolkit

Stars: ✭ 182 (-5.21%)

Mutual labels: speech-recognition, speech-to-text

Persephone

A tool for automatic phoneme transcription

Stars: ✭ 130 (-32.29%)

Mutual labels: neural-networks, speech-recognition

Go Astideepspeech

Golang bindings for Mozilla's DeepSpeech speech-to-text library

Stars: ✭ 137 (-28.65%)

Mutual labels: speech-recognition, speech-to-text

Asr audio data links

A list of publicly available audio data that anyone can download for ASR or other speech activities

Stars: ✭ 128 (-33.33%)

Mutual labels: speech-recognition, speech-to-text

Tensorflow Ctc Speech Recognition

Application of Connectionist Temporal Classification (CTC) for Speech Recognition (Tensorflow 1.0 but compatible with 2.0).

Stars: ✭ 127 (-33.85%)

Mutual labels: speech-recognition, speech-to-text

Awesome Ai Services

An overview of the AI-as-a-service landscape

Stars: ✭ 133 (-30.73%)

Mutual labels: speech-recognition, speech-to-text

Kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

Stars: ✭ 11,151 (+5707.81%)

Mutual labels: speech-recognition, speech-to-text

Zzz Retired openstt

RETIRED - OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:

Stars: ✭ 146 (-23.96%)

Mutual labels: speech-recognition, speech-to-text

Awesome Speech Recognition Speech Synthesis Papers

Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)

Stars: ✭ 2,085 (+985.94%)

Mutual labels: language-model, speech-recognition

Kalliope

Kalliope is a framework that will help you to create your own personal assistant.

Stars: ✭ 1,509 (+685.94%)

Mutual labels: speech-recognition, speech-to-text

Speechrecognizerbutton

UIButton subclass with push to talk recording, speech recognition and Siri-style waveform view.

Stars: ✭ 144 (-25%)

Mutual labels: speech-recognition, speech-to-text

Automatic Speech Recognition

The aim of this project is to distill Automatic Speech Recognition research. To get started, load a ready-to-use pipeline with a pre-trained model. The project benefits from eager execution in TensorFlow 2.0, so you can freely monitor model weights, activations, or gradients.

import automatic_speech_recognition as asr

file = 'to/test/sample.wav'  # sample rate 16 kHz, and 16 bit depth
sample = asr.utils.read_audio(file)
pipeline = asr.load('deepspeech2', lang='en')
pipeline.model.summary()     # TensorFlow model
sentences = pipeline.predict([sample])
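
The comment in the snippet above notes that the sample should be 16 kHz with 16-bit depth. As a minimal sketch, one way to check a WAV file before passing it to the pipeline is Python's standard `wave` module. The helper name `check_wav_format` is hypothetical and is not part of this library.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_bits=16):
    """Return True if the WAV file has the expected sample rate and bit depth."""
    with wave.open(path, 'rb') as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8   # sample width is reported in bytes
    return rate == expected_rate and bits == expected_bits
```

For instance, `check_wav_format('to/test/sample.wav')` would return `False` for an 8 kHz recording, which signals that the file should be resampled first.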

We support English (thanks to Open Seq2Seq). The table below shows the evaluation results on the English benchmark LibriSpeech dev-clean. For reference, DeepSpeech (Mozilla) achieves around 7.5% WER, whereas the state of the art (RWTH Aachen University) reaches 2.3% WER (recent evaluation results can be found here). Both use an external language model to boost results. By comparison, humans achieve 5.83% WER here (LibriSpeech dev-clean).

Model Name	Decoder	WER-dev (%)
`deepspeech2`	greedy	6.71
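
The WER (word error rate) figures above are commonly defined as the word-level edit distance between the reference and the predicted transcript, divided by the number of reference words. The following is a minimal sketch of that definition in plain Python. The function name `word_error_rate` is an assumption for illustration and is not this library's evaluation code.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError('reference must contain at least one word')
    # previous[j]: edit distance between the reference words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words
print(word_error_rate('the cat sat on the mat', 'the cat sit on mat'))
```

Multiplying the result by 100 gives a percentage comparable to the table.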

You will soon find that you need to adjust the pipeline a little. Take a look at the CTC Pipeline. The pipeline is responsible for connecting a neural network model with all non-differentiable transformations (feature extraction or prediction decoding). Pipeline components are independent, so you can adjust them to your needs, e.g. use more sophisticated feature extraction, different data augmentation, or add a language model decoder (static n-grams or huge transformers). You can also go further, for example by distributing the training using the Strategy or experimenting with a mixed precision policy.

import numpy as np
import tensorflow as tf
import automatic_speech_recognition as asr

dataset = asr.dataset.Audio.from_csv('train.csv', batch_size=32)
dev_dataset = asr.dataset.Audio.from_csv('dev.csv', batch_size=32)
alphabet = asr.text.Alphabet(lang='en')
features_extractor = asr.features.FilterBanks(
    features_num=160,
    winlen=0.02,
    winstep=0.01,
    winfunc=np.hanning
)
model = asr.model.get_deepspeech2(
    input_dim=160,
    output_dim=29,
    rnn_units=800,
    is_mixed_precision=False
)
optimizer = tf.optimizers.Adam(
    learning_rate=1e-4,  # `lr` is a deprecated alias in TensorFlow 2.x
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-8
)
decoder = asr.decoder.GreedyDecoder()
pipeline = asr.pipeline.CTCPipeline(
    alphabet, features_extractor, model, optimizer, decoder
)
pipeline.fit(dataset, dev_dataset, epochs=25)
pipeline.save('/checkpoint')

test_dataset = asr.dataset.Audio.from_csv('test.csv')
wer, cer = asr.evaluate.calculate_error_rates(pipeline, test_dataset)
print(f'WER: {wer}   CER: {cer}')
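
The pipeline above uses a greedy decoder as its prediction decoding step. As a rough sketch of what greedy CTC decoding generally does (not the actual `asr.decoder.GreedyDecoder` implementation), the function below takes the best label for each frame, merges consecutive repeats, and drops the blank label. The function name, the toy alphabet, and the position of the blank index are all assumptions for illustration.

```python
def greedy_ctc_decode(frame_scores, alphabet, blank_index):
    """Pick the best label per frame, merge repeats, then drop blanks."""
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    decoded = []
    previous = None
    for label in best_path:
        # Emit a character only when the label changes and is not the blank
        if label != previous and label != blank_index:
            decoded.append(alphabet[label])
        previous = label
    return ''.join(decoded)


# Toy example: labels 0..2 are 'a', 'b', 'c'; label 3 is the CTC blank.
toy_alphabet = ['a', 'b', 'c']
toy_scores = [
    [0.9, 0.0, 0.0, 0.1],   # a
    [0.8, 0.1, 0.0, 0.1],   # a (repeat, merged with the previous frame)
    [0.1, 0.0, 0.0, 0.9],   # blank
    [0.7, 0.1, 0.1, 0.1],   # a (kept, because a blank separates it)
    [0.0, 0.9, 0.0, 0.1],   # b
]
print(greedy_ctc_decode(toy_scores, toy_alphabet, blank_index=3))  # prints "aab"
```

A language model decoder, as mentioned earlier, would replace this per-frame argmax with a search over several candidate paths.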

Installation

You can use pip:

pip install automatic-speech-recognition

Otherwise clone the code and create a new environment via conda:

git clone https://github.com/rolczynski/Automatic-Speech-Recognition.git
cd Automatic-Speech-Recognition         # the environment files live in the repository root
conda env create -f=environment.yml     # or use: environment-gpu.yml
conda activate Automatic-Speech-Recognition

References

The fundamental repositories:

Baidu - DeepSpeech2 - A PaddlePaddle implementation of DeepSpeech2 architecture for ASR
NVIDIA - Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
RWTH Aachen University - The RWTH extensible training framework for universal recurrent neural networks
TensorFlow - The implementation of DeepSpeech2 model
Mozilla - DeepSpeech - A TensorFlow implementation of Baidu's DeepSpeech architecture
Espnet - End-to-End Speech Processing Toolkit
Sean Naren - Speech Recognition using DeepSpeech2

Moreover, you can explore GitHub using key phrases like ASR, DeepSpeech, or Speech-To-Text. The list wer_are_we, an attempt at tracking the state of the art, can be helpful too.

Stars: ✭ 192