juliuskunze / speechless

License: MIT
Speech-to-text based on wav2letter built for transfer learning


Projects that are alternatives to or similar to speechless

PCPM
Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.
Stars: ✭ 21 (-77.17%)
Mutual labels:  speech-recognition
scripty
Speech to text bot for Discord using Mozilla's DeepSpeech
Stars: ✭ 14 (-84.78%)
Mutual labels:  speech-recognition
syn-speech-samples
An application that demonstrates the usage of the Syn.Speech library for speech recognition
Stars: ✭ 24 (-73.91%)
Mutual labels:  speech-recognition
VoiceBridge
VoiceBridge - an AI-TOOLKIT Open Source C++ Speech Recognition Toolkit
Stars: ✭ 17 (-81.52%)
Mutual labels:  speech-recognition
timit-preprocessor
Extract mfcc vectors and phones from TIMIT dataset
Stars: ✭ 14 (-84.78%)
Mutual labels:  speech-recognition
Chinese-automatic-speech-recognition
Chinese speech recognition
Stars: ✭ 147 (+59.78%)
Mutual labels:  speech-recognition
ml-with-audio
HF's ML for Audio study group
Stars: ✭ 104 (+13.04%)
Mutual labels:  speech-recognition
mongolian-nlp
Useful resources for Mongolian NLP
Stars: ✭ 119 (+29.35%)
Mutual labels:  speech-recognition
Transformer-Transducer
PyTorch implementation of "Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss" (ICASSP 2020)
Stars: ✭ 61 (-33.7%)
Mutual labels:  speech-recognition
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Stars: ✭ 2,384 (+2491.3%)
Mutual labels:  speech-recognition
QuantumSpeech-QCNN
IEEE ICASSP 21 - Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition
Stars: ✭ 71 (-22.83%)
Mutual labels:  speech-recognition
speech-recognition-transfer-learning
Speech command recognition via DenseNet transfer learning from UrbanSound8k, in Keras/TensorFlow
Stars: ✭ 18 (-80.43%)
Mutual labels:  speech-recognition
favorite-research-papers
Listing my favorite research papers 📝 from different fields as I read them.
Stars: ✭ 12 (-86.96%)
Mutual labels:  speech-recognition
kaldi ag training
Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-grammar.
Stars: ✭ 14 (-84.78%)
Mutual labels:  speech-recognition
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+814.13%)
Mutual labels:  speech-recognition
Inimesed
An Android app that lets you search your contacts by voice. Internet not required. Based on Pocketsphinx. Uses Estonian acoustic models.
Stars: ✭ 65 (-29.35%)
Mutual labels:  speech-recognition
VoiceDictation
iFlytek Voice Dictation WebAPI – converts speech (≤ 60 seconds) into the corresponding text, letting machines "understand" human language: the equivalent of giving a machine "ears" so that it can "hear".
Stars: ✭ 36 (-60.87%)
Mutual labels:  speech-recognition
Unity live caption
Use Google Speech-to-Text API to do real-time live stream caption on Unity! Best when combined with your virtual character!
Stars: ✭ 26 (-71.74%)
Mutual labels:  speech-recognition
Speech-Backbones
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
Stars: ✭ 205 (+122.83%)
Mutual labels:  speech-recognition
pytorch audio
Audio processing module for PyTorch: STFT, ISTFT
Stars: ✭ 33 (-64.13%)
Mutual labels:  speech-recognition

speechless

Speech recognizer based on wav2letter architecture built with Keras.

Supports CTC loss, KenLM and greedy decoding, and transfer learning between different languages. ASG loss is currently not supported.
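
Greedy (best-path) decoding picks the most likely symbol per frame, collapses repeats, and removes CTC blanks. A standalone sketch of that collapse step (illustrative only, not the library's code; the per-frame symbols would come from the net's argmax, and "-" stands for the blank here):

```python
def greedy_ctc_decode(frame_labels, blank="-"):
    """Best-path CTC decoding: collapse repeated per-frame symbols,
    then drop the blank symbol."""
    collapsed = []
    previous = None
    for symbol in frame_labels:
        if symbol != previous:
            collapsed.append(symbol)
        previous = symbol
    return "".join(s for s in collapsed if s != blank)

# "hheel-lloo" decodes to "hello": the blank keeps the double "l" distinct.
print(greedy_ctc_decode("hheel-lloo"))
```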

Training for English with the 1000h LibriSpeech corpus works out of the box, while training for the German language requires downloading data manually.

Installation

Python 3.4+ and TensorFlow are required.

pip3 install git+git@github.com:JuliusKunze/speechless.git

will install speechless together with minimal requirements.

If you want to use the KenLM decoder, this modified version of TensorFlow needs to be installed first.

You need an audio backend available, for example ffmpeg (run brew install ffmpeg on macOS).

Training

from speechless.configuration import Configuration

Configuration.minimal_english().train_from_beginning()

will automatically download a small English example corpus (337 MB) and train a net on it while printing updated loss values and predictions. On a strong consumer-grade GPU, you should observe training predictions becoming similar to the input after ~12 h, e.g.

Expected:  "just thrust and parry and victory to the stronger"
Predicted: "jest thcrus and pary and bettor o the stronter"
Errors: 10 letters (20%), 6 words (67%), loss: 37.19.
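
The letter and word error counts above are edit (Levenshtein) distances between the expected and predicted transcripts, at the character and word level respectively. A self-contained sketch of how such counts can be computed (illustrative, not speechless's actual implementation):

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming over two sequences."""
    previous_row = list(range(len(b) + 1))
    for i, item_a in enumerate(a, 1):
        current_row = [i]
        for j, item_b in enumerate(b, 1):
            current_row.append(min(previous_row[j] + 1,       # deletion
                                   current_row[j - 1] + 1,    # insertion
                                   previous_row[j - 1] + (item_a != item_b)))  # substitution
        previous_row = current_row
    return previous_row[-1]

expected = "just thrust and parry and victory to the stronger"
predicted = "jest thcrus and pary and bettor o the stronter"
letter_errors = edit_distance(expected, predicted)            # character level
word_errors = edit_distance(expected.split(), predicted.split())  # word level
```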

All data (corpus, nets, logs) will be stored in ~/speechless-data.

This directory can be changed:

from pathlib import Path

from speechless import configuration
from speechless.configuration import Configuration, DataDirectories

configuration.default_data_directories = DataDirectories(Path("/your/data/path"))

Configuration.minimal_english().train_from_beginning()

To download and train on the full 1000h LibriSpeech corpus, replace minimal_english with english.

main.py contains various other functions that were executed to train and use models.

If you want complete flexibility over where data is saved and loaded from, do not use Configuration at all; instead, use the code from net, corpus, german_corpus, english_corpus and recording directly.

Loading

By default, all trained models are stored in the ~/speechless-data/nets directory. You can use pretrained models by downloading them into this folder (keep the subfolder structure from Google Drive). To load such a model, use load_best_english_model or load_best_german_model, e.g.

from speechless.configuration import Configuration

wav2letter = Configuration.german().load_best_german_model()

If the model was originally trained with a different character set (e.g. on a corpus of another language), specifying the allowed_characters_for_loaded_model parameter of load_model still allows you to use that model for training, thereby enabling transfer learning.
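
Transfer learning here amounts to reusing the trained layers while re-mapping the output layer to a new character set: characters shared between the two alphabets can keep their learned weights, while new characters need fresh initialization. A standalone sketch of that index re-mapping idea (illustrative only; speechless's real logic lives in load_model):

```python
def remap_characters(old_characters, new_characters):
    """Map each character of the new alphabet to its index in the old model's
    output layer, or None where weights must be freshly initialized."""
    old_index = {c: i for i, c in enumerate(old_characters)}
    return [old_index.get(c) for c in new_characters]

english = list("abcdefghijklmnopqrstuvwxyz ")
german = list("abcdefghijklmnopqrstuvwxyzäöüß ")

# a-z keep their old rows; ä, ö, ü, ß map to None (new weights needed).
mapping = remap_characters(english, german)
```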

Recording

You can record your own audio with a microphone and get a prediction for it:

# ... after loading a model, see above

from speechless.recording import record_plot_and_save

label = record_plot_and_save()

print(wav2letter.predict(label))

Three seconds of silence end the recording, and leading and trailing silence is truncated. By default, this will generate a wav-file and a spectrogram plot in ~/speechless-data/recordings.
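
The stop-on-silence and truncation behaviour can be pictured as follows (a standalone sketch with an assumed amplitude threshold; not the actual recording code):

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples quieter than the threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return samples[loud[0]:loud[-1] + 1]

def should_stop(samples, sample_rate, silence_seconds=3.0, threshold=0.01):
    """True once the last silence_seconds of audio are all below the threshold."""
    tail_length = int(silence_seconds * sample_rate)
    tail = samples[-tail_length:]
    return len(tail) == tail_length and all(abs(s) < threshold for s in tail)
```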

Testing

Given that you downloaded the German corpus into the corpus directory, you can evaluate the German model on the test set:

german.test_model_grouped_by_loaded_corpus_name(wav2letter)

Testing will write to the standard output and a log to ~/speechless-data/test-results by default.

Plotting

Plotting labeled audio examples from the corpus like this one here can be done with LabeledExamplePlotter.save_spectrogram.
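
A spectrogram is built from the magnitudes of short-time Fourier transforms over overlapping windows of the signal. A minimal framing sketch of that first step (illustrative; not how LabeledExamplePlotter is implemented):

```python
def frames(samples, frame_length, hop):
    """Split a signal into overlapping frames, the first step of the
    short-time Fourier transform underlying a spectrogram."""
    return [samples[i:i + frame_length]
            for i in range(0, len(samples) - frame_length + 1, hop)]

# 10 samples, frame length 4, hop 2 -> frames starting at 0, 2, 4, 6.
overlapping = frames(list(range(10)), frame_length=4, hop=2)
```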

German & Sections

For some German datasets, it is possible to retrieve which word is said at which point in time, allowing you to extract labeled sections, e.g.:

from speechless.configuration import Configuration

german = Configuration.german()
wav2letter = german.load_best_german_model()
example = german.corpus.examples[0]
sections = example.sections()
for section in sections:
    print(wav2letter.test_and_predict(section))

If you only need the section labels (e.g. to filter for particular words), use example.positional_label.labels, which is faster because no audio data needs to be sliced. If no positional information is available, sections and positional_label are None.
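
Filtering for a particular word could then look like the following sketch. The Section tuple here is a hypothetical stand-in for speechless's actual section objects, whose attribute names may differ:

```python
from typing import List, NamedTuple

class Section(NamedTuple):
    # Hypothetical stand-in for a positionally labeled section.
    label: str
    start_seconds: float
    end_seconds: float

def sections_with_word(sections: List[Section], word: str) -> List[Section]:
    """Keep only the sections whose label contains the given word."""
    return [s for s in sections if word in s.label.split()]

labels = [Section("guten morgen", 0.0, 1.2),
          Section("wie geht es dir", 1.2, 2.8)]
matches = sections_with_word(labels, "morgen")
```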
