JuliusKunze / Speechless

Licence: MIT
Speech-to-text based on wav2letter built for transfer learning


Projects that are alternatives of or similar to Speechless

Dragonfire
The open-source virtual assistant for Ubuntu-based Linux distributions
Stars: ✭ 1,120 (+1158.43%)
Mutual labels:  speech-recognition
Unityandroidspeechrecognition
This repository is a Unity plugin for Android Speech Recognition (based on Java implementation)
Stars: ✭ 73 (-17.98%)
Mutual labels:  speech-recognition
Laibot Client
Open-source AI: build voice-dialogue robots and smart speakers from open-source software and hardware. Human-machine dialogue and natural interaction; Laibao (来宝) has unlimited possibilities. Note: Laibao runs on Python 3!
Stars: ✭ 81 (-8.99%)
Mutual labels:  speech-recognition
Speech ai
Simple speech linguistic AI with Python
Stars: ✭ 66 (-25.84%)
Mutual labels:  speech-recognition
Android Speech Recognition
Continuous speech recognition library for Android with options to use GoogleVoiceIme dialog and offline mode.
Stars: ✭ 72 (-19.1%)
Mutual labels:  speech-recognition
Wav2letter
Speech recognition model based on the FAIR research paper, built using PyTorch.
Stars: ✭ 78 (-12.36%)
Mutual labels:  speech-recognition
Audio Pretrained Model
A collection of Audio and Speech pre-trained models.
Stars: ✭ 61 (-31.46%)
Mutual labels:  speech-recognition
Julius
Open-Source Large Vocabulary Continuous Speech Recognition Engine
Stars: ✭ 1,258 (+1313.48%)
Mutual labels:  speech-recognition
Nativescript Speech Recognition
💬 Speech to text, using the awesome engines readily available on the device.
Stars: ✭ 72 (-19.1%)
Mutual labels:  speech-recognition
Deepspeech Websocket Server
Server & client for DeepSpeech using WebSockets for real-time speech recognition in separate environments
Stars: ✭ 79 (-11.24%)
Mutual labels:  speech-recognition
Openasr
A PyTorch-based end-to-end speech recognition system.
Stars: ✭ 69 (-22.47%)
Mutual labels:  speech-recognition
Asr benchmark
Program to benchmark various speech recognition APIs
Stars: ✭ 71 (-20.22%)
Mutual labels:  speech-recognition
Sytody
A Flutter "speech to todo" example app
Stars: ✭ 79 (-11.24%)
Mutual labels:  speech-recognition
Papers
A list of papers, books and sites on various machine learning and deep learning topics, along with the fields in which they are applied
Stars: ✭ 63 (-29.21%)
Mutual labels:  speech-recognition
B.e.n.j.i.
B.E.N.J.I.- The Impossible Missions Force's digital assistant
Stars: ✭ 83 (-6.74%)
Mutual labels:  speech-recognition
Angle
⦠ Angle: new speakable syntax for python 💡
Stars: ✭ 61 (-31.46%)
Mutual labels:  speech-recognition
Pyspeechrev
This Python code performs efficient speech reverberation, starting from a dataset of close-talking speech signals and a collection of acoustic impulse responses.
Stars: ✭ 74 (-16.85%)
Mutual labels:  speech-recognition
Speech Emotion Recognition
Detecting emotions using MFCC features of human speech using Deep Learning
Stars: ✭ 89 (+0%)
Mutual labels:  speech-recognition
Masr
Chinese speech recognition; Mandarin Automatic Speech Recognition.
Stars: ✭ 1,246 (+1300%)
Mutual labels:  speech-recognition
Deepspeech
A PaddlePaddle implementation of ASR.
Stars: ✭ 1,219 (+1269.66%)
Mutual labels:  speech-recognition

speechless

Speech recognizer based on wav2letter architecture built with Keras.

Supports CTC loss, KenLM and greedy decoding, and transfer learning between different languages. ASG loss is currently not supported.
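Greedy decoding itself is simple to sketch: take the most probable symbol per frame, collapse consecutive repeats, then drop CTC blanks. A minimal, self-contained illustration of the idea (not this project's actual decoder):

```python
import numpy as np

def greedy_ctc_decode(frame_probs: np.ndarray, alphabet: str, blank: int = 0) -> str:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks.

    frame_probs: (time, symbols) array of per-frame symbol probabilities.
    alphabet: the symbol at index i corresponds to column i; index `blank`
    is the CTC blank and emits no character.
    """
    best = frame_probs.argmax(axis=1)           # most probable symbol per frame
    collapsed = [s for i, s in enumerate(best)  # collapse consecutive repeats
                 if i == 0 or s != best[i - 1]]
    return "".join(alphabet[s] for s in collapsed if s != blank)

# Column 0 is the blank; "_ab" maps index 1 to 'a' and index 2 to 'b'.
probs = np.array([[0.1, 0.8, 0.1],   # a
                  [0.1, 0.8, 0.1],   # a (repeat, collapsed)
                  [0.8, 0.1, 0.1],   # blank (separates the two a's)
                  [0.1, 0.8, 0.1],   # a
                  [0.1, 0.1, 0.8]])  # b
print(greedy_ctc_decode(probs, "_ab"))  # prints "aab"
```

The blank between repeated symbols is what lets CTC distinguish "aa" from a single stretched "a"; KenLM decoding replaces the per-frame argmax with a beam search scored by a language model.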

Training for English with the 1000h LibriSpeech corpus works out of the box, while training for the German language requires downloading data manually.

Installation

Python 3.4+ and TensorFlow are required.

pip3 install git+git@github.com:JuliusKunze/speechless.git

will install speechless together with minimal requirements.

If you want to use the KenLM decoder, this modified version of TensorFlow needs to be installed first.

You need an audio backend available, for example ffmpeg (run brew install ffmpeg on macOS).

Training

from speechless.configuration import Configuration

Configuration.minimal_english().train_from_beginning()

will automatically download a small English example corpus (337 MB) and train a net on it, printing updated losses and predictions as training progresses. On a strong consumer-grade GPU, you should observe training predictions becoming similar to the input after ~12 h, e.g.

Expected:  "just thrust and parry and victory to the stronger"
Predicted: "jest thcrus and pary and bettor o the stronter"
Errors: 10 letters (20%), 6 words (67%), loss: 37.19.

All data (corpus, nets, logs) will be stored in ~/speechless-data.

This directory can be changed:

from pathlib import Path

from speechless import configuration
from speechless.configuration import Configuration, DataDirectories

configuration.default_data_directories = DataDirectories(Path("/your/data/path"))

Configuration.minimal_english().train_from_beginning()

To download and train on the full 1000h LibriSpeech corpus, replace minimal_english with english.

main.py contains various other functions that were executed to train and use models.

If you want complete flexibility over where data is saved and loaded from, do not use Configuration at all; instead, use the code from net, corpus, german_corpus, english_corpus and recording directly.

Loading

By default, all trained models are stored in the ~/speechless-data/nets directory. You can use models from there by downloading them into this folder (keep the subfolder structure from Google Drive). To load such a model, use load_best_english_model or load_best_german_model, e.g.

from speechless.configuration import Configuration

wav2letter = Configuration.german().load_best_german_model()

If the model was originally trained with a different character set (e.g. on a corpus of another language), specifying the allowed_characters_for_loaded_model parameter of load_model still allows you to use that model for training, thereby enabling transfer learning.

Recording

You can record your own audio with a microphone and get a prediction for it:

# ... after loading a model, see above

from speechless.recording import record_plot_and_save

label = record_plot_and_save()

print(wav2letter.predict(label))

Three seconds of silence ends the recording, and silence is truncated. By default, this generates a wav file and a spectrogram plot in ~/speechless-data/recordings.
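Conceptually, truncating silence just means trimming leading and trailing stretches of near-zero amplitude from the recorded signal. A minimal sketch with a hypothetical helper (not the library's actual implementation, which also handles the three-second stop condition):

```python
import numpy as np

def truncate_silence(samples: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Trim leading and trailing samples whose amplitude stays below threshold."""
    loud = np.flatnonzero(np.abs(samples) >= threshold)
    if loud.size == 0:
        return samples[:0]               # all silence: return an empty clip
    return samples[loud[0]:loud[-1] + 1]

# Keeps only the loud span [0.5, -0.3]; the quiet samples around it are dropped.
trimmed = truncate_silence(np.array([0.0, 0.002, 0.5, -0.3, 0.001]))
```

A real recorder would apply the threshold to short windows of energy rather than single samples, but the trimming idea is the same.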

Testing

Given that you downloaded the German corpus into the corpus directory, you can evaluate the German model on the test set:

german.test_model_grouped_by_loaded_corpus_name(wav2letter)

Testing will write to the standard output and a log to ~/speechless-data/test-results by default.

Plotting

Labeled audio examples from the corpus can be plotted with LabeledExamplePlotter.save_spectrogram.

German & Sections

For some German datasets, it is possible to retrieve which word is said at which point in time, making it possible to extract labeled sections, e.g.:

from speechless.configuration import Configuration

german = Configuration.german()
wav2letter = german.load_best_german_model()
example = german.corpus.examples[0]
sections = example.sections()
for section in sections:
    print(wav2letter.test_and_predict(section))

If you only need the section labels (e.g. for filtering for particular words), use example.positional_label.labels, which is faster because no audio data needs to be sliced. If no positional info is available, sections and positional_label are None.
