JuliusKunze / Speechless

Licence: MIT
Speech-to-text based on wav2letter built for transfer learning


Projects that are alternatives of or similar to Speechless

Dragonfire
The open-source virtual assistant for Ubuntu-based Linux distributions
Stars: ✭ 1,120 (+1158.43%)
Mutual labels:  speech-recognition
Unityandroidspeechrecognition
This repository is a Unity plugin for Android Speech Recognition (based on Java implementation)
Stars: ✭ 73 (-17.98%)
Mutual labels:  speech-recognition
Laibot Client
Open-source AI: build voice-dialogue robots and smart speakers from open-source software and hardware. Human-machine dialogue and natural interaction; Laibao (来宝) has unlimited possibilities. Note: Laibao runs on Python 3!
Stars: ✭ 81 (-8.99%)
Mutual labels:  speech-recognition
Speech ai
Simple speech linguistic AI with Python
Stars: ✭ 66 (-25.84%)
Mutual labels:  speech-recognition
Android Speech Recognition
Continuous speech recognition library for Android with options to use GoogleVoiceIme dialog and offline mode.
Stars: ✭ 72 (-19.1%)
Mutual labels:  speech-recognition
Wav2letter
Speech recognition model based on the FAIR research paper, built using PyTorch.
Stars: ✭ 78 (-12.36%)
Mutual labels:  speech-recognition
Audio Pretrained Model
A collection of Audio and Speech pre-trained models.
Stars: ✭ 61 (-31.46%)
Mutual labels:  speech-recognition
Julius
Open-Source Large Vocabulary Continuous Speech Recognition Engine
Stars: ✭ 1,258 (+1313.48%)
Mutual labels:  speech-recognition
Nativescript Speech Recognition
💬 Speech to text, using the awesome engines readily available on the device.
Stars: ✭ 72 (-19.1%)
Mutual labels:  speech-recognition
Deepspeech Websocket Server
Server & client for DeepSpeech using WebSockets for real-time speech recognition in separate environments
Stars: ✭ 79 (-11.24%)
Mutual labels:  speech-recognition
Openasr
A PyTorch-based end-to-end speech recognition system.
Stars: ✭ 69 (-22.47%)
Mutual labels:  speech-recognition
Asr benchmark
Program to benchmark various speech recognition APIs
Stars: ✭ 71 (-20.22%)
Mutual labels:  speech-recognition
Sytody
A Flutter "speech to todo" example app
Stars: ✭ 79 (-11.24%)
Mutual labels:  speech-recognition
Papers
A list of papers, books and sites on various machine learning and deep learning topics, along with the fields in which they are applied
Stars: ✭ 63 (-29.21%)
Mutual labels:  speech-recognition
B.e.n.j.i.
B.E.N.J.I.- The Impossible Missions Force's digital assistant
Stars: ✭ 83 (-6.74%)
Mutual labels:  speech-recognition
Angle
⦠ Angle: new speakable syntax for python 💡
Stars: ✭ 61 (-31.46%)
Mutual labels:  speech-recognition
Pyspeechrev
This Python code performs efficient speech reverberation, starting from a dataset of close-talking speech signals and a collection of acoustic impulse responses.
Stars: ✭ 74 (-16.85%)
Mutual labels:  speech-recognition
Speech Emotion Recognition
Detecting emotions using MFCC features of human speech using Deep Learning
Stars: ✭ 89 (+0%)
Mutual labels:  speech-recognition
Masr
Chinese speech recognition; Mandarin Automatic Speech Recognition.
Stars: ✭ 1,246 (+1300%)
Mutual labels:  speech-recognition
Deepspeech
A PaddlePaddle implementation of ASR.
Stars: ✭ 1,219 (+1269.66%)
Mutual labels:  speech-recognition

speechless

Speech recognizer based on wav2letter architecture built with Keras.

Supports CTC loss, KenLM and greedy decoding, and transfer learning between different languages. ASG loss is currently not supported.
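Greedy decoding itself is simple to sketch: take the most probable symbol per frame, collapse consecutive repeats, then drop CTC blanks. A minimal, self-contained illustration of the idea (not this project's actual decoder):

```python
import numpy as np

def greedy_ctc_decode(frame_probs: np.ndarray, alphabet: str, blank: int = 0) -> str:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks.

    frame_probs: (time, symbols) array of per-frame symbol probabilities.
    alphabet: the symbol at index i corresponds to column i; index `blank`
    is the CTC blank and emits no character.
    """
    best = frame_probs.argmax(axis=1)           # most probable symbol per frame
    collapsed = [s for i, s in enumerate(best)  # collapse consecutive repeats
                 if i == 0 or s != best[i - 1]]
    return "".join(alphabet[s] for s in collapsed if s != blank)

# Column 0 is the blank; "_ab" maps index 1 to 'a' and index 2 to 'b'.
probs = np.array([[0.1, 0.8, 0.1],   # a
                  [0.1, 0.8, 0.1],   # a (repeat, collapsed)
                  [0.8, 0.1, 0.1],   # blank (separates the two a's)
                  [0.1, 0.8, 0.1],   # a
                  [0.1, 0.1, 0.8]])  # b
print(greedy_ctc_decode(probs, "_ab"))  # prints "aab"
```

The blank between repeated symbols is what lets CTC distinguish "aa" from a single stretched "a"; KenLM decoding replaces the per-frame argmax with a beam search scored by a language model.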

Training for English with the 1000h LibriSpeech corpus works out of the box, while training for the German language requires downloading data manually.

Installation

Python 3.4+ and TensorFlow are required.

pip3 install git+git@github.com:JuliusKunze/speechless.git

will install speechless together with minimal requirements.

If you want to use the KenLM decoder, this modified version of TensorFlow needs to be installed first.

You need an audio backend available, for example ffmpeg (run brew install ffmpeg on macOS).

Training

from speechless.configuration import Configuration

Configuration.minimal_english().train_from_beginning()

will automatically download a small English example corpus (337 MB) and train a net on it, printing updated losses and predictions as training progresses. On a strong consumer-grade GPU, you should observe training predictions becoming similar to the input after ~12 h, e.g.

Expected:  "just thrust and parry and victory to the stronger"
Predicted: "jest thcrus and pary and bettor o the stronter"
Errors: 10 letters (20%), 6 words (67%), loss: 37.19.

All data (corpus, nets, logs) will be stored in ~/speechless-data.

This directory can be changed:

from pathlib import Path

from speechless import configuration
from speechless.configuration import Configuration, DataDirectories

configuration.default_data_directories = DataDirectories(Path("/your/data/path"))

Configuration.minimal_english().train_from_beginning()

To download and train on the full 1000h LibriSpeech corpus, replace minimal_english with english.

main.py contains various other functions that were executed to train and use models.

If you want complete flexibility over where data is saved and loaded from, do not use Configuration at all; instead, use the code from net, corpus, german_corpus, english_corpus and recording directly.

Loading

By default, all trained models are stored in the ~/speechless-data/nets directory. You can use models from there by downloading them into this folder (keep the subfolder structure from Google Drive). To load such a model, use load_best_english_model or load_best_german_model, e.g.

from speechless.configuration import Configuration

wav2letter = Configuration.german().load_best_german_model()

If the model was originally trained with a different character set (e.g. on a corpus of another language), specifying the allowed_characters_for_loaded_model parameter of load_model still allows you to use that model for training, thereby enabling transfer learning.

Recording

You can record your own audio with a microphone and get a prediction for it:

# ... after loading a model, see above

from speechless.recording import record_plot_and_save

label = record_plot_and_save()

print(wav2letter.predict(label))

Three seconds of silence ends the recording, and silence is truncated. By default, this generates a wav file and a spectrogram plot in ~/speechless-data/recordings.
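Conceptually, truncating silence just means trimming leading and trailing stretches of near-zero amplitude from the recorded signal. A minimal sketch with a hypothetical helper (not the library's actual implementation, which also handles the three-second stop condition):

```python
import numpy as np

def truncate_silence(samples: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Trim leading and trailing samples whose amplitude stays below threshold."""
    loud = np.flatnonzero(np.abs(samples) >= threshold)
    if loud.size == 0:
        return samples[:0]               # all silence: return an empty clip
    return samples[loud[0]:loud[-1] + 1]

# Keeps only the loud span [0.5, -0.3]; the quiet samples around it are dropped.
trimmed = truncate_silence(np.array([0.0, 0.002, 0.5, -0.3, 0.001]))
```

A real recorder would apply the threshold to short windows of energy rather than single samples, but the trimming idea is the same.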

Testing

Given that you downloaded the German corpus into the corpus directory, you can evaluate the German model on the test set:

german.test_model_grouped_by_loaded_corpus_name(wav2letter)

Testing will write to the standard output and a log to ~/speechless-data/test-results by default.

Plotting

Labeled audio examples from the corpus can be plotted with LabeledExamplePlotter.save_spectrogram.

German & Sections

For some German datasets, it is possible to retrieve which word is said at which point in time, making it possible to extract labeled sections, e.g.:

from speechless.configuration import Configuration

german = Configuration.german()
wav2letter = german.load_best_german_model()
example = german.corpus.examples[0]
sections = example.sections()
for section in sections:
    print(wav2letter.test_and_predict(section))

If you only need the section labels (e.g. for filtering for particular words), use example.positional_label.labels, which is faster because no audio data needs to be sliced. If no positional info is available, sections and positional_label are None.
