
thecodrr / vspeech

License: MIT
📢 Complete V bindings for Mozilla's DeepSpeech TensorFlow-based Speech-to-Text library. 📜


Projects that are alternatives of or similar to vspeech

deepspeech
A PyTorch implementation of DeepSpeech and DeepSpeech2.
Stars: ✭ 45 (+18.42%)
Mutual labels:  speech-to-text, deepspeech
deepspeech.mxnet
A MXNet implementation of Baidu's DeepSpeech architecture
Stars: ✭ 82 (+115.79%)
Mutual labels:  speech-to-text, deepspeech
mozilla-deepspeech-flutter
Mozilla DeepSpeech in flutter using Dart FFI
Stars: ✭ 23 (-39.47%)
Mutual labels:  mozilla, deepspeech
scription
An editor for speech-to-text transcripts such as AWS Transcribe and Mozilla DeepSpeech
Stars: ✭ 46 (+21.05%)
Mutual labels:  speech-to-text, deepspeech
leon
🧠 Leon is your open-source personal assistant.
Stars: ✭ 8,560 (+22426.32%)
Mutual labels:  speech-to-text, deepspeech
Deepspeech
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Stars: ✭ 18,680 (+49057.89%)
Mutual labels:  speech-to-text, deepspeech
vave
🌊 A crazy simple library for reading/writing WAV files in V. Zero dependencies, 100% cross-platform.
Stars: ✭ 35 (-7.89%)
Mutual labels:  v, deepspeech
Queries
SQLite queries
Stars: ✭ 57 (+50%)
Mutual labels:  mozilla
Inimesed
An Android app that lets you search your contacts by voice. Internet not required. Based on Pocketsphinx. Uses Estonian acoustic models.
Stars: ✭ 65 (+71.05%)
Mutual labels:  speech-to-text
Mozilla-Italia-l10n-guide
Mozilla Italia localization guide, made by volunteer localizers for volunteer localizers!
Stars: ✭ 14 (-63.16%)
Mutual labels:  mozilla
speechmatics-python
Python library and CLI for Speechmatics
Stars: ✭ 24 (-36.84%)
Mutual labels:  speech-to-text
v-mode
🌻 An Emacs major mode for the V programming language.
Stars: ✭ 49 (+28.95%)
Mutual labels:  v
PCPM
Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.
Stars: ✭ 21 (-44.74%)
Mutual labels:  speech-to-text
web-speech-cognitive-services
Polyfill Web Speech API with Cognitive Services Bing Speech for both speech-to-text and text-to-speech service.
Stars: ✭ 35 (-7.89%)
Mutual labels:  speech-to-text
scripty
Speech to text bot for Discord using Mozilla's DeepSpeech
Stars: ✭ 14 (-63.16%)
Mutual labels:  speech-to-text
vscode-esdoc-mdn
[BETA] See documentation of any javascript api from mozilla on your visual studio code side by side
Stars: ✭ 31 (-18.42%)
Mutual labels:  mozilla
mozilla-sprint-2018
DEPRECATED & Materials Moved: This sprint was to focus on brainstorming for the Joint Roadmap for Open Science Tools.
Stars: ✭ 24 (-36.84%)
Mutual labels:  mozilla
vargs
Simple argument parsing library for V.
Stars: ✭ 36 (-5.26%)
Mutual labels:  v
AmazonSpeechTranslator
End-to-end Solution for Speech Recognition, Text Translation, and Text-to-Speech for iOS using Amazon Translate and Amazon Polly as AWS Machine Learning managed services.
Stars: ✭ 50 (+31.58%)
Mutual labels:  speech-to-text
rnnt decoder cuda
An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.
Stars: ✭ 60 (+57.89%)
Mutual labels:  speech-to-text

📣 vSpeech 📜

V bindings for Mozilla's DeepSpeech, a TensorFlow-based Speech-to-Text library.

(demo GIF)

Installation:

Install using vpkg

vpkg get https://github.com/thecodrr/vspeech

Install using V's builtin vpm (with this method of installation, you will need to import the module as import thecodrr.vspeech):

v install thecodrr.vspeech

Install using git:

cd path/to/your/project
git clone https://github.com/thecodrr/vspeech

You can use thecodrr.vave for reading WAV files.

Then, wherever you want to use it:

import thecodrr.vspeech // or simply vspeech, depending on how you installed it
// Optional
import thecodrr.vave

Manual:

Perform the following steps:

  1. Download the latest native_client.<your system>.tar.xz matching your system from DeepSpeech's Releases.

  2. Extract the .tar.xz into a libs folder inside your project directory. It MUST be in the libs folder. If you don't have one, create it and extract into it.

  3. Download the pre-trained model from DeepSpeech's Releases (the file named deepspeech-0.6.1-models.tar.gz). It's pretty big (1.1G), so make sure you have the space.

  4. Extract the model anywhere you like on your system.

  5. Extra: If you don't have any audio files for testing, you can download the samples from DeepSpeech's Releases (the file named audio-0.6.1.tar.gz).

  6. When you are done, run this command in your project directory:

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD/lib/
    

And done!

Automatic:

// TODO

I will add a bash script to automate this process, including the downloading and extracting. PRs welcome.

Usage

There is a complete example of how to use this module in cmd/main.v.

import thecodrr.vspeech
// specify values for use later
const (
    beam_width            = 300
    lm_weight             = 0.75
    valid_word_count_weight = 1.85
)
// create a new model
mut model := vspeech.new("/path/to/the/model.pbmm", beam_width)

lm := "/path/to/the/lm/file" // it's in the models archive
trie := "/path/to/the/trie/file" // it's in the models archive
// enable the decoder with language model (optional)
model.enable_decoder_with_lm(lm, trie, lm_weight, valid_word_count_weight)

data := byteptr(0) // raw audio samples (use the thecodrr.vave module for this)
data_len := 0 //the total length of the buffer
// convert the audio to text
text := model.speech_to_text(data, data_len)
println(text)

// make sure to free everything
unsafe {
    model.free()
    model.free_string(text)
}

API

vspeech.new(model_path, beam_size)

Creates a new Model with the specified model_path and beam_size.

beam_size decides the balance between accuracy and cost. The larger the beam_size the more accurate the decoding will be but at the cost of time and resources.

model_path is the path to the model file. That is the file with the .pb extension, but it is better to use the .pbmm file, as it is memory-mapped (mmapped) and lighter on RAM.
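For example, a minimal sketch (the output_graph.pbmm file name below assumes the extracted 0.6.1 models archive; adjust the path to wherever you put it):

import thecodrr.vspeech

// a larger beam size is slower but usually more accurate
mut model := vspeech.new("/path/to/deepspeech-0.6.1-models/output_graph.pbmm", 500)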

Model struct

The main struct, representing the interface to the underlying model. It has the following methods:

1. enable_decoder_with_lm(lm_path, trie_path, lm_weight, valid_word_count_weight)

Loads the language model and enables the decoder to use it. Read the method comments to see what each parameter does.

2. get_model_sample_rate()

Use this to get the sample rate expected by the model. The audio samples you want transcribed MUST match this sample rate.
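A minimal sketch of the check this implies (reusing a model created with vspeech.new as shown above):

expected := model.get_model_sample_rate()
println("the model expects 16-bit PCM audio at $expected Hz")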

3. speech_to_text(buffer, buffer_size)

This is the method that you are looking for. It's where all the magic happens (and also all the bugs).

buffer is the audio data that needs to be decoded. Currently, DeepSpeech supports a 16-bit raw PCM audio stream at the appropriate sample rate. You can use thecodrr.vave to read audio samples from a WAV file.

buffer_size is the total number of bytes in the buffer.

4. speech_to_text_with_metadata(buffer, buffer_size)

Same as speech_to_text except this returns a Metadata struct that you can use for output analysis etc.

5. create_stream()

Create a stream for streaming audio data (from a microphone, for example) into the decoder. This, however, isn't an actual stream, i.e. there's no seek etc. It will initialize the streaming_state in your Model instance, which you can use as described below.

6. free()

Free the Model

7. free_string(text)

Free the string the decoder outputted in speech_to_text.

StreamingState

The streaming state is used to handle pseudo-streaming of audio content into the decoder. It exposes the following methods (a short usage sketch follows the list):

1. feed_audio_content(buffer, buffer_size)

Use this for feeding multiple chunks of data into the stream continuously.

2. intermediate_decode()

You can use this to get the output for the data fed into the stream so far. However, this is quite expensive because the decoder itself has no streaming capability, so use it only when necessary.

3. finish_stream()

Call this when streaming is finished and you want the final output of the whole stream.

4. finish_stream_with_metadata()

Same as finish_stream but returns a Metadata struct which you can use to analyze the output.

5. free()

Call this when done to free the captured StreamingState.
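Putting the streaming methods together, a minimal sketch (the chunk buffer is stubbed out exactly as in the usage example above; in practice you would feed real chunks in a loop):

mut model := vspeech.new("/path/to/the/model.pbmm", 300)
model.create_stream() // initializes model.streaming_state

chunk := byteptr(0) // placeholder: one chunk of raw 16-bit PCM samples
chunk_len := 0      // number of bytes in the chunk

// call feed_audio_content repeatedly as audio arrives (e.g. from a microphone)
model.streaming_state.feed_audio_content(chunk, chunk_len)
partial := model.streaming_state.intermediate_decode() // expensive, use sparingly
println("so far: $partial")

// once all the audio has been fed
text := model.streaming_state.finish_stream()
println(text)

unsafe {
    model.streaming_state.free()
    model.free()
}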

Metadata

Fields:

items - An array of MetadataItems.

num_items - Total number of items in the items array.

confidence - Approximated confidence value for this transcription.

Methods:

get_items() - Converts the C MetadataItem pointer array into a V array which you can iterate over normally.

get_text() - Helper method that combines the text from all the MetadataItems into one string.

free() - Free the Metadata instance

MetadataItem

Fields:

character - The character generated for transcription

timestep - Position of the character in units of 20ms

start_time - Position of the character in seconds

Methods:

str() - Combines all the data in the MetadataItem into a nicely formatted string.
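A minimal sketch of how the metadata pieces fit together (the audio buffer is stubbed out as in the usage example above):

mut model := vspeech.new("/path/to/the/model.pbmm", 300)

data := byteptr(0) // raw 16-bit PCM samples (stubbed out for illustration)
data_len := 0

metadata := model.speech_to_text_with_metadata(data, data_len)
println(metadata.get_text())  // combined transcript from all items
println(metadata.confidence)  // approximated confidence for this transcription
for item in metadata.get_items() {
    println(item.str())       // character, timestep and start_time
}

unsafe {
    metadata.free()
    model.free()
}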

Find this library useful? ❤️

Support it by joining the stargazers of this repository, or buy me a cup of coffee. And follow me for my next creations! 🤩

License

MIT License

Copyright (c) 2019 Abdullah Atta

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.