daanzu / Deepspeech Websocket Server

Licence: mpl-2.0
Server & client for DeepSpeech using WebSockets for real-time speech recognition in separate environments

DeepSpeech WebSocket Server

This is a WebSocket server (& client) for Mozilla's DeepSpeech, to allow easy real-time speech recognition, using a separate client & server that can be run in different environments, either locally or remotely.

Work in progress. Developed to quickly test new models running DeepSpeech in Windows Subsystem for Linux using microphone input from host Windows. Available to save others some time.

Features

  • Server
    • Tested and works with DeepSpeech v0.7 (thanks @Kai-Karren)
    • Streaming inference via DeepSpeech v0.2+
    • Streams raw audio data from client via WebSocket
    • Multi-user (only decodes one stream at a time, but can block until decoding is available)
  • Client
    • Streams raw audio data from microphone to server via WebSocket
    • Voice activity detection (VAD) to ignore noise and segment microphone input into separate utterances
    • Hypnotizing spinner to indicate voice activity is detected!
    • Option to automatically save each utterance to a separate .wav file, for later testing
    • Need to pause/unpause listening? See here.
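
The utterance segmentation listed under the client features can be illustrated with a small sketch. The README does not show the client's actual algorithm, so this is only one common sliding-window approach, not necessarily what client.py does. The names segment_utterances, is_speech, padding_frames, and ratio are hypothetical; in a real client the per-frame speech decision would come from a VAD library, and here a plain callable stands in for it.

```python
from collections import deque


def segment_utterances(frames, is_speech, padding_frames=10, ratio=0.75):
    """Group audio frames into utterances using per-frame speech flags.

    frames: iterable of bytes objects (equal-length audio frames).
    is_speech: callable taking one frame and returning True or False
               (a stand-in for a real VAD; hypothetical interface).
    Yields one bytes object per detected utterance.
    """
    window = deque(maxlen=padding_frames)  # most recent (frame, flag) pairs
    triggered = False                      # True while inside an utterance
    voiced = []

    for frame in frames:
        window.append((frame, is_speech(frame)))
        if not triggered:
            # Start an utterance once most of the window is speech.
            if sum(1 for _, flag in window if flag) > ratio * window.maxlen:
                triggered = True
                voiced.extend(f for f, _ in window)  # keep the leading frames
                window.clear()
        else:
            voiced.append(frame)
            # End the utterance once most of the window is non-speech.
            if sum(1 for _, flag in window if not flag) > ratio * window.maxlen:
                yield b"".join(voiced)
                triggered = False
                voiced = []
                window.clear()

    if voiced:  # flush an utterance cut off by the end of input
        yield b"".join(voiced)
```

Requiring a majority of the window to agree before switching state keeps a single noisy frame from starting or ending an utterance, at the cost of a short delay before each utterance is emitted.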

Installation

This package is developed in Python 3. Activate a virtualenv, then install the requirements for the server and/or client, depending on usage:

pip install -r requirements-server.txt
### AND/OR ###
pip install -r requirements-client.txt

To run the server, you also need to install DeepSpeech in the same environment. Choose either the CPU or the GPU version, not both:

pip install deepspeech
### XOR ###
pip install deepspeech-gpu

Upgrade to the latest DeepSpeech with pip install deepspeech --upgrade (or pip install deepspeech-gpu --upgrade for the GPU version). This package works with v0.3.0.

The client uses pyaudio and portaudio for microphone access. In my experience, this works out of the box on Windows. On Linux, you may need to install the portaudio header files to compile the pyaudio package: sudo apt install portaudio19-dev. On macOS, try installing portaudio with Homebrew: brew install portaudio.

Server

> python server.py --model ../models/daanzu-6h-512l-0001lr-425dr/ -l -t
Initializing model...
2018-10-06 AM 05:55:16.357: __main__: INFO: <module>(): args.model: ../models/daanzu-6h-512l-0001lr-425dr/output_graph.pb
2018-10-06 AM 05:55:16.357: __main__: INFO: <module>(): args.alphabet: ../models/daanzu-6h-512l-0001lr-425dr/alphabet.txt
TensorFlow: v1.6.0-18-g5021473
DeepSpeech: v0.2.0-0-g009f9b6
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-10-06 05:55:16.358385: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-10-06 AM 05:55:16.395: __main__: INFO: <module>(): args.lm: ../models/daanzu-6h-512l-0001lr-425dr/lm.binary
2018-10-06 AM 05:55:16.395: __main__: INFO: <module>(): args.trie: ../models/daanzu-6h-512l-0001lr-425dr/trie
Bottle v0.12.13 server starting up (using GeventWebSocketServer())...
Listening on http://127.0.0.1:8080/
Hit Ctrl-C to quit.

2018-10-06 AM 05:55:30.194: __main__: INFO: echo(): recognized: 'alpha bravo charlie'
2018-10-06 AM 05:55:32.297: __main__: INFO: echo(): recognized: 'delta echo foxtrot'
2018-10-06 AM 05:55:54.747: __main__: INFO: echo(): dead websocket
^CKeyboardInterrupt

> python server.py -h
usage: server.py [-h] -m MODEL [-a [ALPHABET]] [-l [LM]] [-t [TRIE]] [--lw LW]
                 [--vwcw VWCW] [--bw BW] [-p PORT]

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Path to the model (protocol buffer binary file, or
                        directory containing all files for model)
  -a [ALPHABET], --alphabet [ALPHABET]
                        Path to the configuration file specifying the alphabet
                        used by the network. Default: alphabet.txt
  -l [LM], --lm [LM]    Path to the language model binary file. Default:
                        lm.binary
  -t [TRIE], --trie [TRIE]
                        Path to the language model trie file created with
                        native_client/generate_trie. Default: trie
  --lw LW               The alpha hyperparameter of the CTC decoder. Language
                        Model weight. Default: 1.5
  --vwcw VWCW           Valid word insertion weight. This is used to lessen
                        the word insertion penalty when the inserted word is
                        part of the vocabulary. Default: 2.25
  --bw BW               Beam width used in the CTC decoder when building
                        candidate transcriptions. Default: 1024
  -p PORT, --port PORT  Port to run server on. Default: 8080
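
The bracketed forms in the usage line ([-l [LM]], [-t [TRIE]]) and the log output above, where --model points at a directory and -l -t are given without values, suggest flags that take an optional value and fall back to default file names inside the model directory. A minimal sketch of that pattern with argparse follows. It is an illustration only, not the code in server.py: build_parser and resolve_model_paths are hypothetical names, and the exact defaulting rules are assumptions inferred from the help text and log lines.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser(prog="server.py")
    parser.add_argument("-m", "--model", required=True,
                        help="Path to the model file, or a directory with all model files")
    # nargs="?" lets a flag appear with or without a value; const is used when
    # the flag is given bare, default when the flag is absent (assumed behaviour).
    parser.add_argument("-a", "--alphabet", nargs="?",
                        const="alphabet.txt", default="alphabet.txt")
    parser.add_argument("-l", "--lm", nargs="?", const="lm.binary")
    parser.add_argument("-t", "--trie", nargs="?", const="trie")
    parser.add_argument("-p", "--port", type=int, default=8080)
    return parser


def resolve_model_paths(args):
    """If --model is a directory, look for the other files inside it."""
    if os.path.isdir(args.model):
        model_dir = args.model
        args.model = os.path.join(model_dir, "output_graph.pb")
        args.alphabet = os.path.join(model_dir, args.alphabet)
        if args.lm:
            args.lm = os.path.join(model_dir, args.lm)
        if args.trie:
            args.trie = os.path.join(model_dir, args.trie)
    return args
```

With this sketch, parsing --model some_dir -l -t yields paths such as some_dir/output_graph.pb and some_dir/lm.binary, matching the shape of the log lines, while omitting -l and -t leaves the language model and trie unset.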

Client

λ py client.py
Listening...
Recognized: alpha bravo charlie
Recognized: delta echo foxtrot
^C

λ py client.py -h
usage: client.py [-h] [-s SERVER] [-a AGGRESSIVENESS] [--nospinner]
                 [-w SAVEWAV]

Streams raw audio data from microphone with VAD to server via WebSocket

optional arguments:
  -h, --help            show this help message and exit
  -s SERVER, --server SERVER
                        Default: ws://localhost:8080/recognize
  -a AGGRESSIVENESS, --aggressiveness AGGRESSIVENESS
                        Set aggressiveness of VAD: an integer between 0 and 3,
                        0 being the least aggressive about filtering out non-
                        speech, 3 the most aggressive. Default: 3
  --nospinner           Disable spinner
  -w SAVEWAV, --savewav SAVEWAV
                        Save .wav files of utterences to given directory
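
For the -w option, writing an utterance to disk needs only the standard library. The sketch below assumes the audio is raw 16-bit mono PCM at 16 kHz; that format, the function name save_utterance, and the file naming scheme are assumptions for illustration, and client.py may do this differently.

```python
import os
import wave


def save_utterance(pcm_bytes, directory, index, sample_rate=16000):
    """Write one utterance of raw 16-bit mono PCM to a numbered .wav file.

    The audio format and the utterance_NNNN.wav naming are assumptions.
    Returns the path of the file that was written.
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, "utterance_{:04d}.wav".format(index))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return path
```

Saved files like these can later be replayed against a new model to compare recognition results on identical input.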

Contributions

Pull requests welcome.
