daanzu / Deepspeech Websocket Server

Licence: mpl-2.0

Server & client for DeepSpeech using WebSockets for real-time speech recognition in separate environments

DeepSpeech WebSocket Server

[GitHub is currently matching all my donations $-for-$.]

This is a WebSocket server (& client) for Mozilla's DeepSpeech, to allow easy real-time speech recognition, using a separate client & server that can be run in different environments, either locally or remotely.

Work in progress. Developed to quickly test new models running DeepSpeech in Windows Subsystem for Linux, using microphone input from the host Windows system. Published in the hope that it saves others some time.

Features

Server
- Tested and works with DeepSpeech v0.7 (thanks @Kai-Karren)
- Streaming inference via DeepSpeech v0.2+
- Streams raw audio data from client via WebSocket
- Multi-user (only decodes one stream at a time, but can block until decoding is available)
Client
- Streams raw audio data from microphone to server via WebSocket
- Voice activity detection (VAD) to ignore noise and segment microphone input into separate utterances
- Hypnotizing spinner to indicate voice activity is detected!
- Option to automatically save each utterance to a separate .wav file, for later testing
- Need to pause/unpause listening? See here.
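
The utterance segmentation described in the client features can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `segment_utterances`, the `padding_frames` parameter, and the `is_speech` callback (a stand-in for whatever VAD the client uses) are all assumptions made for this example.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List


def segment_utterances(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],
    padding_frames: int = 3,
) -> Iterator[List[bytes]]:
    """Group audio frames into utterances using a per-frame speech decision.

    Hypothetical helper: an utterance starts at the first speech frame
    (plus up to `padding_frames` frames of preceding context) and ends
    after `padding_frames` consecutive non-speech frames.
    """
    context = deque(maxlen=padding_frames)  # recent non-speech frames kept as lead-in
    current: List[bytes] = []               # frames of the utterance in progress
    silence_run = 0                         # consecutive non-speech frames seen so far

    for frame in frames:
        if is_speech(frame):
            if not current:
                # Start a new utterance, including the buffered lead-in context.
                current.extend(context)
                context.clear()
            current.append(frame)
            silence_run = 0
        elif current:
            # Non-speech inside an utterance: keep it, but count the silence.
            current.append(frame)
            silence_run += 1
            if silence_run >= padding_frames:
                yield current
                current = []
                silence_run = 0
        else:
            # Non-speech outside any utterance: remember it as possible context.
            context.append(frame)

    if current:
        # Flush an utterance that was still open when the input ended.
        yield current
```

In a real client, each yielded utterance would be streamed to the server frame by frame rather than collected first; the list form here only keeps the sketch easy to test.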

Installation

This package is developed in Python 3. Activate a virtualenv, then install the requirements for the server and/or client, depending on usage:

pip install -r requirements-server.txt
### AND/OR ###
pip install -r requirements-client.txt

To run the server, you also need to install DeepSpeech in the same environment. Choose either the CPU or the GPU version, but not both:

pip install deepspeech
### XOR ###
pip install deepspeech-gpu

To upgrade to the latest DeepSpeech, run pip install deepspeech --upgrade (or the equivalent for deepspeech-gpu). This package works with v0.3.0.

The client uses pyaudio and portaudio for microphone access. In my experience, this works out of the box on Windows. On Linux, you may need to install the portaudio header files to compile the pyaudio package: sudo apt install portaudio19-dev. On macOS, try installing portaudio with brew: brew install portaudio.

Server

> python server.py --model ../models/daanzu-6h-512l-0001lr-425dr/ -l -t
Initializing model...
2018-10-06 AM 05:55:16.357: __main__: INFO: <module>(): args.model: ../models/daanzu-6h-512l-0001lr-425dr/output_graph.pb
2018-10-06 AM 05:55:16.357: __main__: INFO: <module>(): args.alphabet: ../models/daanzu-6h-512l-0001lr-425dr/alphabet.txt
TensorFlow: v1.6.0-18-g5021473
DeepSpeech: v0.2.0-0-g009f9b6
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-10-06 05:55:16.358385: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-10-06 AM 05:55:16.395: __main__: INFO: <module>(): args.lm: ../models/daanzu-6h-512l-0001lr-425dr/lm.binary
2018-10-06 AM 05:55:16.395: __main__: INFO: <module>(): args.trie: ../models/daanzu-6h-512l-0001lr-425dr/trie
Bottle v0.12.13 server starting up (using GeventWebSocketServer())...
Listening on http://127.0.0.1:8080/
Hit Ctrl-C to quit.

2018-10-06 AM 05:55:30.194: __main__: INFO: echo(): recognized: 'alpha bravo charlie'
2018-10-06 AM 05:55:32.297: __main__: INFO: echo(): recognized: 'delta echo foxtrot'
2018-10-06 AM 05:55:54.747: __main__: INFO: echo(): dead websocket
^CKeyboardInterrupt

> python server.py -h
usage: server.py [-h] -m MODEL [-a [ALPHABET]] [-l [LM]] [-t [TRIE]] [--lw LW]
                 [--vwcw VWCW] [--bw BW] [-p PORT]

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Path to the model (protocol buffer binary file, or
                        directory containing all files for model)
  -a [ALPHABET], --alphabet [ALPHABET]
                        Path to the configuration file specifying the alphabet
                        used by the network. Default: alphabet.txt
  -l [LM], --lm [LM]    Path to the language model binary file. Default:
                        lm.binary
  -t [TRIE], --trie [TRIE]
                        Path to the language model trie file created with
                        native_client/generate_trie. Default: trie
  --lw LW               The alpha hyperparameter of the CTC decoder. Language
                        Model weight. Default: 1.5
  --vwcw VWCW           Valid word insertion weight. This is used to lessen
                        the word insertion penalty when the inserted word is
                        part of the vocabulary. Default: 2.25
  --bw BW               Beam width used in the CTC decoder when building
                        candidate transcriptions. Default: 1024
  -p PORT, --port PORT  Port to run server on. Default: 8080
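
The way a model directory expands into individual file paths (as seen in the startup log above) can be sketched with argparse. This is an assumption-laden reconstruction based only on the usage text and log output, not the actual server.py: the function name `parse_args` is hypothetical, and the file name output_graph.pb is taken from the log line for args.model.

```python
import argparse
import os


def parse_args(argv):
    """Parse a subset of the server options and resolve paths (hypothetical sketch)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--model", required=True)
    # nargs="?" lets a flag appear with or without a value; `const` is used
    # when the flag is given bare, e.g. "-l" on its own means "lm.binary".
    parser.add_argument("-a", "--alphabet", nargs="?",
                        const="alphabet.txt", default="alphabet.txt")
    parser.add_argument("-l", "--lm", nargs="?", const="lm.binary")
    parser.add_argument("-t", "--trie", nargs="?", const="trie")
    args = parser.parse_args(argv)

    if os.path.isdir(args.model):
        # Assumed behaviour: when --model is a directory, every other file is
        # looked up inside it. os.path.join leaves absolute paths untouched.
        model_dir = args.model
        args.model = os.path.join(model_dir, "output_graph.pb")
        for name in ("alphabet", "lm", "trie"):
            value = getattr(args, name)
            if value is not None:
                setattr(args, name, os.path.join(model_dir, value))
    return args
```

Under these assumptions, passing a bare -l -t with a model directory selects lm.binary and trie inside that directory, while omitting the flags leaves the language model unset.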

Client

λ py client.py
Listening...
Recognized: alpha bravo charlie
Recognized: delta echo foxtrot
^C

λ py client.py -h
usage: client.py [-h] [-s SERVER] [-a AGGRESSIVENESS] [--nospinner]
                 [-w SAVEWAV]

Streams raw audio data from microphone with VAD to server via WebSocket

optional arguments:
  -h, --help            show this help message and exit
  -s SERVER, --server SERVER
                        Default: ws://localhost:8080/recognize
  -a AGGRESSIVENESS, --aggressiveness AGGRESSIVENESS
                        Set aggressiveness of VAD: an integer between 0 and 3,
                        0 being the least aggressive about filtering out non-
                        speech, 3 the most aggressive. Default: 3
  --nospinner           Disable spinner
  -w SAVEWAV, --savewav SAVEWAV
                        Save .wav files of utterences to given directory
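
The --savewav option could be implemented along these lines with Python's standard wave module. This is only a sketch under stated assumptions: the function name `save_utterance`, the file naming scheme, and the audio format (16 kHz, 16-bit, mono, the format DeepSpeech's released models expect) are not taken from the client source.

```python
import os
import wave
from datetime import datetime

SAMPLE_RATE = 16000  # Hz; assumed capture rate
SAMPLE_WIDTH = 2     # bytes per sample (16-bit PCM); assumed
CHANNELS = 1         # mono; assumed


def save_utterance(directory, frames, when=None):
    """Write one utterance (an iterable of raw PCM byte chunks) to a .wav file.

    Hypothetical helper: returns the path of the file that was written.
    """
    os.makedirs(directory, exist_ok=True)
    when = when or datetime.now()
    # Timestamped name so that each utterance lands in its own file.
    filename = when.strftime("utterance_%Y-%m-%d_%H-%M-%S_%f.wav")
    path = os.path.join(directory, filename)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(b"".join(frames))
    return path
```

Files saved this way can later be replayed against a different model to compare recognition results on identical audio.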

Contributions

Pull requests welcome.

Contributors:

@Zeddy913
