
MainRo / Deepspeech Server

Licence: mpl-2.0
A testing server for a speech-to-text service based on Mozilla DeepSpeech

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Deepspeech Server

Kalliope
Kalliope is a framework that will help you to create your own personal assistant.
Stars: ✭ 1,509 (+757.39%)
Mutual labels:  speech-recognition, speech-to-text
Hey Jetson
Deep Learning based Automatic Speech Recognition with attention for the Nvidia Jetson.
Stars: ✭ 161 (-8.52%)
Mutual labels:  speech-recognition, speech-to-text
Kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Stars: ✭ 11,151 (+6235.8%)
Mutual labels:  speech-recognition, speech-to-text
Tacotron asr
Speech Recognition Using Tacotron
Stars: ✭ 165 (-6.25%)
Mutual labels:  speech-recognition, speech-to-text
Go Astideepspeech
Golang bindings for Mozilla's DeepSpeech speech-to-text library
Stars: ✭ 137 (-22.16%)
Mutual labels:  speech-recognition, speech-to-text
Wav2letter.pytorch
A fully convolution-network for speech-to-text, built on pytorch.
Stars: ✭ 104 (-40.91%)
Mutual labels:  speech-recognition, speech-to-text
Speech To Text Russian
A project for Russian speech recognition based on pykaldi.
Stars: ✭ 151 (-14.2%)
Mutual labels:  speech-recognition, speech-to-text
Vosk Api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Stars: ✭ 1,357 (+671.02%)
Mutual labels:  speech-recognition, speech-to-text
Awesome Ai Services
An overview of the AI-as-a-service landscape
Stars: ✭ 133 (-24.43%)
Mutual labels:  speech-recognition, speech-to-text
Asr audio data links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 128 (-27.27%)
Mutual labels:  speech-recognition, speech-to-text
Spokestack Python
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.
Stars: ✭ 103 (-41.48%)
Mutual labels:  speech-recognition, speech-to-text
Zzz Retired openstt
RETIRED - OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:
Stars: ✭ 146 (-17.05%)
Mutual labels:  speech-recognition, speech-to-text
Speech And Text
Speech to text (PocketSphinx, iFlytek API, Baidu API) and text to speech (pyttsx3)
Stars: ✭ 102 (-42.05%)
Mutual labels:  speech-recognition, speech-to-text
Self Supervised Speech Recognition
speech to text with self-supervised learning based on wav2vec 2.0 framework
Stars: ✭ 106 (-39.77%)
Mutual labels:  speech-recognition, speech-to-text
Openseq2seq
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Stars: ✭ 1,378 (+682.95%)
Mutual labels:  speech-recognition, speech-to-text
Awesome Reactive Programming
A repository for sharing all the resources available on Reactive Programming and Reactive Systems
Stars: ✭ 163 (-7.39%)
Mutual labels:  reactive-extensions, reactivex
B.e.n.j.i.
B.E.N.J.I.- The Impossible Missions Force's digital assistant
Stars: ✭ 83 (-52.84%)
Mutual labels:  speech-recognition, speech-to-text
Mongolian Speech Recognition
Mongolian speech recognition with PyTorch
Stars: ✭ 97 (-44.89%)
Mutual labels:  speech-recognition, speech-to-text
Tensorflow Ctc Speech Recognition
Application of Connectionist Temporal Classification (CTC) for Speech Recognition (Tensorflow 1.0 but compatible with 2.0).
Stars: ✭ 127 (-27.84%)
Mutual labels:  speech-recognition, speech-to-text
Speechrecognizerbutton
UIButton subclass with push to talk recording, speech recognition and Siri-style waveform view.
Stars: ✭ 144 (-18.18%)
Mutual labels:  speech-recognition, speech-to-text

DeepSpeech Server
==================

.. image:: https://travis-ci.org/MainRo/deepspeech-server.svg?branch=master
   :target: https://travis-ci.org/MainRo/deepspeech-server

.. image:: https://badge.fury.io/py/deepspeech-server.svg
   :target: https://badge.fury.io/py/deepspeech-server

Key Features
------------

This is an HTTP server that can be used to test the Mozilla DeepSpeech project. You need an environment with DeepSpeech and a model to run this server.

This code uses the DeepSpeech 0.7 APIs.
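
For context, the server is essentially a thin HTTP layer over the DeepSpeech Python API. The short sketch below is not part of this project and its file names are placeholders, but it shows roughly what a transcription with the 0.7 API looks like:

.. code-block:: python

   import wave

   import numpy as np
   from deepspeech import Model

   ds = Model("deepspeech-0.7.1-models.pbmm")        # placeholder model path

   # DeepSpeech expects 16-bit PCM samples as a numpy int16 array.
   with wave.open("testfile.wav", "rb") as wav:
       audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

   print(ds.stt(audio))                              # plain-text transcript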

Installation
------------

You first need to install deepspeech. Depending on your system you can use the CPU package:

.. code-block:: console

   pip3 install deepspeech

Or the GPU package:

.. code-block:: console

   pip3 install deepspeech-gpu

Then you can install the deepspeech server:

.. code-block:: console

   python3 setup.py install

The server is also available on PyPI, so you can install it with pip:

.. code-block:: console

   pip3 install deepspeech-server

Note that Python 3.5 is the minimum version required to run the server.

Starting the server
-------------------

.. code-block:: console

   deepspeech-server --config config.json

You can use DeepSpeech without training a model yourself. Pre-trained models are provided by Mozilla on the releases page of the project (see the assets section of each release):

https://github.com/mozilla/DeepSpeech/releases
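
For example, the 0.7.1 model and scorer referenced in the sample configuration can be fetched from the release assets. The asset URLs below follow the v0.7.1 release layout and may change for other versions, so check the releases page if they fail:

.. code-block:: console

   wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.1/deepspeech-0.7.1-models.pbmm
   wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.1/deepspeech-0.7.1-models.scorer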

Once you have downloaded a pre-trained model, you can untar it and directly use the sample configuration file:

.. code-block:: console

   cp config.sample.json config.json
   deepspeech-server --config config.json

Server configuration
--------------------

The configuration is done with a JSON file, provided via the "--config" argument. Its structure is as follows:

.. code-block:: json

   {
     "deepspeech": {
       "model": "deepspeech-0.7.1-models.pbmm",
       "scorer": "deepspeech-0.7.1-models.scorer",
       "beam_width": 500,
       "lm_alpha": 0.931289039105002,
       "lm_beta": 1.1834137581510284
     },
     "server": {
       "http": {
         "host": "0.0.0.0",
         "port": 8080,
         "request_max_size": 1048576
       }
     },
     "log": {
       "level": [
         { "logger": "deepspeech_server", "level": "DEBUG" }
       ]
     }
   }

The configuration file contains several sections and sub-sections.

deepspeech section configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Section "deepspeech" contains configuration of the deepspeech engine:

model: The model that was generated by DeepSpeech. It can be a protobuf file or a memory-mapped protobuf.

scorer: [Optional] The scorer file. A scorer is required to set lm_alpha or lm_beta manually.

beam_width: [Optional] The width of the beam search.

lm_alpha and lm_beta: [Optional] The hyperparameters of the scorer.
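
For illustration only, here is a rough sketch (not taken from this project's code) of how these options typically map onto the DeepSpeech 0.7 Python API; the file names and values mirror the sample configuration above:

.. code-block:: python

   from deepspeech import Model

   # File names and values mirror the sample configuration; adjust to your setup.
   ds = Model("deepspeech-0.7.1-models.pbmm")                      # "model"
   ds.setBeamWidth(500)                                            # "beam_width"
   ds.enableExternalScorer("deepspeech-0.7.1-models.scorer")       # "scorer"
   ds.setScorerAlphaBeta(0.931289039105002, 1.1834137581510284)    # "lm_alpha", "lm_beta"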

Section "server" contains configuration of the access part, with on subsection per protocol:

http section configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^

request_max_size (default value: 1048576, i.e. 1 MiB) is the maximum payload size allowed by the server. A request payload larger than this threshold returns a "413: Request Entity Too Large" error.

host (default value: "0.0.0.0") is the listen address of the http server.

port (default value: 8080) is the listening port of the http server.

log section configuration
^^^^^^^^^^^^^^^^^^^^^^^^^

The log section can be used to set the log levels of the server. It contains a list of log entries; each entry contains the name of a logger and its level. Both follow the conventions of the Python logging module.

Using the server
----------------

Inference on the model is done via HTTP POST requests, for example with the following curl command:

.. code-block:: console

   curl -X POST --data-binary @testfile.wav http://localhost:8080/stt
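
If you prefer Python over curl, a minimal client sketch could look like this (it assumes the third-party requests package is installed and that the server is running with the host and port from the configuration above):

.. code-block:: python

   import requests

   # testfile.wav is a placeholder; send any WAV matching the model's sample rate.
   with open("testfile.wav", "rb") as audio:
       response = requests.post("http://localhost:8080/stt", data=audio)

   response.raise_for_status()
   print(response.text)   # the transcription returned by the server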