Azure-Samples / SpeechToText-WebSockets-Javascript

License: MIT
SDK and sample for doing speech recognition over WebSockets in JavaScript.




February 2019

Version 1.3 of the Cognitive Services Speech SDK is available. The open-source JavaScript version can be found here. For other languages and platforms, check out the Speech SDK home page.

NOTE: This repository is deprecated. Please use the new Cognitive Services Speech SDK!

September 2018: New Microsoft Cognitive Services Speech SDK available

We released a new Speech SDK supporting the new Unified Speech Service. The new Speech SDK comes with support for Windows, Android, Linux, JavaScript, and iOS.

Please check out Microsoft Cognitive Services Speech SDK for documentation, links to the download pages, and the samples.

NOTE: The content of this repository supports the Bing Speech Service, not the new Speech Service. The Bing Speech Service has been deprecated; please use the new Speech Service.

Prerequisites

Subscribe to the Speech Recognition API, and get a free trial subscription key

The Speech API is part of Cognitive Services. You can get free trial subscription keys from the Cognitive Services subscription page. After you select the Speech API, select Get API Key to get the key. It returns a primary and secondary key. Both keys are tied to the same quota, so you can use either key.

Note: Before you can use Speech client libraries, you must have a subscription key.

Get started

In this section we will walk you through the necessary steps to load a sample HTML page. The sample is located in our GitHub repository. You can open the sample directly from the repository, or open it from a local copy of the repository.

Note: Some browsers block microphone access on insecure origins, so it is recommended to host the sample (or your app) over HTTPS to get it working on all supported browsers.

Open the sample directly

Acquire a subscription key as described above, then open the link to the sample. This loads the page in your default browser (rendered using htmlPreview).

Open the sample from a local copy

To try the sample locally, clone this repository:

git clone https://github.com/Azure-Samples/SpeechToText-WebSockets-Javascript

Then compile the TypeScript sources and bundle/browserify them into a single JavaScript file (npm must be installed on your machine). Change into the root of the cloned repository and run:

cd SpeechToText-WebSockets-Javascript && npm run bundle

Open samples\browser\Sample.html in your favorite browser.

Next steps

Installation of npm package

An npm package of the Microsoft Speech JavaScript WebSocket SDK is available. To install the npm package, run:

npm install microsoft-speech-browser-sdk

As a Node module

If you're building a Node.js app and want to use the Speech SDK, all you need to do is add the following import statement:

import * as SDK from 'microsoft-speech-browser-sdk';

and setup the recognizer:

function RecognizerSetup(SDK, recognitionMode, language, format, subscriptionKey) {
    let recognizerConfig = new SDK.RecognizerConfig(
        new SDK.SpeechConfig(
            new SDK.Context(
                new SDK.OS(navigator.userAgent, "Browser", null),
                new SDK.Device("SpeechSample", "SpeechSample", "1.0.00000"))),
        recognitionMode, // SDK.RecognitionMode.Interactive  (Options - Interactive/Conversation/Dictation)
        language, // Supported languages are specific to each recognition mode. Refer to the docs.
        format); // SDK.SpeechResultFormat.Simple (Options - Simple/Detailed)

    // Alternatively use SDK.CognitiveTokenAuthentication(fetchCallback, fetchOnExpiryCallback) for token auth
    let authentication = new SDK.CognitiveSubscriptionKeyAuthentication(subscriptionKey);

    return SDK.Recognizer.Create(recognizerConfig, authentication);
}

function RecognizerStart(SDK, recognizer) {
    recognizer.Recognize((event) => {
        /*
            Alternative syntax for typescript devs.
            if (event instanceof SDK.RecognitionTriggeredEvent)
        */
        switch (event.Name) {
            case "RecognitionTriggeredEvent" :
                UpdateStatus("Initializing");
                break;
            case "ListeningStartedEvent" :
                UpdateStatus("Listening");
                break;
            case "RecognitionStartedEvent" :
                UpdateStatus("Listening_Recognizing");
                break;
            case "SpeechStartDetectedEvent" :
                UpdateStatus("Listening_DetectedSpeech_Recognizing");
                console.log(JSON.stringify(event.Result)); // check console for other information in result
                break;
            case "SpeechHypothesisEvent" :
            case "SpeechFragmentEvent" :
                UpdateRecognizedHypothesis(event.Result.Text);
                console.log(JSON.stringify(event.Result)); // check console for other information in result
                break;
            case "SpeechEndDetectedEvent" :
                OnSpeechEndDetected();
                UpdateStatus("Processing_Adding_Final_Touches");
                console.log(JSON.stringify(event.Result)); // check console for other information in result
                break;
            case "SpeechSimplePhraseEvent" :
            case "SpeechDetailedPhraseEvent" :
                UpdateRecognizedPhrase(JSON.stringify(event.Result, null, 3));
                break;
            case "RecognitionEndedEvent" :
                OnComplete();
                UpdateStatus("Idle");
                console.log(JSON.stringify(event)); // Debug information
                break;
        }
    })
    .On(() => {
        // The request succeeded. Nothing to do here.
    },
    (error) => {
        console.error(error);
    });
}

function RecognizerStop(SDK, recognizer) {
    // recognizer.AudioSource.Detach(audioNodeId) can be also used here. (audioNodeId is part of ListeningStartedEvent)
    recognizer.AudioSource.TurnOff();
}
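
The switch in RecognizerStart above dispatches purely on event.Name, so the status-update cases can equally be written as a lookup table, which is easier to extend when new event types appear. This is an alternative sketch, not part of the SDK: the table covers only the status cases, and the updateStatus callback stands in for the sample's UpdateStatus UI helper so the dispatch logic stays self-contained:

```javascript
// Status string to display for each SDK event name (mirrors the switch above).
const statusByEvent = {
    RecognitionTriggeredEvent: "Initializing",
    ListeningStartedEvent: "Listening",
    RecognitionStartedEvent: "Listening_Recognizing",
    SpeechStartDetectedEvent: "Listening_DetectedSpeech_Recognizing",
    SpeechEndDetectedEvent: "Processing_Adding_Final_Touches",
    RecognitionEndedEvent: "Idle"
};

// Dispatch an event: invoke updateStatus if the event maps to a status,
// and return that status (or undefined for events the table does not cover).
function handleStatusEvent(event, updateStatus) {
    const status = statusByEvent[event.Name];
    if (status !== undefined) {
        updateStatus(status);
    }
    return status;
}
```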

In a Browser, using Webpack

Currently, the TypeScript code in this SDK is compiled using the default module system (CommonJS), which means the compilation produces a number of distinct JS source files. To make the SDK usable in a browser, it first needs to be browserified, i.e. all the JavaScript sources need to be glued together into a single bundle. To that end, this is what you need to do:

  1. Add a require statement to your web-app source file, for instance (take a look at sample_app.js):

        var SDK = require('<path_to_speech_SDK>/Speech.Browser.Sdk.js');
    
  2. Setup the recognizer, same as above.

  3. Run your web app through webpack (see the "bundle" task in gulpfile.js; to execute it, run npm run bundle).

  4. Add the generated bundle to your HTML page:

    <script src="../../distrib/speech.sdk.bundle.js"></script>
    

In a Browser, as a native ES6 module

...in progress, will be available soon

Token-based authentication

To use token-based authentication, please launch a local node server, as described here
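
The setup code above notes that SDK.CognitiveTokenAuthentication(fetchCallback, fetchOnExpiryCallback) can replace subscription-key authentication. A minimal sketch of those two callbacks follows; issueToken is a hypothetical injected function (so the logic can be exercised without a server) that would normally POST your subscription key to the token endpoint of your local node server and resolve to the token string:

```javascript
// Builds the (fetchCallback, fetchOnExpiryCallback) pair expected by
// SDK.CognitiveTokenAuthentication. The first callback reuses a cached
// token when one exists; the expiry callback always fetches a fresh one.
function makeTokenCallbacks(issueToken) {
    let token = null;
    const refresh = () => issueToken().then((t) => { token = t; return t; });
    return {
        fetchCallback: () => (token ? Promise.resolve(token) : refresh()),
        fetchOnExpiryCallback: () => refresh() // force a fresh token on expiry
    };
}
```

The resulting pair would then be passed as `new SDK.CognitiveTokenAuthentication(cb.fetchCallback, cb.fetchOnExpiryCallback)` in place of the subscription-key authentication shown earlier.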

Docs

The SDK is a reference implementation of the speech WebSocket protocol. Check the API reference and the WebSocket protocol reference for more details.
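
To give a rough feel for the wire format: each text message on the socket carries a small CRLF-separated header block (fields such as Path, X-RequestId, X-Timestamp, and Content-Type) followed by a blank line and a JSON body. The sketch below assembles such a message; treat the exact header set as an approximation and defer to the protocol reference for the authoritative framing:

```javascript
// Builds a text message in the speech WebSocket framing: CRLF-separated
// headers, then a blank line, then the JSON payload.
function buildTextMessage(path, requestId, body) {
    const headers = [
        "Path: " + path,
        "X-RequestId: " + requestId,
        "X-Timestamp: " + new Date().toISOString(),
        "Content-Type: application/json; charset=utf-8"
    ];
    return headers.join("\r\n") + "\r\n\r\n" + JSON.stringify(body);
}
```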

Browser support

The SDK depends on WebRTC APIs to access the microphone and read the audio stream. Most of today's browsers (Edge, Chrome, Firefox) support this. For more details about supported browsers, refer to navigator.getUserMedia#BrowserCompatibility.

Note: The SDK currently depends on the navigator.getUserMedia API. However, this API is being dropped as browsers move towards the newer MediaDevices.getUserMedia API. The SDK will add support for the newer API soon.
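
Until the SDK switches over, an app can bridge the two APIs itself with a small feature-detection shim along these lines. This is an illustrative helper, not part of the SDK; it takes the navigator object as a parameter so the selection logic can be exercised outside a browser:

```javascript
// Prefer the modern promise-based MediaDevices.getUserMedia; fall back to
// the legacy callback-based navigator.getUserMedia (including vendor
// prefixes), wrapping it in a Promise so callers see one interface.
function getUserMediaCompat(nav, constraints) {
    if (nav.mediaDevices && nav.mediaDevices.getUserMedia) {
        return nav.mediaDevices.getUserMedia(constraints);
    }
    const legacy = nav.getUserMedia || nav.webkitGetUserMedia || nav.mozGetUserMedia;
    if (!legacy) {
        return Promise.reject(new Error("getUserMedia is not supported in this browser"));
    }
    return new Promise((resolve, reject) => legacy.call(nav, constraints, resolve, reject));
}
```

In a real page this would be called as `getUserMediaCompat(window.navigator, { audio: true })`.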

Contributing

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
