
speechly / api

Licence: MIT license
Speechly public API definitions and generated code


Real-time automatic speech recognition and natural language understanding tools in one flexible API

Website  |  Docs  |  Discussions  |  Blog  |  Podcast


Speechly API

This repository stores the definitions and generated code for Speechly public APIs.

Higher-level client libraries are also available for selected platforms. They handle microphone and audio management, as well as the connection state management that would otherwise have to be built on top of these definitions. See Speechly Client Libraries for more information.

Language Support

Protocol buffers definitions are located in proto/. The actual code generation is done with prototool. The supported languages are:

Protobuf stub generation is pretty easy, so if you need support for a language not in the list, you can always generate the stubs separately.

Make sure to check language-specific READMEs.

Using Speechly API

See the language-specific examples in the respective subdirectories for a more detailed description of how to use the generated code. The following describes the basic API flow of a Speechly client, which sends speech to the API and receives results at the same time.

An API Reference, generated from the protobuf source files, contains detailed documentation about the APIs.

All gRPC connections to Speechly APIs must use secure channels, meaning that the connection is encrypted with TLS. Open the secure channel to api.speechly.com:443. The same channel can then be used to access all of the APIs.
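As a minimal sketch, assuming the grpcio package is installed, a secure channel could be opened in Python as shown here. The names SPEECHLY_TARGET and open_channel are illustrative and not part of the generated code.

```python
SPEECHLY_TARGET = "api.speechly.com:443"  # host and port stated above


def open_channel(target: str = SPEECHLY_TARGET):
    """Open a TLS-secured gRPC channel (hypothetical helper)."""
    # Imported lazily so this module loads even where grpcio is not installed.
    import grpc

    # With no arguments, the default root certificates are used for TLS.
    credentials = grpc.ssl_channel_credentials()
    return grpc.secure_channel(target, credentials)
```

The returned channel is what the generated service stubs would be constructed with.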

Login

The first step in connecting to the Speechly API is to call speechly.identity.v2.IdentityAPI to obtain an access token for use in subsequent calls.

  • Create a LoginRequest and add:
    • device_id, a device identifier that the API can use to match the microphone acoustic profile
    • either:
      • app_id to select a specific Speechly application to use, or
      • project_id to use a project, containing multiple applications
  • Send the request to speechly.identity.v2.IdentityAPI/Login (the stubs help here)
  • The LoginResponse contains an access token and expiry information. Fetch a new access token before the current one expires to avoid authentication errors.
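The "either app_id or project_id" rule above can be checked before the request is built. The following sketch represents the LoginRequest fields as a plain dict rather than the generated message class. The helper name login_fields is hypothetical, and the nesting under "application" and "project" is an assumption about the message layout, so check it against the proto definitions.

```python
from typing import Optional


def login_fields(device_id: str,
                 app_id: Optional[str] = None,
                 project_id: Optional[str] = None) -> dict:
    """Return the fields of a LoginRequest as a plain dict (illustrative)."""
    if not device_id:
        raise ValueError("device_id is required")
    # Exactly one of app_id / project_id must be given.
    if (app_id is None) == (project_id is None):
        raise ValueError("set exactly one of app_id or project_id")
    fields = {"device_id": device_id}
    if app_id is not None:
        # Assumed nesting: application scope wrapping app_id.
        fields["application"] = {"app_id": app_id}
    else:
        # Assumed nesting: project scope wrapping project_id.
        fields["project"] = {"project_id": project_id}
    return fields
```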

Using the Access Token

IdentityAPI/Login is the only API call that does not require authentication metadata. All other APIs require that the access token received from Login is attached to the request metadata with the key authorization and the value Bearer TOKEN (replace TOKEN with the actual token).
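In Python, the metadata entry described above could be built as a list of key-value tuples. This is a small sketch; the helper name auth_metadata is hypothetical.

```python
def auth_metadata(token: str) -> list:
    """Build call metadata carrying the access token."""
    if not token:
        raise ValueError("token must not be empty")
    # Key "authorization", value "Bearer " followed by the token.
    return [("authorization", f"Bearer {token}")]
```

With grpcio, a sequence like this is passed through the metadata keyword argument of a stub call.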

If the token is expired or otherwise invalid, all API calls will terminate with gRPC status code PERMISSION_DENIED. A reason is included in the error details.

The token expires after a certain amount of time, stated in the LoginResponse message. Until then, keep the token and reuse it for multiple connections, refreshing it only when it is close to expiration. This avoids an extra Login call before each connection and keeps the API calls as fast as possible.
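One way to implement this reuse is a small cache that logs in again only when the token is missing or close to expiry. In this sketch the login callable, the 60-second refresh margin, and the expiry expressed as epoch seconds are all assumptions; adapt them to the actual expiry fields of LoginResponse.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Reuse an access token and refresh it shortly before it expires."""

    def __init__(self,
                 login: Callable[[], Tuple[str, float]],
                 refresh_margin_s: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._login = login  # returns (token, expiry as epoch seconds)
        self._margin = refresh_margin_s
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Log in again only if no token is cached or it is close to expiry.
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            self._token, self._expires_at = self._login()
        return self._token
```

Injecting the clock keeps the refresh logic testable without waiting for a real token to expire.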

SLU, Spoken Language Understanding

The speechly.slu.v1.SLU/Stream method is used to send audio and receive results based on the target Speechly application's configuration. An access token from IdentityAPI is required to access the SLU API.

A generic example of an SLU connection:

  • Open a bi-directional stream to speechly.slu.v1.SLU/Stream. Remember to include the access token in the stream's metadata.
  • All messages sent to the stream are of type SLURequest and all responses are of type SLUResponse. These are envelopes that contain different types of data, depending on the situation:
    • Send an SLURequest.config message describing the audio stream
    • Send an SLURequest.event.START message when the speech stream starts
    • The stream responds with an SLUResponse.started message containing the audioContext id
    • Send every chunk of audio to the stream with SLURequest.audio
    • At the same time, read responses from the stream. As the SLU stream is bidirectional, the client receives data while it is still sending. Refer to the docs for the meaning of the different types of SLUResponse
    • When the speech audio stops, send an SLURequest.event.STOP message
    • The stream responds with an SLUResponse.finished event containing the id of the audioContext that was finished
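The client-side half of this flow can be sketched as a Python generator that yields the messages in the required order. Plain tuples stand in for SLURequest here; real code would build the generated message types instead. The function name slu_requests and the shape of the config dict are hypothetical, and the sketch covers a stream carrying a single audio context.

```python
from typing import Iterable, Iterator, Tuple


def slu_requests(audio_chunks: Iterable[bytes],
                 config: dict) -> Iterator[Tuple[str, object]]:
    """Yield the client-side messages of one audio context, in order."""
    yield ("config", config)   # describes the audio stream
    yield ("event", "START")   # speech stream starts
    for chunk in audio_chunks:
        if chunk:              # skip empty chunks
            yield ("audio", chunk)
    yield ("event", "STOP")    # speech audio stopped
```

With grpcio, a request iterator of this kind is what a bidirectional streaming call consumes, while a separate loop reads the SLUResponse messages (started, results, finished) as they arrive.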

The connection can be kept open, but an active speech stream (audioContext) will have a maximum duration of 5 minutes.
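To stay within this limit, a client can track how much audio it has sent in the current audioContext. The sketch below assumes raw PCM at 16 kHz, mono, 16 bits per sample, streamed in real time so that audio duration approximates the context duration. These format values and the helper names are assumptions, not something the API prescribes here.

```python
MAX_CONTEXT_SECONDS = 5 * 60  # maximum audioContext duration stated above


def pcm_seconds(num_bytes: int, sample_rate_hz: int = 16000,
                channels: int = 1, bytes_per_sample: int = 2) -> float:
    """Duration in seconds of raw PCM audio of the given size."""
    return num_bytes / (sample_rate_hz * channels * bytes_per_sample)


def max_context_bytes(sample_rate_hz: int = 16000, channels: int = 1,
                      bytes_per_sample: int = 2) -> int:
    """Largest number of PCM bytes that fits into one audio context."""
    return MAX_CONTEXT_SECONDS * sample_rate_hz * channels * bytes_per_sample
```

Once the bytes sent approach max_context_bytes(), the client can send STOP and begin a new context with START on the same connection.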

Supporting APIs

There are other APIs that can be used to manage Speechly applications. Instead of integrating with these, a quicker alternative is to use the Speechly command. Nevertheless, the APIs are documented and usable if required.

gRPC-JSON transcoding support

The Speechly API supports automatic transcoding for HTTP/1.1 REST access with JSON content. This means that the gRPC services are also exposed over HTTP and can be used with any REST toolchain (curl, Postman, etc.). The only exception is the SLU API, which is a bidirectional streaming API and cannot be represented in HTTP.

The transcoding is implemented in an Envoy filter and mostly uses the default bindings. To call the IdentityAPI, for example:

curl https://api.speechly.com/speechly.identity.v2.IdentityAPI/Login -d '{"deviceId": "$DEVICEID", "application": {"appId": "$APPID"}}'

and to call an API requiring authorization:

curl https://api.speechly.com/speechly.slu.v1.WLU/Text -H "Authorization: Bearer $TOKEN" -d '{"text": "show python repos"}'
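The two curl calls above could be sketched in Python with the standard library as follows. The helper name json_request and the placeholder values are hypothetical, and the explicit Content-Type header is an assumption (curl -d sends a form content type by default). The request is only built here, not sent.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.speechly.com"


def json_request(method_path: str, body: dict,
                 token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for a transcoded gRPC method."""
    headers = {"Content-Type": "application/json"}  # assumed header
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{API_BASE}/{method_path}",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Placeholders: replace DEVICE_ID and APP_ID with real values.
login = json_request(
    "speechly.identity.v2.IdentityAPI/Login",
    {"deviceId": "DEVICE_ID", "application": {"appId": "APP_ID"}},
)
```

Passing the request to urllib.request.urlopen sends it. A call that requires authorization is built the same way, with the token argument set.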

The transcoding mapping is driven by a generated descriptor set file, which is located in this repository (speechly_api.pb). This file can also be used with grpcurl for type-aware gRPC access from the command line.

See also Google's protobuf annotations for transcoding HTTP/JSON to gRPC.

Building and Testing This Repository

The build is done with make and docker.

You can run the build for all languages with make build from the root of this repo.
