
Picovoice / Cheetah

Licence: other
On-device streaming speech-to-text engine powered by deep learning

Programming Languages

python
c

Projects that are alternatives of or similar to Cheetah

Vosk Api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Stars: ✭ 1,357 (+254.31%)
Mutual labels:  raspberry-pi, speech-recognition, speech-to-text, asr, voice-recognition, offline
Rhino
On-device speech-to-intent engine powered by deep learning
Stars: ✭ 406 (+6.01%)
Mutual labels:  raspberry-pi, speech-recognition, speech-to-text, voice-recognition, iot
leopard
On-device speech-to-text engine powered by deep learning
Stars: ✭ 354 (-7.57%)
Mutual labels:  voice-recognition, speech-recognition, speech-to-text, asr
KeenASR-Android-PoC
A proof-of-concept app using KeenASR SDK on Android. WE ARE HIRING: https://keenresearch.com/careers.html
Stars: ✭ 21 (-94.52%)
Mutual labels:  offline, voice-recognition, speech-recognition, speech-to-text
spokestack-ios
Spokestack: give your iOS app a voice interface!
Stars: ✭ 27 (-92.95%)
Mutual labels:  voice-recognition, speech-recognition, speech-to-text, asr
Spokestack Python
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.
Stars: ✭ 103 (-73.11%)
Mutual labels:  raspberry-pi, speech-recognition, speech-to-text, voice-recognition
Speech To Text Benchmark
speech to text benchmark framework
Stars: ✭ 481 (+25.59%)
Mutual labels:  speech-recognition, speech-to-text, voice-recognition, offline
Naomi
The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!
Stars: ✭ 171 (-55.35%)
Mutual labels:  raspberry-pi, speech-recognition, speech-to-text, iot
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (-86.16%)
Mutual labels:  voice-recognition, speech-recognition, speech-to-text, asr
speech-recognition-evaluation
Evaluate results from ASR/Speech-to-Text quickly
Stars: ✭ 25 (-93.47%)
Mutual labels:  speech-recognition, speech-to-text, asr
AmazonSpeechTranslator
End-to-end Solution for Speech Recognition, Text Translation, and Text-to-Speech for iOS using Amazon Translate and Amazon Polly as AWS Machine Learning managed services.
Stars: ✭ 50 (-86.95%)
Mutual labels:  voice-recognition, speech-recognition, speech-to-text
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+119.58%)
Mutual labels:  voice-recognition, speech-recognition, speech-to-text
Vosk Android Demo
Offline speech recognition for Android with Vosk library.
Stars: ✭ 271 (-29.24%)
Mutual labels:  speech-recognition, asr, offline
PCPM
Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.
Stars: ✭ 21 (-94.52%)
Mutual labels:  speech-recognition, speech-to-text, asr
octopus
On-device speech-to-index engine powered by deep learning.
Stars: ✭ 30 (-92.17%)
Mutual labels:  voice-recognition, speech-recognition, speech-to-text
Tensorflow end2end speech recognition
End-to-End speech recognition implementation base on TensorFlow (CTC, Attention, and MTL training)
Stars: ✭ 305 (-20.37%)
Mutual labels:  speech-recognition, speech-to-text, asr
picovoice
The end-to-end platform for building voice products at scale
Stars: ✭ 316 (-17.49%)
Mutual labels:  offline, voice-recognition, speech-recognition
kaldi-long-audio-alignment
Long audio alignment using Kaldi
Stars: ✭ 21 (-94.52%)
Mutual labels:  speech-recognition, speech-to-text, asr
vosk-asterisk
Speech Recognition in Asterisk with Vosk Server
Stars: ✭ 52 (-86.42%)
Mutual labels:  speech-recognition, speech-to-text, asr
Deepspeech
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Stars: ✭ 18,680 (+4777.28%)
Mutual labels:  speech-recognition, speech-to-text, offline

Cheetah

Made in Vancouver, Canada by Picovoice

Cheetah is an on-device streaming Speech-to-Text engine. Cheetah is:

  • offline: runs locally without an Internet connection.
  • highly-accurate [1].
  • compact and computationally-efficient [1].
  • cross-platform. Linux (x86_64), Mac (x86_64), Windows (x86_64), web browsers, Android, iOS, Raspberry Pi, and BeagleBone are supported. Linux (x86_64) is available for personal and non-commercial use free of charge. Other platforms are only available under the commercial license.
  • customizable. Allows adding new words and adapting to different contexts (available only under the commercial license).

License

This repository is provided for personal & non-commercial use only. Refer to LICENSE for details. If you wish to use Cheetah in a commercial product, contact Picovoice.

Use Cases

Cheetah is meant to be used for open-domain transcription applications that require live user feedback (incremental transcription results). Open-domain question answering, voice typing, and meeting note-taking are a few examples.

  • If real-time feedback (incremental transcription results) is not needed, see Leopard.
  • If you need to understand naturally-spoken (complex) commands within a specific domain, see Rhino.
  • If you need to recognize a fixed set of simple voice commands, or activate a device using voice, see Porcupine.

Structure of Repository

Cheetah is shipped as an ANSI C shared library. The binary files for supported platforms are located under lib and header files are at include. Bindings are available at binding to facilitate usage from higher-level languages/platforms. Demo applications are at demo. When possible, use one of the demo applications as a starting point for your own implementation. Finally, resources is a placeholder for data used by various applications within the repository.

Picovoice Console and License File

In order to run, Cheetah requires a valid license file ('.lic' extension). To obtain a time-limited evaluation license file, visit Picovoice Console. To obtain a commercial license, contact Picovoice.

Running Demo Applications

Python Demo Application

First, make sure you have Python 3 installed along with the necessary packages included in the requirements.txt file:

python --version
Python 3.6.8

pip3 install -r requirements.txt

File-Based Demo

The demo transcribes a set of audio files provided as command line arguments. The demo has been tested using Python 3.6. Note that the files need to be single-channel, 16 kHz, and 16-bit linearly encoded. For more information about audio requirements, refer to pv_cheetah.h. The following transcribes the WAV file located in the resources directory.

python demo/python/cheetah_demo.py --audio_paths resources/audio_samples/test.wav --license_path ${PATH_TO_YOUR_CHEETAH_LICENSE_FILE}

To transcribe multiple files, concatenate their paths using commas, as shown below.

python demo/python/cheetah_demo.py --audio_paths ${PATH_TO_AUDIO_FILE_1},${PATH_TO_AUDIO_FILE_2},${PATH_TO_AUDIO_FILE_3} --license_path ${PATH_TO_YOUR_CHEETAH_LICENSE_FILE}
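Since the demo does not validate its input, a quick sanity check with Python's standard wave module can confirm that a file meets the requirements above (a sketch; is_cheetah_compatible is a hypothetical helper, not part of the demo):

```python
import wave

def is_cheetah_compatible(path):
    """Return True if the WAV file is single-channel, 16 kHz, 16-bit PCM."""
    with wave.open(path, 'rb') as wav:
        return (
            wav.getnchannels() == 1
            and wav.getframerate() == 16000
            and wav.getsampwidth() == 2  # sample width in bytes; 2 == 16-bit
        )
```

For example, `is_cheetah_compatible('resources/audio_samples/test.wav')` should return True for the bundled sample.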

Realtime Demo

This Cheetah demo records audio from the microphone and transcribes it in real-time:

python demo/python/cheetah_demo_realtime.py --license_path ${PATH_TO_YOUR_CHEETAH_LICENSE_FILE}

Note: you need to have a working microphone and it needs to be set as the default audio capture device on your computer for the demo to function correctly.

C Demo Application

File-Based Demo

This demo application accepts a set of WAV files as input and returns their transcripts. Note that the demo expects the audio files to be WAV, 16 kHz, and 16-bit linearly encoded. It does not perform any verification of the compatibility or correctness of the input audio files. Set the current working directory to the root of the repository.

The demo can be built using gcc as below.

gcc -I include/ -O3 demo/c/cheetah_demo.c -ldl -o cheetah_demo

Running the demo without arguments prints the usage string:

./cheetah_demo

Then it can be used as follows:

./cheetah_demo \
./lib/linux/x86_64/libpv_cheetah.so \
./lib/common/acoustic_model.pv \
./lib/common/language_model.pv \
${PATH_TO_YOUR_CHEETAH_LICENSE_FILE} \
./resources/audio_samples/test.wav

In order to transcribe multiple files, append the absolute path to each additional file to the list of command line arguments as follows:

./cheetah_demo \
./lib/linux/x86_64/libpv_cheetah.so \
./lib/common/acoustic_model.pv \
./lib/common/language_model.pv \
${PATH_TO_YOUR_CHEETAH_LICENSE_FILE} \
${PATH_TO_AUDIO_FILE_1} ${PATH_TO_AUDIO_FILE_2} ${PATH_TO_AUDIO_FILE_3}

Realtime Demo

This demo records the input audio from a microphone and transcribes it in real time. Running from the root of the repository, the demo can be built using gcc:

gcc -I include/ -O3 demo/c/cheetah_demo_realtime.c -ldl -lasound -o cheetah_demo_realtime

Running the demo without arguments prints the usage string:

./cheetah_demo_realtime

It can then be used as follows:

./cheetah_demo_realtime \
lib/linux/x86_64/libpv_cheetah.so \
AUDIO_DEVICE_NAME \
lib/common/acoustic_model.pv \
lib/common/language_model.pv \
${PATH_TO_YOUR_CHEETAH_LICENSE_FILE}

A valid AUDIO_DEVICE_NAME for the microphone can be found using:

arecord -L

Note: you need to have a working microphone.

Integration

Python

cheetah.py provides a Python binding for the Cheetah library. Below is a quick demonstration of how to construct an instance of it.

library_path = ... # The file is available under lib/linux/x86_64/libpv_cheetah.so
acoustic_model_path = ... # The file is available under lib/common/acoustic_model.pv
language_model_path = ... # The file is available under lib/common/language_model.pv
license_path = ... # The .lic file is available from Picovoice Console (https://picovoice.ai/console/)

handle = Cheetah(library_path, acoustic_model_path, language_model_path, license_path)

Once initialized, the valid sample rate can be obtained using handle.sample_rate. The expected frame length (number of audio samples in each input array) is handle.frame_length.

audio = ... # audio data to be transcribed

num_frames = len(audio) // handle.frame_length

transcript = ''

for i in range(num_frames):
    frame = audio[i * handle.frame_length:(i + 1) * handle.frame_length]
    partial_transcript, _ = handle.process(frame)
    transcript += partial_transcript

transcript += handle.flush()
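The `audio = ...` placeholder expects 16-bit samples at handle.sample_rate. For a WAV file already in Cheetah's required format, the standard library is enough to produce them (a sketch; read_pcm is a hypothetical helper, not part of the binding):

```python
import struct
import wave

def read_pcm(path):
    """Read all samples of a 16-bit mono WAV file as a list of ints."""
    with wave.open(path, 'rb') as wav:
        raw = wav.readframes(wav.getnframes())
    # '<h' is little-endian signed 16-bit, matching linear PCM encoding
    return list(struct.unpack('<%dh' % (len(raw) // 2), raw))
```

The returned list can be fed directly to the framing loop above.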

When finished, release the acquired resources.

handle.delete()

C

Cheetah is implemented in ANSI C and therefore can be directly linked to C applications. pv_cheetah.h header file contains relevant information. An instance of the Cheetah object can be constructed as follows:

const char *acoustic_model_path = ... // The file is available under lib/common/acoustic_model.pv
const char *language_model_path = ... // The file is available under lib/common/language_model.pv
const char *license_path = ... // The .lic file is available from Picovoice Console (https://picovoice.ai/console/)
const int32_t endpoint_duration_sec = ... // endpoint duration in seconds. set to '-1' to disable endpointing

pv_cheetah_t *handle;
const pv_status_t status = pv_cheetah_init(
    acoustic_model_path,
    language_model_path,
    license_path,
    endpoint_duration_sec,
    &handle);
if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

Now the handle can be used to process the incoming audio stream. Cheetah accepts single-channel, 16-bit PCM audio. The sample rate can be retrieved using pv_sample_rate(). Finally, Cheetah accepts input audio in consecutive chunks (i.e., frames); the length of each frame can be retrieved using pv_cheetah_frame_length().

const int16_t *audio = ... // audio data to be transcribed
const int audio_length = ... // number of samples in audio

const int num_frames = audio_length / pv_cheetah_frame_length();

char *transcript = ... // buffer for storing transcription.

for (int i = 0; i < num_frames; i++) {
    const int16_t *frame = &audio[i * pv_cheetah_frame_length()];

    char *partial_transcript;
    bool is_endpoint; // is updated only if endpoint detection is enabled at construction time.
    const pv_status_t status = pv_cheetah_process(handle, frame, &partial_transcript, &is_endpoint);
    if (status != PV_STATUS_SUCCESS) {
        // error handling logic
    }

    strcat(transcript, partial_transcript);
    free(partial_transcript);
}

char *final_transcript = NULL;
const pv_status_t flush_status = pv_cheetah_flush(handle, &final_transcript);
if (flush_status != PV_STATUS_SUCCESS) {
    // error handling logic
}

strcat(transcript, final_transcript);
free(final_transcript);

Finally, when done, be sure to release the acquired resources:

pv_cheetah_delete(handle);

Releases

V1.2.0 — January 14th, 2020

  • Improved accuracy
  • Runtime optimizations

V1.1.0 — September 2nd, 2019

  • Real-time decoding
  • Improved accuracy
  • Runtime optimizations

V1.0.0 — October 30th, 2018

  • Initial release.