iamjanvijay / rnnt_decoder_cuda

Licence: MIT license
An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.

Programming languages: CUDA, C++, Python, Makefile

RNN-Transducer Prefix Beam Search

This repository provides an optimised implementation of prefix beam search for models trained with the RNN-Transducer loss function (as described in the paper "Sequence Transduction with Recurrent Neural Networks"). The implementation takes ~100 milliseconds to decode a speech segment of ~5 seconds with a beam size of 10 (a beam size of 10 is adequate for production-level error rates).
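To make the idea of a beam of prefixes concrete, the following is a minimal CPU-side sketch of one ingredient commonly found in prefix-style beam search: hypotheses that share the same label prefix are merged by adding their probabilities in log space, and only the `beam_size` most probable prefixes are kept. It is not taken from this repository's CUDA code; the names `Hypothesis`, `log_add`, and `merge_and_prune` are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical type: a label prefix and its accumulated log-probability.
struct Hypothesis {
    std::vector<int> prefix;
    double log_prob;
};

// Numerically stable log(exp(a) + exp(b)).
double log_add(double a, double b) {
    if (std::isinf(a) && a < 0) return b;  // a represents probability 0
    if (std::isinf(b) && b < 0) return a;  // b represents probability 0
    const double hi = std::max(a, b);
    const double lo = std::min(a, b);
    return hi + std::log1p(std::exp(lo - hi));
}

// Merge candidates that share a prefix, then keep the beam_size best ones,
// sorted from most to least probable.
std::vector<Hypothesis> merge_and_prune(const std::vector<Hypothesis>& candidates,
                                        std::size_t beam_size) {
    assert(beam_size > 0);

    // Sum the probabilities of identical prefixes in log space.
    std::map<std::vector<int>, double> merged;
    for (const Hypothesis& h : candidates) {
        auto it = merged.find(h.prefix);
        if (it == merged.end()) {
            merged.emplace(h.prefix, h.log_prob);
        } else {
            it->second = log_add(it->second, h.log_prob);
        }
    }

    std::vector<Hypothesis> beam;
    beam.reserve(merged.size());
    for (const auto& entry : merged) {
        beam.push_back({entry.first, entry.second});
    }

    // Partially sort so the top `keep` entries come first, then drop the rest.
    const std::size_t keep = std::min(beam_size, beam.size());
    std::partial_sort(beam.begin(), beam.begin() + keep, beam.end(),
                      [](const Hypothesis& a, const Hypothesis& b) {
                          return a.log_prob > b.log_prob;
                      });
    beam.erase(beam.begin() + keep, beam.end());
    return beam;
}
```

A full RNN-T decoder would also have to expand each prefix with the prediction and joint networks at every time step; this sketch covers only the merge-and-prune step that keeps the beam at a fixed size.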

Sample Run

To try a sample run of prefix beam search on your machine, execute the following commands:

  1. Clone this repository.
     git clone https://github.com/iamjanvijay/rnnt_decoder_cuda.git;
  2. Clean the output folder.
     rm rnnt_decoder_cuda/data/outputs/*;
  3. Build the decoder executable.
     cd rnnt_decoder_cuda/decoder;
     make clean;
     make;
  4. Run the decoder. The decoded beams are saved to the data/outputs folder.
     CUDA_VISIBLE_DEVICES=0 ./decoder ../data/inputs/metadata.txt 0 9 10 5001;
     The general form of the command, with each placeholder written between $ signs, is:
     CUDA_VISIBLE_DEVICES=$GPU_ID$ ./decoder ../data/inputs/metadata.txt $index_of_first_file_to_read_from_metadata$ $index_of_last_file_to_read_from_metadata$ $beam_size$ $vocabulary_size_excluding_blank$;
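
The five positional arguments of the decoder command above can be read as follows: the metadata file, the index of the first file to read, the index of the last file to read, the beam size, and the vocabulary size excluding blank. The sketch below shows one way such arguments could be parsed and sanity-checked in C++. It is an illustration only, not the repository's actual argument handling; the names `DecoderArgs` and `parse_decoder_args` and the specific validation rules are assumptions.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the five positional arguments.
struct DecoderArgs {
    std::string metadata_path;  // e.g. ../data/inputs/metadata.txt
    int first_index;            // index of the first file to read from metadata
    int last_index;             // index of the last file to read from metadata
    int beam_size;              // number of prefixes kept in the beam
    int vocab_size;             // vocabulary size, excluding blank
};

// Parse the arguments that follow the program name.
// Throws std::invalid_argument when the count or the values look wrong.
// (std::stoi itself also throws std::invalid_argument on non-numeric input.)
DecoderArgs parse_decoder_args(const std::vector<std::string>& args) {
    if (args.size() != 5) {
        throw std::invalid_argument(
            "expected: <metadata> <first_index> <last_index> <beam_size> <vocab_size>");
    }
    DecoderArgs parsed{args[0], std::stoi(args[1]), std::stoi(args[2]),
                       std::stoi(args[3]), std::stoi(args[4])};

    // Assumed validation rules; the real decoder may behave differently.
    if (parsed.first_index < 0 || parsed.last_index < parsed.first_index) {
        throw std::invalid_argument("file indices must satisfy 0 <= first <= last");
    }
    if (parsed.beam_size <= 0 || parsed.vocab_size <= 0) {
        throw std::invalid_argument("beam size and vocabulary size must be positive");
    }
    return parsed;
}
```

With the sample command, the arguments `../data/inputs/metadata.txt 0 9 10 5001` would map to a first index of 0, a last index of 9, a beam size of 10, and a vocabulary size of 5001.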

Contributing

Contributions are welcomed and greatly appreciated.
