
githubharald / CTCDecoder

License: MIT
Connectionist Temporal Classification (CTC) decoding algorithms: best path, prefix search, beam search and token passing. Implemented in Python.

Programming Languages

Python
139,335 projects; #7 most used programming language

Projects that are alternatives of or similar to CTCDecoder

CTCWordBeamSearch
Connectionist Temporal Classification (CTC) decoder with dictionary and language model for TensorFlow.
Stars: ✭ 398 (-24.76%)
Mutual labels:  speech-recognition, recurrent-neural-networks, language-model, ctc
Tensorflow end2end speech recognition
End-to-end speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)
Stars: ✭ 305 (-42.34%)
Mutual labels:  beam-search, speech-recognition, ctc
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (-22.87%)
Mutual labels:  speech-recognition, language-model, ctc
Rnn ctc
Recurrent Neural Network and Long Short-Term Memory (LSTM) with Connectionist Temporal Classification implemented in Theano. Includes a toy training example.
Stars: ✭ 220 (-58.41%)
Mutual labels:  speech-recognition, recurrent-neural-networks, ctc
rnnt decoder cuda
An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.
Stars: ✭ 60 (-88.66%)
Mutual labels:  speech-recognition, beam-search
ctc-asr
End-to-end trained speech recognition system, based on RNNs and the connectionist temporal classification (CTC) cost function.
Stars: ✭ 112 (-78.83%)
Mutual labels:  speech-recognition, ctc
mongolian-nlp
Useful resources for Mongolian NLP
Stars: ✭ 119 (-77.5%)
Mutual labels:  speech-recognition, language-model
Rus-SpeechRecognition-LSTM-CTC-VoxForge
Russian speech recognition using TensorFlow, trained on the VoxForge dataset
Stars: ✭ 50 (-90.55%)
Mutual labels:  speech-recognition, ctc
tensorflow-with-kenlm
Tensorflow with KenLM integrated for beam search scoring
Stars: ✭ 30 (-94.33%)
Mutual labels:  beam-search, language-model
rindow-neuralnetworks
Neural networks library for machine learning on PHP
Stars: ✭ 37 (-93.01%)
Mutual labels:  opencl, recurrent-neural-networks
Zamia Speech
Open tools and data for cloudless automatic speech recognition
Stars: ✭ 374 (-29.3%)
Mutual labels:  speech-recognition, language-model
TensorFlowASR
⚡️ TensorFlowASR: almost state-of-the-art automatic speech recognition in TensorFlow 2. Supports languages that use characters or subwords
Stars: ✭ 400 (-24.39%)
Mutual labels:  speech-recognition, ctc
rnn benchmarks
RNN benchmarks of PyTorch, TensorFlow and Theano
Stars: ✭ 85 (-83.93%)
Mutual labels:  recurrent-neural-networks, ctc
TF-Speech-Recognition-Challenge-Solution
Source code of the model used in the TensorFlow Speech Recognition Challenge (https://www.kaggle.com/c/tensorflow-speech-recognition-challenge). The solution ranked in the top 5% of the private leaderboard.
Stars: ✭ 58 (-89.04%)
Mutual labels:  recurrent-neural-networks, speech-recognition
PCPM
Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.
Stars: ✭ 21 (-96.03%)
Mutual labels:  speech-recognition, language-model
Image Caption Generator
A neural network to generate captions for an image using CNN and RNN with BEAM Search.
Stars: ✭ 126 (-76.18%)
Mutual labels:  beam-search, recurrent-neural-networks
Zeroth
Kaldi-based Korean ASR (speech recognition) open-source project
Stars: ✭ 248 (-53.12%)
Mutual labels:  speech-recognition, language-model
Lingvo
Lingvo
Stars: ✭ 2,361 (+346.31%)
Mutual labels:  speech-recognition, language-model
Tf chatbot seq2seq antilm
Seq2seq chatbot with attention and an anti-language model to suppress generic responses, with an option for further improvement via deep reinforcement learning.
Stars: ✭ 369 (-30.25%)
Mutual labels:  beam-search, language-model
Asrt speechrecognition
A deep-learning-based Chinese speech recognition system
Stars: ✭ 4,943 (+834.4%)
Mutual labels:  speech-recognition, ctc

CTC Decoding Algorithms with Language Model

Connectionist Temporal Classification (CTC) decoding algorithms are implemented as Python scripts. A minimalistic Language Model (LM) is provided.

Run demo

Go to the src/ directory and run python main.py. Appending the command-line parameter gpu additionally executes best path decoding on the GPU.

Expected results:

=====Mini example=====
TARGET       : "a"
BEST PATH    : ""
PREFIX SEARCH: "a"
BEAM SEARCH  : "a"
TOKEN        : "a"
PROB(TARGET) : 0.64
LOSS(TARGET) : 0.4462871026284195
=====Word example=====
TARGET        : "aircraft"
BEST PATH     : "aircrapt"
LEXICON SEARCH: "aircraft"
=====Line example=====
TARGET        : "the fake friend of the family, like the"
BEST PATH     : "the fak friend of the fomly hae tC"
PREFIX SEARCH : "the fak friend of the fomcly hae tC"
BEAM SEARCH   : "the fak friend of the fomcly hae tC"
BEAM SEARCH LM: "the fake friend of the family, lie th"
TOKEN         : "the fake friend of the family fake the"
PROB(TARGET)  : 6.314726428865645e-13
LOSS(TARGET)  : 28.090721774903226
=====Line example (GPU)=====
BestPathCL.compute(...) time:  0.04680013656616211
Compute for 1000 batch elements
TARGET        : "the fake friend of the family, like the"
BEST PATH GPU : "the fak friend of the fomly hae tC"

Provided algorithms

  • Best Path Decoding: takes the best label per time-step to compute the best path, then removes repeated labels and CTC-blanks from this path (a minimal sketch follows this list). Files: BestPath.py for the CPU implementation and BestPathCL.py/BestPathCL.cl for the GPU implementation [1]
  • Prefix Search Decoding: best-first search through the tree of labelings. File: PrefixSearch.py [1]
  • Beam Search Decoding: iteratively searches for the best labeling in a tree of labelings; optionally uses a character-level LM. File: BeamSearch.py [2] [5]
  • Token Passing: searches for the most probable word sequence, with the words constrained to those contained in a dictionary. Can be extended to use a word-level LM. File: TokenPassing.py [1]
  • Lexicon Search: uses a best path decoding approximation to find similar words in a dictionary, then returns the one with the highest score. File: LexiconSearch.py [3]
  • Loss: calculates the probability and loss of a given text for the RNN output. File: Loss.py [1] [6]
  • Word Beam Search: TensorFlow implementation see repository CTCWordBeamSearch [8]
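
As a reference for the simplest of these algorithms, here is a minimal sketch of best path decoding. It assumes the matrix layout used throughout this repository (one row per time-step, one column per label, CTC-blank in the last column); the function name and test matrix are illustrative, not the repository's API.

```python
import numpy as np

def best_path_decode(mat: np.ndarray, labels: str) -> str:
    """Greedy CTC decoding: take the most probable label per time-step,
    collapse repeated labels, then remove CTC-blanks.
    mat has shape (time-steps, len(labels) + 1), blank in the last column."""
    blank_idx = mat.shape[1] - 1          # blank is the last label by convention
    best_per_t = np.argmax(mat, axis=1)   # most probable label index per time-step

    decoded, prev = [], None
    for idx in best_per_t:
        if idx != prev and idx != blank_idx:  # collapse repeats, skip blanks
            decoded.append(labels[idx])
        prev = idx
    return ''.join(decoded)

# Mini example: 2 time-steps, labels "a", "b" and the blank as last column.
mat = np.array([[0.4, 0.0, 0.6],
                [0.4, 0.0, 0.6]])
print(best_path_decode(mat, 'ab'))  # -> "" (the best path is blank-blank)
```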

Choosing the right algorithm

This paper [7] compares beam search decoding and token passing, and gives suggestions on when to use best path decoding, beam search decoding, or token passing.

Testcases

The RNN output matrix of the Mini example testcase contains 2 time-steps (t0 and t1) and 3 labels (a, b, and the CTC-blank denoted as -). Best path decoding (see the left figure) takes the most probable label per time-step, which gives the path "--" and therefore the recognized text "" with probability 0.6*0.6=0.36. Beam search, prefix search and token passing calculate the probability of labelings. For the labeling "a" these algorithms sum over the paths "-a", "a-" and "aa" (see the right figure) with probability 0.6*0.4+0.4*0.6+0.4*0.4=0.64. The only path which gives "" still has probability 0.36, therefore "a" is the result returned by beam search, prefix search and token passing.

(Figure: Mini example. Left: best path per time-step. Right: the paths "-a", "a-" and "aa" that sum to labeling "a".)
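
The numbers above can be verified by brute force: enumerate all 3^2 = 9 paths, collapse each path into its labeling, and sum the path probabilities per labeling. This is only feasible for such a tiny matrix (the real algorithms avoid the exponential enumeration); the code below is an illustrative sketch, not part of the repository.

```python
import itertools
import math
import numpy as np

# Mini example: rows = time-steps t0, t1; columns = "a", "b", CTC-blank.
mat = np.array([[0.4, 0.0, 0.6],
                [0.4, 0.0, 0.6]])
chars = ['a', 'b', '-']  # '-' denotes the CTC-blank

def collapse(path: str) -> str:
    """Map a path to its labeling: merge repeated labels, remove blanks."""
    out, prev = [], None
    for c in path:
        if c != prev and c != '-':
            out.append(c)
        prev = c
    return ''.join(out)

# Sum the probabilities of all paths, grouped by the labeling they collapse to.
prob = {}
for idxs in itertools.product(range(3), repeat=2):
    p = mat[0, idxs[0]] * mat[1, idxs[1]]
    labeling = collapse(''.join(chars[i] for i in idxs))
    prob[labeling] = prob.get(labeling, 0.0) + p

print(prob['a'])             # 0.64: paths "-a", "a-" and "aa"
print(prob[''])              # 0.36: path "--"
print(-math.log(prob['a']))  # 0.4463: LOSS(TARGET) from the demo output
```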

The Word example testcase contains a single word from the IAM Handwriting Database [4]. It is used to test lexicon search [3]. The RNN output was generated with the SimpleHTR model (using its --dump option). Lexicon search first computes an approximation with best path decoding, then searches for similar words in a dictionary using a BK tree, and finally scores the candidates by computing the loss and returning the most probable dictionary word. Best path decoding outputs "aircrapt"; lexicon search finds similar words such as "aircraft" and "airplane" in the dictionary, calculates a score for each of them, and returns "aircraft", which is the correct result. The figure below shows the input image and the RNN output matrix with 32 time-steps and 80 classes (the last one being the CTC-blank). Each column sums to 1, and each entry represents the probability of seeing a label at a given time-step.

(Figure: Word example. Input image and RNN output matrix with 32 time-steps and 80 classes.)
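
To make the three-step pipeline concrete, here is a self-contained sketch of the lexicon search idea. It replaces the repository's BK tree with a plain linear scan over the dictionary and scores candidates with the CTC forward algorithm (which is what computing the loss of a text boils down to); all names are illustrative and do not match LexiconSearch.py.

```python
import numpy as np

def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance; the repository uses a BK tree to avoid
    scanning the whole dictionary, but the filtering idea is the same."""
    d = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, cb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (ca != cb))
    return d[-1]

def labeling_prob(mat: np.ndarray, text: str, labels: str) -> float:
    """CTC forward algorithm: probability of `text` given the RNN output
    `mat` of shape (time-steps, len(labels) + 1), blank in the last column."""
    blank = len(labels)
    ext = [blank]                        # extend the text with blanks: -c1-c2-...-
    for c in text:
        ext += [labels.index(c), blank]
    T, S = mat.shape[0], len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = mat[0, ext[0]]
    if S > 1:
        alpha[0, 1] = mat[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s] + (alpha[t - 1, s - 1] if s > 0 else 0.0)
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]  # a blank between distinct labels may be skipped
            alpha[t, s] = a * mat[t, ext[s]]
    return alpha[-1, -1] + (alpha[-1, -2] if S > 1 else 0.0)

def lexicon_search(mat: np.ndarray, labels: str, dictionary, max_dist: int = 2) -> str:
    """1) approximate with best path decoding, 2) keep dictionary words within
    `max_dist` edits of the approximation, 3) return the most probable candidate."""
    approx, prev = [], None
    for idx in np.argmax(mat, axis=1):    # greedy decode, as in best path decoding
        if idx != prev and idx != len(labels):
            approx.append(labels[idx])
        prev = idx
    approx = ''.join(approx)
    candidates = [w for w in dictionary if edit_distance(w, approx) <= max_dist]
    return max(candidates or [approx], key=lambda w: labeling_prob(mat, w, labels))
```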

The ground-truth text of the Line example testcase is "the fake friend of the family, like the"; it is a sample from the IAM Handwriting Database [4]. This testcase is used to test all algorithms except lexicon search. The RNN output was generated by a partially trained TensorFlow model inspired by CRNN [3], which is essentially a larger version of the SimpleHTR model. The figure below shows the input image and the RNN output matrix with 100 time-steps and 80 classes.

(Figure: Line example. Input image and RNN output matrix with 100 time-steps and 80 classes.)

Data files

The data files for the Word example are located in data/word and the files for the Line example in data/line. Each of these directories contains:

  • rnnOutput.csv: output of the RNN layer (softmax not yet applied), containing 32 or 100 time-steps with 80 label scores each (see the loading sketch after this list).
  • corpus.txt: the text from which the language model is generated.
  • img.png: the input image of the neural network. It is included for illustration only; the decoding algorithms do not use it.
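
Because the scores in rnnOutput.csv are pre-softmax, a decoder that consumes the file has to normalize each time-step first. A minimal loading sketch follows; the delimiter and the relative path are assumptions, so adjust them to the actual files.

```python
import numpy as np

def load_rnn_output(path: str) -> np.ndarray:
    """Load raw RNN scores and normalize them to per-time-step probabilities."""
    scores = np.genfromtxt(path, delimiter=';')              # assumed delimiter
    e = np.exp(scores - scores.max(axis=1, keepdims=True))   # stabilized softmax
    return e / e.sum(axis=1, keepdims=True)

mat = load_rnn_output('../data/word/rnnOutput.csv')  # assumed path, relative to src/
print(mat.shape)            # (32, 80) for the Word example
print(mat.sum(axis=1)[:3])  # each time-step now sums to 1
```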

Notes

The provided Python scripts are intended for tests and experiments. For production use I recommend implementing these algorithms in C++ for performance reasons. A C++ implementation can easily be integrated into deep learning frameworks such as TensorFlow (see CTCWordBeamSearch for an example).

A GPU implementation is provided for best path decoding; it requires pyopencl to be installed and is run via python main.py gpu.

References

[1] Graves - Supervised sequence labelling with recurrent neural networks

[2] Hwang - Character-level incremental speech recognition with recurrent neural networks

[3] Shi - An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

[4] Marti - The IAM-database: an English sentence database for offline handwriting recognition

[5] Beam Search Decoding in CTC-trained Neural Networks

[6] An Intuitive Explanation of Connectionist Temporal Classification

[7] Scheidl - Comparison of Connectionist Temporal Classification Decoding Algorithms

[8] Scheidl - Word Beam Search: A Connectionist Temporal Classification Decoding Algorithm
