Update:

Updated to PyTorch 1.2 and Python 3.

CTC-based Automatic Speech Recognition

This is a CTC-based speech recognition system built with PyTorch.

At present, the system only supports phoneme recognition.

You can also run it at the word level, but the error rate may be high.

Another option is to decode with a lexicon and a word-level language model using WFST, which is not included in this system.

Data

English Corpus: Timit

Training set: 3696 sentences (excluding SA utterances)
Dev set: 400 sentences
Test set: 192 sentences

Chinese Corpus: 863 Corpus

Training set:

Speaker	UtterId	Utterances
M50, F50	A1-A521, AW1-AW129	650 sentences
M54, F54	B522-B1040, BW130-BW259	649 sentences
M60, F60	C1041-C1560, CW260-CW388	649 sentences
M64, F64	D1-D625	625 sentences
All		5146 sentences

Test set:

Speaker	UtterId	Utterances
M51, F51	A1-A100	100 sentences
M55, F55	B522-B521	100 sentences
M61, F61	C1041-C1140	100 sentences
M63, F63	D1-D100	100 sentences
All		800 sentences

Install

Install PyTorch.
~~Install warp-ctc and bind it to PyTorch.~~
~~Notice: if you use Python 2, reinstall PyTorch from source instead of pip.~~ The CTC loss built into PyTorch 1.2 (nn.CTCLoss) is used now.
Install Kaldi. We use Kaldi to extract MFCC and fbank features.
Install torchaudio (needed when using waveforms as input).
~~Install KenLM to train an n-gram language model if needed.~~ Use IRSTLM from the Kaldi tools instead.
Install and start visdom:

pip3 install visdom
python -m visdom.server

Install the other Python packages:

pip install -r requirements.txt

Usage

Install all the packages listed in the Install section.
Revise the top-level script run.sh.
Open the config file to revise the hyperparameters.
Run the top-level script in one of four modes:

bash run.sh    # data_prepare + AM training + LM training + testing
bash run.sh 1  # AM training + LM training + testing
bash run.sh 2  # LM training + testing
bash run.sh 3  # testing

RNN LM training is not implemented yet. It has been added to the to-do list.

Data Prepare

Extract 39-dimensional MFCC and 40-dimensional fbank features with Kaldi.
Use compute-cmvn-stats and apply-cmvn on the training data to get the global mean and variance and normalize the features.
Subclass Dataset and DataLoader from torch.utils.data to prepare data for training. You can find them in steps/dataloader.py.
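The global normalization step can be illustrated in plain Python. This is only a sketch of the arithmetic, not the Kaldi tools themselves: the function names global_cmvn_stats and apply_cmvn are hypothetical, and features are assumed to be lists of frames, each frame a list of floats.

```python
import math

def global_cmvn_stats(utterances):
    """Return per-dimension (means, stds) over every frame of every utterance."""
    frames = [frame for utt in utterances for frame in utt]
    n = len(frames)
    dims = list(zip(*frames))  # one tuple of values per feature dimension
    means = [sum(d) / n for d in dims]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in d) / n)
        for d, m in zip(dims, means)
    ]
    return means, stds

def apply_cmvn(utt, means, stds, eps=1e-8):
    """Normalize one utterance with the global statistics (eps avoids division by zero)."""
    return [
        [(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
        for frame in utt
    ]

# Toy data: two utterances with 2-dimensional features.
train = [[[1.0, 10.0], [3.0, 30.0]], [[5.0, 50.0]]]
means, stds = global_cmvn_stats(train)
normalized = [apply_cmvn(u, means, stds) for u in train]
```

The statistics are computed once on the training data and then reused unchanged for the dev and test sets, so that all three are normalized the same way.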

Model

RNN + DNN + CTC: the RNN here can be either nn.LSTM or nn.GRU.
CNN + RNN + DNN + CTC
The CNN is used to reduce the spectral variation caused by speaker and environment differences.
How to choose
Use add_cnn to choose between the two models. If add_cnn is True, CNN+RNN+DNN+CTC is chosen.
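A minimal sketch of how the two variants might be wired up is shown below. It is not the code in this repository: the class name CTCModel, the layer sizes, the single convolution layer and the 48 output classes are all assumptions chosen for illustration. Only add_cnn, nn.LSTM, nn.GRU and nn.CTCLoss come from the description above.

```python
import torch
import torch.nn as nn

class CTCModel(nn.Module):
    """Hypothetical sketch: optional CNN front end, bidirectional RNN, linear output."""

    def __init__(self, feat_dim=40, hidden=64, num_classes=48,
                 add_cnn=False, rnn_type=nn.LSTM):
        super().__init__()
        self.add_cnn = add_cnn
        if add_cnn:
            # 2-D convolution over (time, frequency); padding keeps both sizes.
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 8, kernel_size=3, padding=1),
                nn.ReLU(),
            )
            rnn_in = 8 * feat_dim
        else:
            rnn_in = feat_dim
        self.rnn = rnn_type(rnn_in, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)  # the "DNN" part

    def forward(self, x):  # x: (batch, time, feat_dim)
        if self.add_cnn:
            x = self.cnn(x.unsqueeze(1))  # (batch, 8, time, feat_dim)
            b, c, t, f = x.shape
            x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        x, _ = self.rnn(x)
        return self.fc(x).log_softmax(dim=-1)  # (batch, time, classes)

model = CTCModel(add_cnn=True)
feats = torch.randn(2, 50, 40)  # 2 utterances, 50 frames of 40-dim fbank
log_probs = model(feats)

# nn.CTCLoss expects (time, batch, classes) log-probabilities; class 0 is the blank here.
targets = torch.randint(1, 48, (2, 10))
input_lengths = torch.full((2,), 50, dtype=torch.long)
target_lengths = torch.full((2,), 10, dtype=torch.long)
loss = nn.CTCLoss(blank=0)(log_probs.transpose(0, 1), targets,
                           input_lengths, target_lengths)
```

Passing rnn_type=nn.GRU swaps the recurrent layer without touching the rest of the sketch, since both classes share the same constructor arguments used here.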

Training:

initial-lr = 0.001
decay = 0.5
weight-decay = 0.005

The learning rate is decayed when the dev loss has stayed around the same value ten times.
The learning rate is adjusted at most 8 times; this limit can be altered in steps/train_ctc.py (line 367).
The optimizer is torch.optim.Adam with a weight decay of 0.005.
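The decay rule can be sketched as a small helper. This is one possible reading of the rule, not the logic in steps/train_ctc.py: the class name PlateauDecay, the tolerance tol that defines "around the same value", and stopping after the last adjustment are all assumptions.

```python
class PlateauDecay:
    """Multiply the learning rate by `decay` when the dev loss stops moving (hypothetical)."""

    def __init__(self, lr=0.001, decay=0.5, patience=10, max_adjust=8, tol=0.005):
        self.lr = lr
        self.decay = decay
        self.patience = patience      # stalled evaluations before one adjustment
        self.max_adjust = max_adjust  # assumed: training stops after this many adjustments
        self.tol = tol                # assumed width of "around the same loss"
        self.ref_loss = None
        self.stalled = 0
        self.adjustments = 0

    def step(self, dev_loss):
        """Record one dev loss; return False once training should stop."""
        if self.ref_loss is None or abs(dev_loss - self.ref_loss) > self.tol:
            self.ref_loss = dev_loss  # loss moved: reset the reference and the counter
            self.stalled = 0
        else:
            self.stalled += 1
        if self.stalled >= self.patience:
            self.lr *= self.decay
            self.adjustments += 1
            self.stalled = 0
        return self.adjustments < self.max_adjust

sched = PlateauDecay()
for loss in [1.0] + [0.8] * 11:  # 0.8 becomes the reference once, then stalls 10 times
    keep_going = sched.step(loss)
```

With a PyTorch optimizer, the new rate would be applied after each step by assigning sched.lr to the "lr" entry of every group in optimizer.param_groups.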

Decoder

Greedy decoder:

Take the most probable output at each frame as the result to get the best path.
Calculate the WER and CER using the methods of the class.
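The two greedy steps can be sketched in plain Python as follows. The function names are hypothetical and label 0 is assumed to be the CTC blank; the repository's decoder class may organize this differently.

```python
def greedy_decode(probs, blank=0):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    decoded, prev = [], None
    for label in path:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return path, decoded

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row dynamic programming)."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev_diag, row[0] = row[0], i
        for j, h in enumerate(hyp, 1):
            prev_diag, row[j] = row[j], min(
                row[j] + 1,            # deletion
                row[j - 1] + 1,        # insertion
                prev_diag + (r != h),  # substitution or match
            )
    return row[-1]

def error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    return edit_distance(ref, hyp) / len(ref)

# Five frames over three classes: blank (0) and two phoneme labels (1, 2).
frames = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]
path, decoded = greedy_decode(frames)
per = error_rate([1, 2], decoded)
```

Here the best path is [1, 1, 0, 1, 2], which collapses to [1, 1, 2] because the blank separates the two runs of label 1. The same error_rate function gives a WER when called on lists of words and a CER when called on character strings.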

Beam decoder:

Implemented in Python, adapted from the original code.
I modified it to support batch decoding of phonemes.
Beam search improves phoneme accuracy by about 0.2%.
A phoneme-level language model is now integrated into the beam search decoder.
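For reference, a compact CTC prefix beam search is sketched below. It is a generic textbook version, not this repository's decoder: it handles one utterance at a time, omits the language model score, and the names ctc_beam_search and logsumexp are chosen for illustration. Inputs are assumed to be per-frame log-probabilities with label 0 as the blank.

```python
import math
from collections import defaultdict

NEG_INF = -math.inf

def logsumexp(*xs):
    """Stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_beam_search(log_probs, beam_size=10, blank=0):
    # Each prefix maps to (log prob of paths ending in blank, ending in non-blank).
    beam = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        next_beam = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beam.items():
            for s, p in enumerate(frame):
                if s == blank:
                    n_b, n_nb = next_beam[prefix]
                    next_beam[prefix] = (logsumexp(n_b, p_b + p, p_nb + p), n_nb)
                    continue
                new_prefix = prefix + (s,)
                n_b, n_nb = next_beam[new_prefix]
                if prefix and prefix[-1] == s:
                    # A repeated label extends the prefix only after a blank...
                    next_beam[new_prefix] = (n_b, logsumexp(n_nb, p_b + p))
                    # ...otherwise the repeat collapses into the same prefix.
                    n_b, n_nb = next_beam[prefix]
                    next_beam[prefix] = (n_b, logsumexp(n_nb, p_nb + p))
                else:
                    next_beam[new_prefix] = (n_b, logsumexp(n_nb, p_b + p, p_nb + p))
        ranked = sorted(next_beam.items(),
                        key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beam = dict(ranked[:beam_size])
    best_prefix, (p_b, p_nb) = max(beam.items(), key=lambda kv: logsumexp(*kv[1]))
    return list(best_prefix), logsumexp(p_b, p_nb)

# Two frames, two classes: blank with probability 0.6 and label 1 with 0.4.
frames = [[math.log(0.6), math.log(0.4)]] * 2
prefix, score = ctc_beam_search(frames, beam_size=4)
```

In this toy input, greedy decoding picks the blank at both frames and returns an empty sequence with probability 0.36, while beam search sums the three paths that collapse to [1] (0.24 + 0.24 + 0.16 = 0.64) and returns that prefix instead. A language model score would typically be added at the point where a prefix is extended with a new label.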

ToDo

Combine with RNN-LM
Beam search with RNN-LM
The code in 863_corpus is a mess and needs to be reorganized.
