
jinserk / Pytorch Asr

Licence: gpl-3.0
ASR with PyTorch

Programming Languages

python

Projects that are alternatives of or similar to Pytorch Asr

Pytorch Kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Stars: ✭ 2,097 (+1591.13%)
Mutual labels:  speech-recognition, speech, asr, kaldi
Pykaldi
A Python wrapper for Kaldi
Stars: ✭ 756 (+509.68%)
Mutual labels:  speech-recognition, speech, asr, kaldi
Eesen
The official repository of the Eesen project
Stars: ✭ 738 (+495.16%)
Mutual labels:  speech-recognition, asr, kaldi, ctc
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+229.03%)
Mutual labels:  speech-recognition, speech, asr, ctc
Vosk Server
WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Stars: ✭ 277 (+123.39%)
Mutual labels:  speech-recognition, asr, kaldi
Vosk Android Demo
Offline speech recognition for Android with Vosk library.
Stars: ✭ 271 (+118.55%)
Mutual labels:  speech-recognition, asr, kaldi
Zamia Speech
Open tools and data for cloudless automatic speech recognition
Stars: ✭ 374 (+201.61%)
Mutual labels:  speech-recognition, asr, kaldi
Ctcwordbeamsearch
Connectionist Temporal Classification (CTC) decoder with dictionary and language model for TensorFlow.
Stars: ✭ 398 (+220.97%)
Mutual labels:  speech-recognition, ctc, decoder
vosk-model-ru-adaptation
No description or website provided.
Stars: ✭ 19 (-84.68%)
Mutual labels:  speech-recognition, kaldi, asr
Awesome Kaldi
This is a list of features, scripts, blogs and resources for better using Kaldi ( http://kaldi-asr.org/ )
Stars: ✭ 393 (+216.94%)
Mutual labels:  speech-recognition, speech, kaldi
Vosk Api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Stars: ✭ 1,357 (+994.35%)
Mutual labels:  speech-recognition, asr, kaldi
Delta
DELTA is a deep learning based natural language and speech processing platform.
Stars: ✭ 1,479 (+1092.74%)
Mutual labels:  speech-recognition, speech, asr
sova-asr
SOVA ASR (Automatic Speech Recognition)
Stars: ✭ 123 (-0.81%)
Mutual labels:  speech, speech-recognition, asr
torch-asg
Auto Segmentation Criterion (ASG) implemented in pytorch
Stars: ✭ 42 (-66.13%)
Mutual labels:  speech, asr, ctc
Tensorflow end2end speech recognition
End-to-End speech recognition implementation base on TensorFlow (CTC, Attention, and MTL training)
Stars: ✭ 305 (+145.97%)
Mutual labels:  speech-recognition, asr, ctc
spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-58.06%)
Mutual labels:  speech, speech-recognition, asr
Syn Speech
Syn.Speech is a flexible speaker independent continuous speech recognition engine for Mono and .NET framework
Stars: ✭ 57 (-54.03%)
Mutual labels:  speech-recognition, speech, asr
kaldi-long-audio-alignment
Long audio alignment using Kaldi
Stars: ✭ 21 (-83.06%)
Mutual labels:  speech-recognition, kaldi, asr
opensnips
Open source projects related to Snips https://snips.ai/.
Stars: ✭ 50 (-59.68%)
Mutual labels:  speech, kaldi, asr
Athena
an open-source implementation of sequence-to-sequence based speech processing engine
Stars: ✭ 542 (+337.1%)
Mutual labels:  speech-recognition, asr, ctc

ASR with PyTorch

This repository maintains experimental code for speech recognition using PyTorch and Kaldi. We focus more on building better acoustic models that produce phoneme sequences than on end-to-end transcription. For this purpose, the Kaldi latgen decoder is integrated as a PyTorch CppExtension.

The code was tested with Python 3.7 and PyTorch 1.0.0rc1. It makes heavy use of f-strings, so Python 3.6 or later is required.

Performance

model                 train dataset        dev dataset    test dataset  LER     WER
decoder baseline [1]  -                    -              swbd rt03     -       1.74%
deepspeech_var        aspire + swbd train  swbd eval2000  swbd rt03     33.73%  37.75%
las                   aspire + swbd train  swbd eval2000  swbd rt03

1. This result was obtained by feeding the phone label sequences (one-hot vectors) into the decoder. It was measured on utterances shorter than 20 seconds, choosing a random pronunciation from the lexicon for words with multiple pronunciations, after inserting sil phones with probability 0.2 between words and with probability 0.8 at the beginning and end of each utterance. Please see here with target_test=True.
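The LER and WER columns above are both normalized edit distances: Levenshtein distance over phone labels for LER and over words for WER, divided by the reference length. A minimal illustrative sketch (not the repo's own metric code):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # deleting all of ref[:i]
    for j in range(n + 1):
        d[0][j] = j  # inserting all of hyp[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution/match
    return d[m][n]

def error_rate(ref, hyp):
    """WER if tokens are words, LER if tokens are phone labels."""
    return edit_distance(ref, hyp) / len(ref)
```

For example, error_rate("the cat sat".split(), "the cat sat down".split()) counts one insertion over three reference words.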

Installation

Prerequisites:

We recommend pyenv. If you use it, do not forget to set pyenv local <python-version> in the local repo.

To avoid -fPIC-related compile errors, you must configure Kaldi with the --shared option when you install it.

Install dependent packages:

$ sudo apt install sox libsox-dev

Download:

$ git clone https://github.com/jinserk/pytorch-asr.git

Install required Python modules:

$ cd pytorch-asr
$ pip install -r requirements.txt

If you hit an installation error with torchaudio on a CentOS machine, add the following to your ~/.bashrc:

export CPLUS_INCLUDE_PATH=/usr/include/sox:$CPLUS_INCLUDE_PATH

Don't forget to run $ source ~/.bashrc before you retry installing the requirements.

Modify the Kaldi path in _path.py:

$ cd asr/kaldi
$ vi _path.py

KALDI_ROOT = <kaldi-installation-path>

Build up PyTorch-binding of Kaldi decoder:

$ python setup.py install

This takes a while, since it downloads Kaldi's official ASpIRE chain model and post-processes it. If you want to use your own language model or graphs, modify asr/kaldi/scripts/mkgraph.sh according to your setup. The binding install method has been changed to use PyTorch's CppExtension instead of ffi; it installs a package named torch_asr._latgen_lib.
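For reference, a CppExtension binding is declared in setup.py roughly as follows. This is a hypothetical sketch, not the repo's actual setup.py: the source file name, include paths, and library names are placeholders.

```python
# Hypothetical sketch of a setup.py using PyTorch's CppExtension.
# Source file names and Kaldi paths below are placeholders.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="torch_asr",
    ext_modules=[
        CppExtension(
            name="torch_asr._latgen_lib",
            sources=["src/latgen_lib.cc"],          # placeholder source
            include_dirs=["<kaldi-root>/src"],      # placeholder path
            library_dirs=["<kaldi-root>/src/lib"],  # placeholder path
            libraries=["kaldi-decoder", "kaldi-lat"],
        )
    ],
    cmdclass={"build_ext": BuildExtension},  # drives the C++ compile
)
```

BuildExtension takes care of passing the correct compiler and PyTorch include flags, which is why the CppExtension route replaced the older ffi binding.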

Training

pytorch-asr aims to be a framework supporting multiple acoustic models, so you must specify one of the models to train or predict. Currently only the deepspeech_ctc model is kept up to date with the frequently updated training and prediction modules, so try this model first. We will bring the other models up to the updated interface soon; sorry for the inconvenience.

If you are training for the first time, you need to preprocess the dataset. We currently use the contents of the data directories in Kaldi's recipe directories, which contain preprocessed corpus data. You need to run the preparation script in each Kaldi recipe before doing the following. The Kaldi aspire, swbd, and tedlium recipes are supported; you will need LDC corpora to use the aspire and swbd datasets. First, modify the RECIPE_PATH variable in asr/datasets/*.py according to the location of your Kaldi setup.

$ python prepare.py aspire <data-path>
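The Kaldi data directories consumed here are plain-text maps from utterance ids to resources, most importantly wav.scp (utterance id to wav path or command pipe) and text (utterance id to transcript). A minimal sketch of reading them; the repo's actual prepare.py does much more (feature extraction, label computation, etc.):

```python
def read_kaldi_map(lines):
    """Parse 'key value...' lines into a dict (works for wav.scp and text)."""
    table = {}
    for line in lines:
        key, _, rest = line.strip().partition(" ")
        if key:
            table[key] = rest
    return table

# Example content in the usual Kaldi data-dir format (paths are made up):
wav_scp = read_kaldi_map([
    "utt001 /corpus/sw02001-A_000098-001156.wav",
    "utt002 /corpus/sw02001-A_001980-002131.wav",
])
text = read_kaldi_map([
    "utt001 hi um yeah",
    "utt002 okay",
])
# Join the two maps into (wav, transcript) pairs per utterance id.
pairs = {u: (wav_scp[u], text[u]) for u in wav_scp if u in text}
```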

Start a new training with:

$ python train.py <model-name> --use-cuda

Check the --help option to see which parameters are available for each model.

If you want to resume training from a saved model file:

$ python train.py <model-name> --use-cuda --continue-from <model-file>

You can use the --visdom option to watch the loss curve. Make sure a visdom server is already running before you start training with --visdom. The --tensorboard option is outdated, since the TensorboardX package does not support the latest PyTorch.

You can also use the --slack option to redirect logs to a Slack DM. To use this, first set up a Slack workspace and add the "Bots" app to it. Obtain the bot token and your user id from the Slack settings, then set the environment variables SLACK_API_TOKEN and SLACK_API_USER to those values.
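Based only on what is stated above, the --slack option would pick up its credentials from those two environment variables, roughly like this (a sketch; the repo's actual logger code may differ, and the real posting step would go through the Slack API):

```python
import os

def slack_credentials():
    """Return (token, user_id) from the environment, or fail loudly.

    Reads the SLACK_API_TOKEN and SLACK_API_USER variables mentioned
    in the README; raises RuntimeError naming the missing one.
    """
    try:
        return os.environ["SLACK_API_TOKEN"], os.environ["SLACK_API_USER"]
    except KeyError as e:
        raise RuntimeError(f"set {e.args[0]} before using --slack") from None
```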

Prediction

You can run prediction on samples with a trained model file:

$ python predict.py <model-name> --continue-from <model-file> <target-wav-file1> <target-wav-file2> ...

Acknowledgement

Some models are imported from the following projects. We appreciate their work; all rights to the code belong to them.
