freewym / Espresso

Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Projects that are alternatives to or similar to Espresso

Pykaldi
A Python wrapper for Kaldi
Stars: ✭ 756 (-6.44%)
Mutual labels:  speech-recognition, asr, kaldi
Eesen
The official repository of the Eesen project
Stars: ✭ 738 (-8.66%)
Mutual labels:  speech-recognition, asr, kaldi
Kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition.
Stars: ✭ 190 (-76.49%)
Mutual labels:  speech-recognition, asr, end-to-end
Tensorflow end2end speech recognition
End-to-end speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)
Stars: ✭ 305 (-62.25%)
Mutual labels:  speech-recognition, asr, end-to-end
kaldi-long-audio-alignment
Long audio alignment using Kaldi
Stars: ✭ 21 (-97.4%)
Mutual labels:  speech-recognition, kaldi, asr
End2end Asr Pytorch
End-to-End Automatic Speech Recognition on PyTorch
Stars: ✭ 175 (-78.34%)
Mutual labels:  speech-recognition, asr, end-to-end
Vosk Android Demo
Offline speech recognition for Android with Vosk library.
Stars: ✭ 271 (-66.46%)
Mutual labels:  speech-recognition, asr, kaldi
Speech To Text Russian
A project for Russian speech recognition based on pykaldi.
Stars: ✭ 151 (-81.31%)
Mutual labels:  speech-recognition, asr, kaldi
rustfst
Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
Stars: ✭ 104 (-87.13%)
Mutual labels:  speech-recognition, kaldi, asr
End-to-End-Mandarin-ASR
End-to-end speech recognition on AISHELL dataset.
Stars: ✭ 20 (-97.52%)
Mutual labels:  end-to-end, speech-recognition, asr
Zamia Speech
Open tools and data for cloudless automatic speech recognition
Stars: ✭ 374 (-53.71%)
Mutual labels:  speech-recognition, asr, kaldi
kosr
Korean speech recognition based on the Transformer
Stars: ✭ 25 (-96.91%)
Mutual labels:  end-to-end, speech-recognition, asr
Pytorch Kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Stars: ✭ 2,097 (+159.53%)
Mutual labels:  speech-recognition, asr, kaldi
Vosk Server
WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Stars: ✭ 277 (-65.72%)
Mutual labels:  speech-recognition, asr, kaldi
Py Kaldi Asr
Some simple wrappers around kaldi-asr intended to make using kaldi's (online) decoders as convenient as possible.
Stars: ✭ 156 (-80.69%)
Mutual labels:  speech-recognition, asr, kaldi
Zeroth
Kaldi-based open-source Korean ASR (Korean speech recognition) project
Stars: ✭ 248 (-69.31%)
Mutual labels:  speech-recognition, asr, kaldi
Rnn Transducer
MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
Stars: ✭ 114 (-85.89%)
Mutual labels:  speech-recognition, asr, end-to-end
Pytorch Asr
ASR with PyTorch
Stars: ✭ 124 (-84.65%)
Mutual labels:  speech-recognition, asr, kaldi
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (-43.56%)
Mutual labels:  end-to-end, speech-recognition, asr
vosk-model-ru-adaptation
No description or website provided.
Stars: ✭ 19 (-97.65%)
Mutual labels:  speech-recognition, kaldi, asr

Espresso

Espresso is an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented.
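
The idea behind look-ahead word-based language model fusion can be illustrated with a toy sketch. This is not Espresso's decoder: it assumes a fixed unigram word distribution in place of a real word-level language model, and the names `build_prefix_mass`, `lookahead_log_prob`, `fused_score`, and `lm_weight` are hypothetical. While a word is still being spelled out character by character, the probability of the next character is taken as the share of word probability mass that remains reachable after appending it.

```python
import math
from collections import defaultdict


def build_prefix_mass(word_probs):
    """Map every prefix of every word to the total probability of the
    words that start with that prefix (the empty prefix covers all words)."""
    mass = defaultdict(float)
    for word, prob in word_probs.items():
        for end in range(len(word) + 1):
            mass[word[:end]] += prob
    return mass


def lookahead_log_prob(prefix, next_char, mass):
    """log P(next_char | prefix): the fraction of the probability mass
    reachable from `prefix` that stays reachable after adding `next_char`."""
    denom = mass.get(prefix, 0.0)
    numer = mass.get(prefix + next_char, 0.0)
    if denom == 0.0 or numer == 0.0:
        return float("-inf")  # no vocabulary word continues this way
    return math.log(numer / denom)


def fused_score(asr_log_prob, lm_log_prob, lm_weight=0.5):
    """Log-linear combination of the ASR score and the language model score
    for one candidate character (lm_weight is an assumed tuning knob)."""
    return asr_log_prob + lm_weight * lm_log_prob


# Toy vocabulary standing in for a word-level language model.
word_probs = {"cat": 0.5, "car": 0.25, "dog": 0.25}
mass = build_prefix_mass(word_probs)
lm_score = lookahead_log_prob("ca", "t", mass)  # log(0.5 / 0.75)
print(fused_score(asr_log_prob=-0.2, lm_log_prob=lm_score))
```

A real word-level model would recompute the word distribution from the preceding words at every word boundary; the prefix-mass lookup is the part that lets a word-level model score partial words during a character or subword beam search.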

We provide state-of-the-art training recipes for several speech datasets, including WSJ and Switchboard.

What's New:

  • June 2020: Transformer recipes released.
  • April 2020: Both E2E LF-MMI (using PyChain) and Cross-Entropy training for hybrid ASR are now supported. WSJ recipes are provided as examples of each.
  • March 2020: SpecAugment is supported and relevant recipes are released.
  • September 2019: We are working to decouple Espresso from fairseq so that it becomes a standalone package that can be installed directly with pip.

Requirements and Installation

  • PyTorch version >= 1.5.0
  • Python version >= 3.6
  • For training new models, you'll also need an NVIDIA GPU and NCCL
  • To install Espresso from source and develop locally:
git clone https://github.com/freewym/espresso
cd espresso
pip install --editable .

# on macOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./
pip install kaldi_io
pip install sentencepiece
cd espresso/tools; make KALDI=<path/to/a/compiled/kaldi/directory>

Add your Python path to the PATH variable in examples/asr_<dataset>/path.sh; the current default is ~/anaconda3/bin.

kaldi_io is required for reading Kaldi scp files. sentencepiece is required for training and encoding subword pieces. Kaldi is required for data preparation, feature extraction, scoring for some datasets (e.g., Switchboard), and decoding for all hybrid systems.
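
For readers unfamiliar with Kaldi scp files, the sketch below shows what one line of such a file typically holds: an utterance ID followed by a file location, often with a byte offset into an archive after a colon. It is a simplified, standalone illustration and not how kaldi_io or Espresso parses these files; `ScpEntry` and `parse_scp_line` are hypothetical names, and piped commands or range specifiers are not handled.

```python
from typing import NamedTuple, Optional


class ScpEntry(NamedTuple):
    utt_id: str            # utterance identifier (first field of the line)
    path: str              # file that holds the data
    offset: Optional[int]  # byte offset into the file, if one is given


def parse_scp_line(line: str) -> ScpEntry:
    """Split '<utt_id> <path>[:<offset>]' into its parts."""
    utt_id, location = line.strip().split(None, 1)
    path, sep, offset = location.rpartition(":")
    if sep and offset.isdigit():
        return ScpEntry(utt_id, path, int(offset))
    # No numeric offset: treat the whole field as a plain path.
    return ScpEntry(utt_id, location, None)


# Hypothetical example lines; the paths are placeholders.
print(parse_scp_line("utt1 /data/feats.ark:42"))
print(parse_scp_line("utt2 /data/utt2.wav"))
```

In practice kaldi_io takes care of this parsing and also loads the feature matrices that the entries point to.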

  • To use PyChain for LF-MMI training, you also need to install PyChain (and OpenFst):

Edit the PYTHON_DIR variable in espresso/tools/Makefile (default: ~/anaconda3/bin), and then run:

cd espresso/tools; make openfst pychain
  • For faster training, install NVIDIA's apex library:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./
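
The version requirements listed at the top of this section can be checked before installing with a small script along these lines. It is a sketch and not part of Espresso: `parse_version`, `meets_minimum`, and `check_environment` are hypothetical helpers, and the CUDA check only reports whether PyTorch can see a GPU, not whether NCCL is usable.

```python
import sys


def parse_version(text):
    """Turn a version string such as '1.5.0+cu101' into (1, 5, 0),
    ignoring any build suffix after '+' and any trailing letters."""
    core = text.split("+", 1)[0]
    parts = []
    for piece in core.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def meets_minimum(found, minimum):
    """True if version string `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


def check_environment():
    """Report which of the stated requirements the current interpreter meets."""
    report = {"python": sys.version_info[:2] >= (3, 6)}
    try:
        import torch
    except ImportError:
        report["torch"] = False
        report["cuda"] = False
    else:
        report["torch"] = meets_minimum(torch.__version__, "1.5.0")
        report["cuda"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```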

License

Espresso is MIT-licensed.

Citation

Please cite Espresso as:

@inproceedings{wang2019espresso,
  title = {Espresso: A Fast End-to-end Neural Speech Recognition Toolkit},
  author = {Yiming Wang and Tongfei Chen and Hainan Xu 
            and Shuoyang Ding and Hang Lv and Yiwen Shao 
            and Nanyun Peng and Lei Xie and Shinji Watanabe 
            and Sanjeev Khudanpur},
  booktitle = {2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  year = {2019},
}