
sooftware / lightning-asr

Licence: MIT license
Modular and extensible speech recognition library leveraging pytorch-lightning and hydra.

Projects that are alternatives of or similar to lightning-asr

wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Stars: ✭ 2,384 (+6522.22%)
Mutual labels:  speech-recognition, asr, conformer
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+1166.67%)
Mutual labels:  speech-recognition, asr, conformer
vosk-asterisk
Speech Recognition in Asterisk with Vosk Server
Stars: ✭ 52 (+44.44%)
Mutual labels:  speech-recognition, asr
ctc-asr
End-to-end trained speech recognition system, based on RNNs and the connectionist temporal classification (CTC) cost function.
Stars: ✭ 112 (+211.11%)
Mutual labels:  speech-recognition, asr
vosk-model-ru-adaptation
No description or website provided.
Stars: ✭ 19 (-47.22%)
Mutual labels:  speech-recognition, asr
deepaudio-speaker
neural network based speaker embedder
Stars: ✭ 19 (-47.22%)
Mutual labels:  hydra, pytorch-lightning
speech-recognition-evaluation
Evaluate results from ASR/Speech-to-Text quickly
Stars: ✭ 25 (-30.56%)
Mutual labels:  speech-recognition, asr
End-to-End-Mandarin-ASR
End-to-end speech recognition on AISHELL dataset.
Stars: ✭ 20 (-44.44%)
Mutual labels:  speech-recognition, asr
wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
Stars: ✭ 205 (+469.44%)
Mutual labels:  speech-recognition, asr
rustfst
Rust re-implementation of OpenFST - library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
Stars: ✭ 104 (+188.89%)
Mutual labels:  speech-recognition, asr
lightning-transformers
Flexible components pairing 🤗 Transformers with Pytorch Lightning
Stars: ✭ 551 (+1430.56%)
Mutual labels:  hydra, pytorch-lightning
opensource-voice-tools
A repo listing known open source voice tools, ordered by where they sit in the voice stack
Stars: ✭ 21 (-41.67%)
Mutual labels:  speech-recognition, asr
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (+47.22%)
Mutual labels:  speech-recognition, asr
kaldi-long-audio-alignment
Long audio alignment using Kaldi
Stars: ✭ 21 (-41.67%)
Mutual labels:  speech-recognition, asr
ASR-Audio-Data-Links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (+397.22%)
Mutual labels:  speech-recognition, asr
lightning-hydra-template
PyTorch Lightning + Hydra. A very user-friendly template for rapid and reproducible ML experimentation with best practices. ⚡🔥⚡
Stars: ✭ 1,905 (+5191.67%)
Mutual labels:  hydra, pytorch-lightning
megs
A merged version of multiple open-source German speech datasets.
Stars: ✭ 21 (-41.67%)
Mutual labels:  speech-recognition, asr
leopard
On-device speech-to-text engine powered by deep learning
Stars: ✭ 354 (+883.33%)
Mutual labels:  speech-recognition, asr
PCPM
Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.
Stars: ✭ 21 (-41.67%)
Mutual labels:  speech-recognition, asr
syn-speech-samples
An application that demonstrates the usage of the Syn.Speech library for speech recognition
Stars: ✭ 24 (-33.33%)
Mutual labels:  speech-recognition, asr

Lightning ASR

Modular and extensible speech recognition library leveraging pytorch-lightning and hydra




Introduction

PyTorch Lightning is the lightweight PyTorch wrapper for high-performance AI research. PyTorch is extremely easy to use to build complex AI models. But once the research gets complicated and things like multi-GPU training, 16-bit precision and TPU training get mixed in, users are likely to introduce bugs. PyTorch Lightning solves exactly this problem. Lightning structures your PyTorch code so it can abstract the details of training. This makes AI research scalable and fast to iterate on.

This project is an example of implementing an ASR (automatic speech recognition) system with PyTorch Lightning. In this project, I trained a model consisting of a Conformer encoder and an LSTM decoder with joint CTC-attention. I hope it can serve as a guideline for those who research speech recognition.
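For readers new to joint CTC-attention training, the objective is commonly written as a weighted sum of the CTC loss on the encoder output and the cross-entropy loss of the attention decoder. The weight λ below is a generic hyperparameter; the value used in this project is not stated here, so treat this as the textbook form rather than the exact implementation.

```latex
\mathcal{L}_{\text{joint}} = \lambda \, \mathcal{L}_{\text{CTC}} + (1 - \lambda) \, \mathcal{L}_{\text{attention}}, \qquad 0 \le \lambda \le 1
```

With λ = 0 the model is trained as a pure attention-based encoder-decoder, and with λ = 1 as a pure CTC model.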

Installation

This project recommends Python 3.7 or higher.
I recommend creating a new virtual environment for this project (using virtualenv or conda).

Prerequisites

  • numpy: pip install numpy (refer to the NumPy project page if you have problems installing it)
  • pytorch: refer to the PyTorch website to install the version that matches your environment
  • librosa: conda install -c conda-forge librosa (refer to the librosa project page if you have problems installing it)
  • torchaudio: pip install torchaudio==0.6.0 (refer to the torchaudio project page if you have problems installing it)
  • sentencepiece: pip install sentencepiece (refer to the sentencepiece project page if you have problems installing it)
  • pytorch-lightning: pip install pytorch-lightning (refer to the pytorch-lightning project page if you have problems installing it)
  • hydra: pip install hydra-core --upgrade (refer to the Hydra project page if you have problems installing it)
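After installing the prerequisites, a quick check like the following can confirm that they are importable. This is a minimal sketch using only the standard library; the helper names (missing_modules, python_is_supported) are made up for illustration and are not part of lightning-asr.

```python
import importlib.util
import sys

# Import names for the prerequisites listed above. Some differ from the pip
# package names (pytorch -> torch, pytorch-lightning -> pytorch_lightning,
# hydra-core -> hydra).
PREREQUISITES = [
    "numpy",
    "torch",
    "librosa",
    "torchaudio",
    "sentencepiece",
    "pytorch_lightning",
    "hydra",
]


def missing_modules(names):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info, minimum=(3, 7)):
    """Check the interpreter against the recommended minimum version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.7 or higher is recommended.")
    missing = missing_modules(PREREQUISITES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

The check only tests whether each module can be located; it does not verify versions such as the pinned torchaudio==0.6.0.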

Install from source

Currently I only support installation from source code using setuptools. Check out the source code and run the
following commands:

$ pip install -e .
$ ./setup.sh

Install Apex (for 16-bit training)

For faster training, install NVIDIA's Apex library:

$ git clone https://github.com/NVIDIA/apex
$ cd apex

# ------------------------
# OPTIONAL: on your cluster you might need to load CUDA 10 or 9
# depending on how you installed PyTorch

# see available modules
module avail

# load correct CUDA before install
module load cuda-10.0
# ------------------------

# make sure you've loaded a gcc version > 4.0 and < 7.0
module load gcc-6.1.0

$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Get Started

I use Hydra to control all the training configurations. If you are not familiar with Hydra, I recommend visiting the Hydra website. In short, Hydra is an open-source framework that simplifies the development of research applications by letting you compose a hierarchical configuration dynamically.
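To give a feel for what "composing a hierarchical configuration" means, here is a small plain-Python imitation of the idea. It is not Hydra's API and not this project's code: the group names audio and trainer and the dataset_download key come from this README, while the values inside each option, the DEFAULTS table, and the compose helper are invented for illustration.

```python
# Hypothetical config groups. The option names mirror overrides used in this
# README; the contents of each option are made up for illustration.
CONFIG_GROUPS = {
    "audio": {
        "fbank": {"feature": "fbank"},
        "melspectrogram": {"feature": "melspectrogram"},
    },
    "trainer": {
        "gpu": {"accelerator": "gpu"},
        "tpu": {"accelerator": "tpu"},
    },
}

# Assumed defaults, used when no override is given.
DEFAULTS = {"audio": "fbank", "trainer": "gpu", "dataset_download": False}


def parse_value(text):
    """Convert "True"/"False" to booleans; keep everything else as a string."""
    if text in ("True", "False"):
        return text == "True"
    return text


def compose(overrides):
    """Build a nested config dict from defaults plus key=value overrides."""
    selected = dict(DEFAULTS)
    for item in overrides:
        key, _, value = item.partition("=")
        selected[key] = parse_value(value)

    config = {}
    for key, value in selected.items():
        if key in CONFIG_GROUPS:
            # A group override picks one named option from that group.
            config[key] = dict(CONFIG_GROUPS[key][value])
        else:
            # Anything else is treated as a plain top-level value.
            config[key] = value
    return config


if __name__ == "__main__":
    print(compose(["audio=melspectrogram", "trainer=tpu", "dataset_download=True"]))
```

Each key=value argument either selects one option from a group or sets a single value, and the result is one nested configuration. Real Hydra reads the groups from YAML files and offers much more (interpolation, multirun, and so on).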

Download LibriSpeech dataset

Training uses the LibriSpeech dataset, a corpus of about 1000 hours of English speech. You do not have to fetch it by hand: if the dataset_download option is True, the dataset is downloaded and then training starts. If you already have the dataset, set dataset_download to False and specify dataset_path.
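The interplay of the two options can be sketched as follows. This is an assumption about the behaviour described above, not the project's actual code: resolve_dataset_dir, download_fn, and the fallback directory name "LibriSpeech" are all hypothetical.

```python
from pathlib import Path


def resolve_dataset_dir(dataset_download, dataset_path, download_fn):
    """Return the directory to train from.

    `download_fn` is a hypothetical callable that fetches LibriSpeech into
    the given directory; it stands in for whatever the project really does.
    """
    if dataset_download:
        # Assumed fallback location when no dataset_path is given.
        target = Path(dataset_path) if dataset_path else Path("LibriSpeech")
        download_fn(target)
        return target
    if not dataset_path:
        raise ValueError("dataset_path is required when dataset_download is False")
    path = Path(dataset_path)
    if not path.is_dir():
        raise FileNotFoundError(f"dataset_path does not exist: {path}")
    return path
```

With dataset_download=False the function never calls download_fn and fails early if the path is missing, which matches the "already have a dataset" case described above.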

Training Speech Recognizer

You can train on the LibriSpeech dataset as follows:

  • Example 1: Train the conformer-lstm model with filter-bank features on GPU:
$ python ./bin/main.py \
    data=default \
    dataset_download=True \
    audio=fbank \
    model=conformer_lstm \
    lr_scheduler=reduce_lr_on_plateau \
    trainer=gpu
  • Example 2: Train the conformer-lstm model with mel-spectrogram features on TPU:
$ python ./bin/main.py \
    data=default \
    dataset_download=True \
    audio=melspectrogram \
    model=conformer_lstm \
    lr_scheduler=reduce_lr_on_plateau \
    trainer=tpu
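The two commands above differ only in their audio and trainer overrides, so runs can also be generated from a script. The sketch below builds such command lines from a dictionary of overrides; build_train_command is a hypothetical helper, and the override names and values are the ones shown in the examples above.

```python
import shlex


def build_train_command(overrides, script="./bin/main.py"):
    """Return the argv list for a training run with the given overrides."""
    argv = ["python", script]
    argv.extend(f"{key}={value}" for key, value in overrides.items())
    return argv


# Overrides shared by both examples in this README.
base = {
    "data": "default",
    "dataset_download": True,
    "model": "conformer_lstm",
    "lr_scheduler": "reduce_lr_on_plateau",
    "trainer": "gpu",
}

for audio in ("fbank", "melspectrogram"):
    argv = build_train_command({**base, "audio": audio})
    # Print a shell-safe command line; pass argv to subprocess.run to execute.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The script only prints the commands, so it is safe to run without a GPU or the dataset.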

Troubleshooting and Contributing

If you have any questions, bug reports, or feature requests, please open an issue on GitHub.

I appreciate any kind of feedback or contribution. Feel free to proceed with small issues such as bug fixes and documentation improvements. For major contributions and new features, please discuss with the collaborators in the corresponding issues.

Code Style

I follow PEP-8 for code style. The style of docstrings is especially important because the documentation is generated from them.

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Author
