oliverguhr / wav2vec2-live

Licence: MIT license
Live speech recognition using Facebook's wav2vec 2.0 model.

Projects that are alternatives of or similar to wav2vec2-live

Asr audio data links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 128 (-37.56%)
Mutual labels:  speech, speech-recognition, speech-to-text, asr
Openasr
A pytorch based end2end speech recognition system.
Stars: ✭ 69 (-66.34%)
Mutual labels:  speech, speech-recognition, speech-to-text, asr
sova-asr
SOVA ASR (Automatic Speech Recognition)
Stars: ✭ 123 (-40%)
Mutual labels:  speech, speech-recognition, speech-to-text, asr
Syn Speech
Syn.Speech is a flexible speaker independent continuous speech recognition engine for Mono and .NET framework
Stars: ✭ 57 (-72.2%)
Mutual labels:  speech, speech-recognition, speech-to-text, asr
Lingvo
Lingvo
Stars: ✭ 2,361 (+1051.71%)
Mutual labels:  speech, speech-recognition, speech-to-text, asr
ASR-Audio-Data-Links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (-12.68%)
Mutual labels:  speech, speech-recognition, speech-to-text, asr
Edgedict
Working online speech recognition based on RNN Transducer. ( Trained model release available in release )
Stars: ✭ 205 (+0%)
Mutual labels:  speech, speech-recognition, speech-to-text, asr
Deepspeech
A PaddlePaddle implementation of ASR.
Stars: ✭ 1,219 (+494.63%)
Mutual labels:  speech, speech-recognition, speech-to-text
Tacotron asr
Speech Recognition Using Tacotron
Stars: ✭ 165 (-19.51%)
Mutual labels:  speech, speech-recognition, speech-to-text
Pytorch Kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Stars: ✭ 2,097 (+922.93%)
Mutual labels:  speech, speech-recognition, asr
Kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Stars: ✭ 11,151 (+5339.51%)
Mutual labels:  speech, speech-recognition, speech-to-text
Pykaldi
A Python wrapper for Kaldi
Stars: ✭ 756 (+268.78%)
Mutual labels:  speech, speech-recognition, asr
Delta
DELTA is a deep learning based natural language and speech processing platform.
Stars: ✭ 1,479 (+621.46%)
Mutual labels:  speech, speech-recognition, asr
End2end Asr Pytorch
End-to-End Automatic Speech Recognition on PyTorch
Stars: ✭ 175 (-14.63%)
Mutual labels:  speech, speech-recognition, asr
Discordspeechbot
A speech-to-text bot for discord with music commands and more using NodeJS. Ideally for controlling your Discord server using voice commands, can also be useful for hearing-impaired people.
Stars: ✭ 35 (-82.93%)
Mutual labels:  speech, speech-recognition, speech-to-text
Speechbrain.github.io
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.
Stars: ✭ 242 (+18.05%)
Mutual labels:  speech, speech-recognition, speech-to-text
Kerasdeepspeech
A Keras CTC implementation of Baidu's DeepSpeech for model experimentation
Stars: ✭ 245 (+19.51%)
Mutual labels:  speech, speech-to-text, asr
megs
A merged version of multiple open-source German speech datasets.
Stars: ✭ 21 (-89.76%)
Mutual labels:  speech-recognition, speech-to-text, asr
Sonus
💬 /so.nus/ STT (speech to text) for Node with offline hotword detection
Stars: ✭ 532 (+159.51%)
Mutual labels:  speech, speech-recognition, speech-to-text
Annyang
💬 Speech recognition for your site
Stars: ✭ 6,216 (+2932.2%)
Mutual labels:  speech, speech-recognition, speech-to-text

Automatic speech recognition with wav2vec2

Use any wav2vec model with a microphone.

demo gif

Setup

I recommend installing this project in a virtual environment.

python3 -m venv ./venv
source ./venv/bin/activate
pip install -r requirements.txt

Depending on your Linux distribution, you might encounter an error that portaudio was not found when installing pyaudio. On Ubuntu, you can solve this by installing the "portaudio19-dev" package.

sudo apt install portaudio19-dev

Finally, you can test the speech recognition:

python live_asr.py

Possible Issues:

  • The code uses the system's default audio device. Please make sure that your system's default audio device is set correctly.

  • "attempt to connect to server failed": you can safely ignore this message from pyaudio. It just means that pyaudio can't connect to the JACK audio server.
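
If you are not sure which audio devices your system exposes, a small script along these lines may help. This is only a sketch and not part of this project: the helper names input_device_names and query_pyaudio_devices are made up for illustration. It relies on PyAudio's get_device_count and get_device_info_by_index calls and on the "maxInputChannels" field of the returned device info.

```python
from typing import Dict, Iterable, List


def input_device_names(devices: Iterable[Dict]) -> List[str]:
    """Return the names of devices that have at least one input channel."""
    return [d["name"] for d in devices if d.get("maxInputChannels", 0) > 0]


def query_pyaudio_devices() -> List[Dict]:
    """Collect the device info dicts reported by PyAudio (needs pyaudio installed)."""
    import pyaudio  # imported lazily so input_device_names works without pyaudio

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()  # always release the PortAudio resources


# Hypothetical usage on a machine with pyaudio installed:
#     for name in input_device_names(query_pyaudio_devices()):
#         print(name)
```

The printed names show what pyaudio can record from. Whether they can be passed as device_name is an assumption that the text here does not confirm.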

Usage

You can use any wav2vec2 model from the Hugging Face model hub. Just set the model name; all files are downloaded on first execution.

from live_asr import LiveWav2Vec2

# Any wav2vec2 model name from the Hugging Face model hub works here.
english_model = "facebook/wav2vec2-large-960h-lv60-self"
german_model = "maxidl/wav2vec2-large-xlsr-german"
asr = LiveWav2Vec2(german_model, device_name="default")
asr.start()

try:
    while True:
        # Each result holds the transcript, the sample length, and the inference time.
        text, sample_length, inference_time = asr.get_last_text()
        print(f"{sample_length:.3f}s\t{inference_time:.3f}s\t{text}")
except KeyboardInterrupt:
    # Ctrl+C stops the recognition.
    asr.stop()
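
To judge whether a model keeps up with live audio, you can compare the two timings returned by get_last_text. The sketch below computes a real-time factor (inference time divided by sample length). It assumes both values are in seconds, as the "s" suffix in the print statement suggests. The helpers real_time_factor and format_result are hypothetical and not part of live_asr.

```python
def real_time_factor(sample_length: float, inference_time: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if sample_length <= 0:
        raise ValueError("sample_length must be positive")
    return inference_time / sample_length


def format_result(text: str, sample_length: float, inference_time: float) -> str:
    """Build the tab-separated output line, extended with the real-time factor."""
    rtf = real_time_factor(sample_length, inference_time)
    return f"{sample_length:.3f}s\t{inference_time:.3f}s\tRTF={rtf:.2f}\t{text}"
```

In the loop above you could then call print(format_result(text, sample_length, inference_time)). A real-time factor that stays well below 1.0 suggests the model processes audio faster than it arrives.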