
loretoparisi / wave2vec-recognize-docker

Licence: MIT license
wav2vec 2.0 recognize pipeline

Programming Languages

python
Dockerfile

Projects that are alternatives of or similar to wave2vec-recognize-docker

sova-asr
SOVA ASR (Automatic Speech Recognition)
Stars: ✭ 123 (+310%)
Mutual labels:  automatic-speech-recognition, asr, wav2letter
leopard
On-device speech-to-text engine powered by deep learning
Stars: ✭ 354 (+1080%)
Mutual labels:  automatic-speech-recognition, asr
kaldi-long-audio-alignment
Long audio alignment using Kaldi
Stars: ✭ 21 (-30%)
Mutual labels:  automatic-speech-recognition, asr
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Stars: ✭ 2,384 (+7846.67%)
Mutual labels:  automatic-speech-recognition, asr
Speech-Recognition
End-to-End Speech Recognition using Neural Networks.
Stars: ✭ 31 (+3.33%)
Mutual labels:  automatic-speech-recognition, asr
demo vietasr
Vietnamese Speech Recognition
Stars: ✭ 22 (-26.67%)
Mutual labels:  automatic-speech-recognition, asr
wav2vec2-live
A live speech recognition using Facebook's wav2vec 2.0 model.
Stars: ✭ 205 (+583.33%)
Mutual labels:  asr, wav2vec
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (+76.67%)
Mutual labels:  asr
TargomanSMT
Targoman SMT framework source code
Stars: ✭ 29 (-3.33%)
Mutual labels:  kenlm
ASR-Audio-Data-Links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (+496.67%)
Mutual labels:  asr
asr24
24-hour Automatic Speech Recognition
Stars: ✭ 27 (-10%)
Mutual labels:  asr
FAST-RIR
This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.
Stars: ✭ 90 (+200%)
Mutual labels:  automatic-speech-recognition
ctc-asr
End-to-end trained speech recognition system, based on RNNs and the connectionist temporal classification (CTC) cost function.
Stars: ✭ 112 (+273.33%)
Mutual labels:  asr
Speech-Corpus-Collection
A Collection of Speech Corpus for ASR and TTS
Stars: ✭ 113 (+276.67%)
Mutual labels:  asr
KoLM
Korean text normalization and language preparation package for LM in Kaldi-based ASR system
Stars: ✭ 46 (+53.33%)
Mutual labels:  asr
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+1420%)
Mutual labels:  asr
End-to-End-Mandarin-ASR
End-to-end speech recognition on AISHELL dataset.
Stars: ✭ 20 (-33.33%)
Mutual labels:  asr
hf-experiments
Experiments with Hugging Face 🔬 🤗
Stars: ✭ 37 (+23.33%)
Mutual labels:  automatic-speech-recognition
speech-recognition-evaluation
Evaluate results from ASR/Speech-to-Text quickly
Stars: ✭ 25 (-16.67%)
Mutual labels:  asr
AESRC2020
Data preparation scripts, training pipeline and baseline experiment results for the Interspeech 2020 Accented English Speech Recognition Challenge (AESRC).
Stars: ✭ 40 (+33.33%)
Mutual labels:  asr

wav2vec

wav2vec 2.0 Recognize Implementation.

Disclaimer

wav2vec is part of fairseq. This repository is the result of an issue submitted to the fairseq repository.

Resource

Please first download one of the pre-trained models available from fairseq (see below).

Pre-trained models

Model | Finetuning split | Dataset | Download
Wav2Vec 2.0 Base | No finetuning | Librispeech | download
Wav2Vec 2.0 Base | 10 minutes | Librispeech | download
Wav2Vec 2.0 Base | 100 hours | Librispeech | download
Wav2Vec 2.0 Base | 960 hours | Librispeech | download
Wav2Vec 2.0 Large | No finetuning | Librispeech | download
Wav2Vec 2.0 Large | 10 minutes | Librispeech | download
Wav2Vec 2.0 Large | 100 hours | Librispeech | download
Wav2Vec 2.0 Large | 960 hours | Librispeech | download
Wav2Vec 2.0 Large (LV-60) | No finetuning | Libri-Light | download
Wav2Vec 2.0 Large (LV-60) | 10 minutes | Libri-Light + Librispeech | download
Wav2Vec 2.0 Large (LV-60) | 100 hours | Libri-Light + Librispeech | download
Wav2Vec 2.0 Large (LV-60) | 960 hours | Libri-Light + Librispeech | download

How to install

We use python:3.8.6-slim-buster as the base image to give developers more flexibility in customizing this Dockerfile. For a simplified install, please refer to the Alternative install section. If you go for this container, build it using the provided Dockerfile:

docker build -t wav2vec -f Dockerfile .

How to Run

There are two versions of the recognition script:

  • recognize.py: for running legacy finetuned models (without Hydra).
  • recognize.hydra.py: for running models finetuned with newer versions of fairseq.
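Which of the two scripts fits a given checkpoint depends on how it stores its configuration. As a rough sketch, assuming that legacy (pre-Hydra) fairseq checkpoints keep their settings under an "args" key while newer ones keep them under "cfg" (the key that recognize.hydra.py reads as w2v["cfg"]), a hypothetical helper like pick_recognize_script below can suggest a script. Loading the file needs PyTorch, so that part lives in a separate function.

```python
from typing import Any, Mapping


def pick_recognize_script(state: Mapping[str, Any]) -> str:
    """Suggest a recognition script for a loaded fairseq checkpoint.

    Hypothetical helper: assumes Hydra-era checkpoints carry a non-empty
    "cfg" entry and legacy checkpoints carry an "args" entry.
    """
    if state.get("cfg") is not None:
        return "recognize.hydra.py"
    if state.get("args") is not None:
        return "recognize.py"
    raise ValueError("checkpoint has neither 'cfg' nor 'args'; cannot tell")


def load_checkpoint(path: str) -> Mapping[str, Any]:
    # Imported lazily so pick_recognize_script works without PyTorch installed.
    import torch

    # Load on CPU so no GPU is needed just to inspect the checkpoint.
    return torch.load(path, map_location="cpu")
```

Usage inside the container would be something like pick_recognize_script(load_checkpoint("/app/data/wav2vec_small_10m.pt")). On recent PyTorch releases, torch.load may additionally need weights_only=False for checkpoints that contain non-tensor objects.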

Before running, copy the downloaded model (e.g. wav2vec_small_10m.pt) and the letter dictionary dict.ltr.txt to the data/ folder. Copy the wav file you want to test there as well, e.g. data/temp.wav in the following examples. The data/ folder should then look like this:

.
├── dict.ltr.txt
├── temp.wav
└── wav2vec_small_10m.pt
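Before starting the container it can help to verify this layout. The following is a minimal sketch with a hypothetical check_data_dir helper; it assumes the file names shown above and that the model expects 16 kHz mono input (the rate wav2vec 2.0 was trained on). Python's wave module only parses PCM WAV, so other encodings are reported rather than checked.

```python
import wave
from pathlib import Path
from typing import List


def check_data_dir(data_dir: str, model_file: str, wav_file: str = "temp.wav") -> List[str]:
    """Return a list of problems found in the data/ folder (empty if none).

    Hypothetical pre-flight check: assumes the layout shown above and
    16 kHz mono input audio.
    """
    root = Path(data_dir)
    problems = []
    # Every file the recognition command refers to must be present.
    for name in ("dict.ltr.txt", wav_file, model_file):
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    wav_path = root / wav_file
    if wav_path.is_file():
        try:
            with wave.open(str(wav_path), "rb") as w:
                rate, channels = w.getframerate(), w.getnchannels()
        except wave.Error as exc:
            problems.append(f"{wav_file}: not a PCM WAV file ({exc})")
            return problems
        if rate != 16000:
            problems.append(f"{wav_file}: sample rate {rate} Hz, expected 16000")
        if channels != 1:
            problems.append(f"{wav_file}: {channels} channels, expected mono")
    return problems
```

For example, check_data_dir("data", "wav2vec_small_10m.pt") returns an empty list when everything is in place.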

Now run the container, then enter it and execute the recognition script (recognize.py or recognize.hydra.py):

docker run -d -it --rm -v $PWD/data:/app/data --name w2v wav2vec
docker exec -it w2v bash
python examples/wav2vec/recognize.py --target_dict_path=/app/data/dict.ltr.txt /app/data/wav2vec_small_10m.pt /app/data/temp.wav
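Instead of opening an interactive shell, the same command can be passed straight to docker exec (without -it), which is handy for scripting. A minimal sketch follows; build_recognize_cmd and run_recognize are hypothetical names, and it assumes the container was started as shown above and that the relative path examples/wav2vec/recognize.py resolves from the container's default working directory, just as it does in the interactive shell.

```python
import subprocess
from typing import List


def build_recognize_cmd(container: str, model: str, wav: str,
                        dict_path: str = "/app/data/dict.ltr.txt",
                        script: str = "examples/wav2vec/recognize.py") -> List[str]:
    """Build a non-interactive `docker exec` command mirroring the steps above.

    Hypothetical helper: defaults follow the /app/data mount used in this README.
    """
    return [
        "docker", "exec", container,
        "python", script,
        f"--target_dict_path={dict_path}",
        model, wav,
    ]


def run_recognize(cmd: List[str]) -> str:
    # Runs the command on the host and returns what the script printed to stdout.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Building the argument list separately keeps it easy to inspect or log before anything is executed; passing a list (not a shell string) also avoids quoting problems with paths.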

Common issues

1. What if my model is not compatible with fairseq?

We have tested with the fairseq master branch (> v0.10.1, commit ac11107). If you run into issues like this one:

omegaconf.errors.ValidationError: Invalid value 'False', expected one of [hard, soft]
full_key: generation.print_alignment
reference_type=GenerationConfig
object_type=GenerationConfig

your model has probably been finetuned (or trained) with a different version of fairseq. You will need to find out which version your model was trained with and edit the commit hash in the Dockerfile accordingly, BUT THIS MIGHT BREAK src/recognize.py.

The workaround is to look for what has changed in the parameters inside the fairseq source code. For the example above, I found this change:

fairseq/dataclass/configs.py (72a25a4 -> 032a404)

- print_alignment: bool = field(
+ print_alignment: Optional[PRINT_ALIGNMENT_CHOICES] = field(
-     default=False,
+     default=None,
      metadata={
-         "help": "if set, uses attention feedback to compute and print alignment to source tokens"
+         "help": "if set, uses attention feedback to compute and print alignment to source tokens "
+         "(valid options are: hard, soft, otherwise treated as hard alignment)",
+         "argparse_const": "hard",
      },
  )

The problem is that fairseq changed generation.print_alignment from a boolean into an optional choice (hard or soft), so the stored value False is no longer valid. I therefore modified recognize.hydra.py as below (you might want to modify the value instead):

  OmegaConf.set_struct(w2v["cfg"], False)
+ del w2v["cfg"].generation["print_alignment"]
  cfg = OmegaConf.merge(OmegaConf.structured(Wav2Vec2CheckpointConfig), w2v["cfg"])
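Modifying the value instead of deleting it could look like the sketch below. It is a hypothetical helper, not part of this repository, and the mapping is an assumption read off the fairseq diff above: a legacy False becomes None (the new default, alignment disabled) and True becomes "hard" (the argparse_const). It would be called in place of the del line, after OmegaConf.set_struct(w2v["cfg"], False). It is written against plain mappings; we assume the same item access works on the OmegaConf config once struct mode is off.

```python
from typing import Any, MutableMapping, Optional


def migrate_print_alignment(cfg: MutableMapping[str, Any]) -> Optional[str]:
    """Rewrite a legacy boolean generation.print_alignment into the newer form.

    Hypothetical helper. Assumed mapping: False -> None, True -> "hard";
    values that are already "hard"/"soft"/None are left unchanged.
    Returns the value that ends up stored (None if the key is absent).
    """
    generation = cfg.get("generation")
    if generation is None or "print_alignment" not in generation:
        return None
    value = generation["print_alignment"]
    if value is False:
        value = None      # alignment printing disabled
    elif value is True:
        value = "hard"    # old flag set -> hard alignment
    generation["print_alignment"] = value
    return value
```

In recognize.hydra.py this would read migrate_print_alignment(w2v["cfg"]) between the set_struct and merge lines.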

Alternative install

We provide an alternative Dockerfile named wav2letter.Dockerfile that uses the wav2letter/wav2letter:cpu-latest Docker image as its base (FROM). Here are the commands to build and run it in this case:

docker build -t wav2vec2 -f wav2letter.Dockerfile .
docker run -d -it --rm -v $PWD/data:/root/data --name w2v2 wav2vec2
docker exec -it w2v2 bash
python examples/wav2vec/recognize.py --wav_path /root/data/temp.wav --w2v_path /root/data/wav2vec_small_10m.pt --target_dict_path /root/data/dict.ltr.txt 

Contributors

Thanks to all contributors to this repo.
