LearnedVector / Wav2letter

A speech recognition model based on a FAIR research paper, built with PyTorch.

Wav2Letter Speech Recognition with PyTorch

A simple, straightforward, easy-to-read implementation of Wav2Letter, a speech recognition model from a Facebook AI Research (FAIR) paper. Most of the architecture is in the Wav2Letter directory.

The next iteration of Wav2Letter can be found in this paper. That paper uses gated ConvNets instead of standard ConvNets.

The Google Speech Command Example.ipynb notebook contains an example of this implementation.

[Image: Precise 2 Diagram]

Differences

  • Uses CTC Loss
  • Uses Greedy Decoder
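
As a rough illustration of the greedy decoding named above, the sketch below picks the most likely label at each frame, collapses consecutive repeats, and drops the CTC blank. This is a plain-Python sketch and not the code in this repository: the function name greedy_ctc_decode, the blank index of 0, and the toy label mapping are all assumptions made for the example.

```python
def greedy_ctc_decode(frame_scores, blank=0):
    """Greedy (best-path) CTC decoding for a single utterance.

    frame_scores: list of per-frame score lists, one score per label.
    blank: index of the CTC blank label (assumed to be 0 in this sketch).
    """
    # Step 1: take the highest-scoring label at every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]

    # Step 2: collapse consecutive repeats, then drop blanks.
    decoded = []
    previous = None
    for label in best_path:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Toy example with 3 labels: 0 = blank, 1 = "a", 2 = "b" (hypothetical mapping).
scores = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a again (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.6, 0.2],    # a (a new symbol, because a blank came in between)
    [0.1, 0.2, 0.7],    # b
]
print(greedy_ctc_decode(scores))  # [1, 1, 2]
```

The blank between the two runs of "a" is what lets the decoder emit the same letter twice in a row. Without it the repeats would collapse into a single symbol.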

TODO

  • Implement Train, Validation, Test sets
  • Test on larger speech data
  • Implement AutoSegCriterion
  • Implement Beam Search Decoder
  • Use KenLM Language Model in Decoder
  • Use larger datasets
  • Add Gated ConvNets

Getting Started

Requirements

pip install -r requirements.txt

Make sure you are using pytorch-nightly (version 1.0 alpha), which provides the CTC loss function (torch.nn.CTCLoss) this project needs.
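
One quick way to confirm that the installed PyTorch build includes CTC loss is sketched below. The helper name has_ctc_loss is hypothetical and is not part of this repository. It only checks that torch.nn.CTCLoss exists and does not validate the exact PyTorch version.

```python
import importlib.util


def has_ctc_loss():
    """Return True if an installed PyTorch exposes torch.nn.CTCLoss."""
    # find_spec returns None when the package is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return hasattr(torch.nn, "CTCLoss")


print("CTC loss available:", has_ctc_loss())
```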

Smoke Test

smoke_test.py contains a quick test to check that everything is working.

python smoke_test.py

This trains a model on randomly generated inputs and targets. If everything is working correctly, you should see the predicted and target labels printed. Because the data is random, expect the predictions to be garbage.

Data

For an initial test, I used the Google Speech Command Dataset. This is a simple-to-use, lightweight dataset for testing model performance.

Instructions to download data

  1. Download the dataset.
  2. Create a ./speech_data directory at the root of this project.
  3. Unzip the Google speech data. The extracted directory should be named speech_commands_v0.01.
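
To check the layout after following the steps above, a small helper along these lines may be useful. It assumes the extracted speech_commands_v0.01 directory sits inside ./speech_data, which the steps imply but do not state outright. The function name find_speech_data is hypothetical.

```python
from pathlib import Path


def find_speech_data(project_root="."):
    """Return the dataset directory if it is in the assumed location, else None."""
    # Assumed layout: <project_root>/speech_data/speech_commands_v0.01
    data_dir = Path(project_root) / "speech_data" / "speech_commands_v0.01"
    return data_dir if data_dir.is_dir() else None


print(find_speech_data() or "dataset not found in ./speech_data")
```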

Prepare data

data.py contains scripts to process the Google speech command audio data into features compatible with Wav2Letter.

python Wav2Letter/data.py

This processes the Google speech commands audio data into 13 MFCC features with a maximum frame length of 250 (these are short audio clips). Shorter clips are padded with zeros. Target data is integer encoded and also padded so that all targets have the same length. The final outputs are NumPy arrays saved as x.npy and y.npy in the ./speech_data directory.
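
The padding and integer encoding described above can be sketched in plain Python as follows. This is only an illustration of the idea, not the code in data.py: it uses lists instead of NumPy arrays, and the function names, the character-level encoding, the reserved padding index of 0, and the truncation of clips longer than 250 frames are all assumptions.

```python
MAX_FRAMES = 250  # maximum frame length stated above
NUM_MFCC = 13     # MFCC coefficients per frame stated above


def pad_features(frames, max_frames=MAX_FRAMES, num_mfcc=NUM_MFCC):
    """Zero-pad a list of MFCC frames to max_frames (longer clips are truncated)."""
    clipped = [list(frame) for frame in frames[:max_frames]]
    padding = [[0.0] * num_mfcc for _ in range(max_frames - len(clipped))]
    return clipped + padding


def encode_targets(words, pad_value=0):
    """Map each character to an integer and right-pad to the longest word."""
    # Index 0 is reserved for padding in this sketch (assumption).
    alphabet = sorted(set("".join(words)))
    char_to_int = {ch: i + 1 for i, ch in enumerate(alphabet)}
    longest = max(len(word) for word in words)
    encoded = [
        [char_to_int[ch] for ch in word] + [pad_value] * (longest - len(word))
        for word in words
    ]
    return encoded, char_to_int


# A fake 98-frame clip is padded up to 250 frames of 13 coefficients each.
features = pad_features([[0.5] * NUM_MFCC for _ in range(98)])
print(len(features), len(features[0]))  # 250 13

targets, mapping = encode_targets(["yes", "no", "stop"])
print(targets)  # [[7, 1, 5, 0], [2, 3, 0, 0], [5, 6, 3, 4]]
```

Padding every clip and target to a fixed length lets the whole dataset be stored as two rectangular arrays, which is what makes saving to x.npy and y.npy straightforward.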

Train

train.py contains the training code. For example:

python train.py --batch_size=256 --epochs=1000
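
For readers unfamiliar with how such flags are typically read, here is a minimal argparse sketch that accepts the two options shown in the command above. It is not the argument handling in train.py. The function name parse_args and the default values are assumptions chosen to match the example command.

```python
import argparse


def parse_args(argv=None):
    """Parse the two training flags used in the example command."""
    parser = argparse.ArgumentParser(description="Wav2Letter training (sketch)")
    # Defaults below are hypothetical; they simply mirror the example command.
    parser.add_argument("--batch_size", type=int, default=256)
    parser.add_argument("--epochs", type=int, default=1000)
    return parser.parse_args(argv)


args = parse_args(["--batch_size=256", "--epochs=1000"])
print(args.batch_size, args.epochs)  # 256 1000
```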

Contributions

Pull requests are welcome! I would love some help knocking out the TODO items. Email me at [email protected] with any questions.
