Licence: MIT

RNN-Transducer Speech Recognition

End-to-end speech recognition using an RNN-Transducer in TensorFlow 2.0

Overview

This speech recognition model is based on Google's research paper "Streaming End-to-end Speech Recognition For Mobile Devices" and is implemented in Python 3 using TensorFlow 2.0.

Set Up Your Environment

To set up your environment, run the following commands:

git clone --recurse-submodules https://github.com/noahchalifour/rnnt-speech-recognition.git
cd rnnt-speech-recognition
pip install tensorflow==2.2.0 # or tensorflow-gpu==2.2.0 for GPU support
pip install -r requirements.txt
./scripts/build_rnnt.sh # builds the RNN-T loss
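
For readers unfamiliar with the RNN-T loss that the last command builds, the following is a minimal pure-Python sketch of the standard forward algorithm for the transducer loss. It is not this repository's implementation, which is compiled by the build script; the function name `rnnt_loss`, the `log_probs[t][u][k]` layout, and the choice of index 0 for the blank symbol are assumptions made for illustration only.

```python
import math

NEG_INF = float("-inf")


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_loss(log_probs, labels, blank=0):
    """Negative log-likelihood of `labels` under a transducer lattice.

    log_probs[t][u][k] is the log-probability of emitting symbol k at
    audio frame t after u labels have been emitted (shape T x (U+1) x V).
    This layout is an assumption for the sketch.
    """
    T = len(log_probs)
    U = len(labels)
    # alpha[t][u]: log-probability of reaching lattice node (t, u).
    alpha = [[NEG_INF] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            stay = NEG_INF  # arrive by emitting blank at frame t-1
            if t > 0:
                stay = alpha[t - 1][u] + log_probs[t - 1][u][blank]
            emit = NEG_INF  # arrive by emitting label u-1 at frame t
            if u > 0:
                emit = alpha[t][u - 1] + log_probs[t][u - 1][labels[u - 1]]
            alpha[t][u] = logaddexp(stay, emit)
    # Every alignment ends with a final blank from the last node.
    return -(alpha[T - 1][U] + log_probs[T - 1][U][blank])
```

The double loop sums over every monotonic alignment between audio frames and output labels, which is why compiled or GPU implementations are normally preferred for training.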

Common Voice

You can download the Common Voice dataset from Mozilla's Common Voice website.

Convert all MP3s to WAVs

Before you can train a model on the Common Voice dataset, you must convert all of the MP3 audio files to WAV. To do so, run the following commands:

NOTE: Make sure ffmpeg is installed on your computer; the conversion script uses it to convert MP3 to WAV.

./scripts/common_voice_convert.sh <data_dir> <# of threads>
python scripts/remove_missing_samples.py \
    --data_dir <data_dir> \
    --replace_old
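
As a rough illustration of what these two steps involve, here is a Python sketch that converts MP3 files with ffmpeg across several worker threads and then keeps only the metadata rows whose WAV file exists. It is not the repository's scripts: the function names, the 16 kHz mono output format, and the reliance on a `path` column in the Common Voice TSV files are assumptions.

```python
import csv
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def ffmpeg_command(mp3_path, sample_rate=16000):
    """Build an ffmpeg call that writes a mono WAV next to the MP3.

    The 16 kHz mono format is an assumption, not taken from the repo.
    """
    mp3_path = Path(mp3_path)
    wav_path = mp3_path.with_suffix(".wav")
    return ["ffmpeg", "-y", "-loglevel", "error", "-i", str(mp3_path),
            "-ar", str(sample_rate), "-ac", "1", str(wav_path)]


def convert_all(clips_dir, threads=4):
    """Convert every MP3 under clips_dir, `threads` ffmpeg processes at a time.

    Returns the ffmpeg exit codes, one per input file.
    """
    commands = [ffmpeg_command(p) for p in sorted(Path(clips_dir).rglob("*.mp3"))]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


def drop_missing_samples(tsv_path, clips_dir):
    """Return the metadata rows whose converted WAV file actually exists."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    return [row for row in rows
            if (Path(clips_dir) / row["path"]).with_suffix(".wav").exists()]
```

Threads are sufficient here because each worker only waits on an external ffmpeg process, so the Python interpreter lock is not a bottleneck.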

Preprocessing dataset

After converting all of the MP3s to WAVs, you need to preprocess the dataset. To do so, run the following command:

python preprocess_common_voice.py \
    --data_dir <data_dir> \
    --output_dir <preprocessed_dir>

Training a model

To train a simple model, run the following command:

python run_rnnt.py \
    --mode train \
    --data_dir <path to data directory>