
thomasschmied / Speech_recognition_with_tensorflow

License: MIT
Implementation of a seq2seq model for Speech Recognition using the latest version of TensorFlow. Architecture similar to Listen, Attend and Spell.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to, or similar to, Speech_recognition_with_tensorflow

Openseq2seq
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Stars: ✭ 1,378 (+444.66%)
Mutual labels:  speech-recognition, seq2seq, speech-to-text, sequence-to-sequence
Text summarization with tensorflow
Implementation of a seq2seq model for summarization of textual data. Demonstrated on amazon reviews, github issues and news articles.
Stars: ✭ 226 (-10.67%)
Mutual labels:  jupyter-notebook, seq2seq, sequence-to-sequence, encoder-decoder
Pytorch Seq2seq
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
Stars: ✭ 3,418 (+1250.99%)
Mutual labels:  jupyter-notebook, seq2seq, sequence-to-sequence, encoder-decoder
Tf Seq2seq
Sequence to sequence learning using TensorFlow.
Stars: ✭ 387 (+52.96%)
Mutual labels:  seq2seq, sequence-to-sequence, encoder-decoder
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+61.26%)
Mutual labels:  speech-recognition, seq2seq, sequence-to-sequence
Lingvo
Lingvo
Stars: ✭ 2,361 (+833.2%)
Mutual labels:  speech-recognition, seq2seq, speech-to-text
Text summurization abstractive methods
Multiple implementations for abstractive text summarization, using Google Colab
Stars: ✭ 359 (+41.9%)
Mutual labels:  jupyter-notebook, seq2seq, encoder-decoder
Silero Models
Silero Models: pre-trained STT models and benchmarks made embarrassingly simple
Stars: ✭ 522 (+106.32%)
Mutual labels:  jupyter-notebook, speech-recognition, speech-to-text
Nmtpytorch
Sequence-to-Sequence Framework in PyTorch
Stars: ✭ 392 (+54.94%)
Mutual labels:  jupyter-notebook, speech-recognition, seq2seq
Screenshot To Code
A neural network that transforms a design mock-up into a static website.
Stars: ✭ 13,561 (+5260.08%)
Mutual labels:  jupyter-notebook, seq2seq, encoder-decoder
Nemo
NeMo: a toolkit for conversational AI
Stars: ✭ 3,685 (+1356.52%)
Mutual labels:  jupyter-notebook, speech-recognition, speech-to-text
Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Stars: ✭ 990 (+291.3%)
Mutual labels:  seq2seq, sequence-to-sequence, encoder-decoder
Delta
DELTA is a deep learning based natural language and speech processing platform.
Stars: ✭ 1,479 (+484.58%)
Mutual labels:  speech-recognition, seq2seq, sequence-to-sequence
Hey Jetson
Deep Learning based Automatic Speech Recognition with attention for the Nvidia Jetson.
Stars: ✭ 161 (-36.36%)
Mutual labels:  jupyter-notebook, speech-recognition, speech-to-text
Voice Overlay Android
🗣 An overlay that gets your user’s voice permission and input as text in a customizable UI
Stars: ✭ 189 (-25.3%)
Mutual labels:  speech-recognition, speech-to-text
Tensorflow Speech Recognition
🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks
Stars: ✭ 2,118 (+737.15%)
Mutual labels:  speech-recognition, speech-to-text
Kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition.
Stars: ✭ 190 (-24.9%)
Mutual labels:  speech-recognition, seq2seq
Vosk
VOSK Speech Recognition Toolkit
Stars: ✭ 182 (-28.06%)
Mutual labels:  speech-recognition, speech-to-text
Automatic Speech Recognition
🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)
Stars: ✭ 192 (-24.11%)
Mutual labels:  speech-recognition, speech-to-text
Dictate.js
A small Javascript library for browser-based real-time speech recognition, which uses Recorderjs for audio capture, and a WebSocket connection to the Kaldi GStreamer server for speech recognition.
Stars: ✭ 195 (-22.92%)
Mutual labels:  speech-recognition, speech-to-text

Speech_Recognition_with_Tensorflow

Implementation of a seq2seq model for speech recognition. The architecture is similar to "Listen, Attend and Spell": https://arxiv.org/pdf/1508.01211.pdf

Example output (model prediction vs. ground truth):

Created: ['S', 'E', 'V', 'E', 'N', 'T', 'E', 'E', 'N', '<SPACE>', 'T', 'W', 'E', 'N', 'T', 'Y', '<SPACE>', 'F', 'O', 'U', 'R']
Actual: ['S', 'E', 'V', 'E', 'N', 'T', 'E', 'E', 'N', '<SPACE>', 'T', 'W', 'E', 'N', 'T', 'Y', '<SPACE>', 'F', 'O', 'U', 'R']
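
The model predicts transcripts character by character, with word boundaries encoded as a <SPACE> token, as the example above shows. Below is a minimal sketch of how a transcript could be turned into such a target sequence; the exact vocabulary and any extra special tokens used in the repository are assumptions.

def text_to_tokens(transcript):
    # Upper-case characters plus a <SPACE> marker for word boundaries,
    # matching the example output above. Any <SOS>/<EOS>-style tokens
    # the model may also use are not shown here.
    return ['<SPACE>' if c == ' ' else c for c in transcript.upper()]

print(text_to_tokens("seventeen twenty four"))
# ['S', 'E', 'V', 'E', 'N', 'T', 'E', 'E', 'N', '<SPACE>', 'T', 'W', 'E', 'N', 'T', 'Y', '<SPACE>', 'F', 'O', 'U', 'R']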

Prerequisites

  • TensorFlow
  • numpy
  • pandas
  • librosa
  • python_speech_features

Datasets

I used the LibriSpeech dataset, which contains about 1,000 hours of read English speech sampled at 16 kHz. It is available here: http://www.openslr.org/12/
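
Since librosa and python_speech_features are listed as prerequisites, the audio is presumably loaded and converted to spectral features before being fed to the network. The following is a small sketch under that assumption; the actual feature type and parameters used in the repository may differ.

import librosa
from python_speech_features import logfbank

# Load one utterance at LibriSpeech's native 16 kHz sampling rate.
# The path is a placeholder; 40 log mel filterbanks with 25 ms windows
# and a 10 ms hop are illustrative choices, not taken from the repo.
signal, rate = librosa.load('path/to/librispeech_utterance.flac', sr=16000)
features = logfbank(signal, samplerate=rate, winlen=0.025, winstep=0.01, nfilt=40)
print(features.shape)  # (num_frames, 40)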

Code

I uploaded three .py files and one .ipynb file. The .py files contain the network implementation and utilities. The Jupyter Notebook is a demo of how to apply the model.

Architecture

Seq2Seq model
As mentioned above, the model architecture is similar to the one used in "Listen, Attend and Spell": the encoder uses pyramidal bidirectional LSTMs, which reduce the time resolution and improve performance on longer sequences. The main building blocks are listed below, followed by a small encoder sketch.

  • Encoder-Decoder
  • Pyramidal Bidirectional LSTM
  • Bahdanau Attention
  • Adam Optimizer
  • Exponential or cyclic learning rate
  • Beam Search or Greedy Decoding
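
A minimal sketch of the pyramidal encoder idea, written with tf.keras for illustration: each layer concatenates pairs of consecutive frames, halving the time resolution, before running a bidirectional LSTM. Layer sizes and the exact TensorFlow API are assumptions, not taken from the repository.

import tensorflow as tf

def pyramidal_blstm(inputs, units):
    # Concatenate every pair of consecutive frames, halving the time
    # resolution, then run a bidirectional LSTM over the shorter sequence.
    t = tf.shape(inputs)[1]
    inputs = inputs[:, :(t // 2) * 2, :]            # drop an odd trailing frame
    batch = tf.shape(inputs)[0]
    depth = inputs.shape[-1]
    reduced = tf.reshape(inputs, [batch, t // 2, 2 * depth])
    return tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(units, return_sequences=True))(reduced)

# Toy input: 2 utterances, 100 frames, 40 filterbank features each.
x = tf.random.normal([2, 100, 40])
for _ in range(3):                                  # 3 layers -> roughly 8x time reduction
    x = pyramidal_blstm(x, units=256)
print(x.shape)                                      # (2, 12, 512)

Stacking three such layers shortens the encoder output by roughly a factor of eight, which keeps the decoder's attention computation tractable for long utterances.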