
hirofumi0810 / Tensorflow_end2end_speech_recognition

License: MIT
End-to-end speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Tensorflow end2end speech recognition

Eesen
The official repository of the Eesen project
Stars: ✭ 738 (+141.97%)
Mutual labels:  speech-recognition, speech-to-text, asr, ctc
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+33.77%)
Mutual labels:  speech-recognition, attention-mechanism, asr, ctc
speech-recognition-evaluation
Evaluate results from ASR/Speech-to-Text quickly
Stars: ✭ 25 (-91.8%)
Mutual labels:  speech-recognition, speech-to-text, asr
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+49.51%)
Mutual labels:  end-to-end, speech-recognition, asr
End-to-End-Mandarin-ASR
End-to-end speech recognition on AISHELL dataset.
Stars: ✭ 20 (-93.44%)
Mutual labels:  end-to-end, speech-recognition, asr
kosr
Korean speech recognition based on the Transformer
Stars: ✭ 25 (-91.8%)
Mutual labels:  end-to-end, speech-recognition, asr
ASR-Audio-Data-Links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (-41.31%)
Mutual labels:  speech-recognition, speech-to-text, asr
rnnt decoder cuda
An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.
Stars: ✭ 60 (-80.33%)
Mutual labels:  speech-recognition, beam-search, speech-to-text
megs
A merged version of multiple open-source German speech datasets.
Stars: ✭ 21 (-93.11%)
Mutual labels:  speech-recognition, speech-to-text, asr
sova-asr
SOVA ASR (Automatic Speech Recognition)
Stars: ✭ 123 (-59.67%)
Mutual labels:  speech-recognition, speech-to-text, asr
kaldi-long-audio-alignment
Long audio alignment using Kaldi
Stars: ✭ 21 (-93.11%)
Mutual labels:  speech-recognition, speech-to-text, asr
spokestack-ios
Spokestack: give your iOS app a voice interface!
Stars: ✭ 27 (-91.15%)
Mutual labels:  speech-recognition, speech-to-text, asr
speech-recognition
SDKs and docs for Skit's speech to text service
Stars: ✭ 20 (-93.44%)
Mutual labels:  speech-recognition, speech-to-text, asr
wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
Stars: ✭ 205 (-32.79%)
Mutual labels:  speech-recognition, speech-to-text, asr
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (-82.62%)
Mutual labels:  speech-recognition, speech-to-text, asr
leopard
On-device speech-to-text engine powered by deep learning
Stars: ✭ 354 (+16.07%)
Mutual labels:  speech-recognition, speech-to-text, asr
ctc-asr
End-to-end trained speech recognition system, based on RNNs and the connectionist temporal classification (CTC) cost function.
Stars: ✭ 112 (-63.28%)
Mutual labels:  speech-recognition, asr, ctc
Rus-SpeechRecognition-LSTM-CTC-VoxForge
Russian speech recognition using TensorFlow, trained on the VoxForge dataset
Stars: ✭ 50 (-83.61%)
Mutual labels:  end-to-end, speech-recognition, ctc
Rnn ctc
Recurrent Neural Network and Long Short Term Memory (LSTM) with Connectionist Temporal Classification implemented in Theano. Includes a Toy training example.
Stars: ✭ 220 (-27.87%)
Mutual labels:  speech-recognition, speech-to-text, ctc
Ctcdecoder
Connectionist Temporal Classification (CTC) decoding algorithms: best path, prefix search, beam search and token passing. Implemented in Python.
Stars: ✭ 529 (+73.44%)
Mutual labels:  beam-search, speech-recognition, ctc

TensorFlow Implementation of End-to-End Speech Recognition

Requirements

  • TensorFlow >= 1.3.0
  • tqdm >= 4.14.0
  • python-Levenshtein >= 0.12.0
  • setproctitle >= 1.1.10
  • seaborn >= 0.7.1

Corpus

TIMIT

  • Phone (39, 48, 61 phones)
  • Character

LibriSpeech

  • Phone (under implementation)
  • Character
  • Word

CSJ (Corpus of Spontaneous Japanese)

  • Phone (under implementation)
  • Japanese kana character (about 150 classes)
  • Japanese kanji characters (about 3000 classes)

These corpora will be added in the future:

  • Switchboard
  • WSJ
  • AMI

This repository doesn't include pre-processing. Pre-processing is based on this repo; please refer to it if you want to run pre-processing yourself.

Model

Encoder

  • BLSTM
  • LSTM
  • BGRU
  • GRU
  • VGG-BLSTM
  • VGG-LSTM
  • Multi-task BLSTM
    • You can attach an additional CTC layer to an arbitrary layer.
  • Multi-task LSTM
  • VGG

Connectionist Temporal Classification (CTC) [Graves+ 2006]

  • Greedy decoder
  • Beam Search decoder
  • Beam Search decoder w/ CharLM (under implementation)
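
To illustrate what a greedy (best-path) CTC decoder does, here is a minimal pure-Python sketch. It is not taken from this repository: the function name `ctc_greedy_decode`, the choice of index 0 as the blank label, and the toy posteriors are all assumptions made for the example.

```python
def ctc_greedy_decode(posteriors, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    posteriors: list of per-frame probability lists (time x classes).
    blank: index of the CTC blank label (assumed to be 0 in this sketch).
    """
    # Pick the most probable class at every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in posteriors]

    decoded = []
    prev = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# Hypothetical posteriors over 3 classes (blank, label 1, label 2) for 5 frames.
toy_posteriors = [
    [0.10, 0.80, 0.10],  # label 1
    [0.10, 0.70, 0.20],  # label 1 again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank (separates the two occurrences of label 1)
    [0.20, 0.70, 0.10],  # label 1
    [0.10, 0.20, 0.70],  # label 2
]
print(ctc_greedy_decode(toy_posteriors))  # [1, 1, 2]
```

TensorFlow 1.x CTC ops reserve the last class index (`num_classes - 1`) for the blank, so `blank` would need to be set accordingly when adapting this sketch to TensorFlow outputs.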
Options
  • Frame-stacking [Sak+ 2015]
  • Multi-GPUs training (synchronous)
  • Splicing
  • Down sampling (under implementation)
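
The frame-stacking and splicing options can be sketched as follows. This is an illustrative pure-Python version, not the repository's implementation: the function names, the default of stacking 3 frames with a skip of 3, and the edge handling (repeating the last frame, clamping at the boundaries) are assumptions.

```python
def stack_frames(frames, num_stack=3, num_skip=3):
    """Concatenate `num_stack` consecutive frames, advancing `num_skip` frames per step.

    With num_skip > 1 this lowers the frame rate. The last frame is repeated
    as padding when fewer than `num_stack` frames remain (an assumption).
    """
    stacked = []
    for start in range(0, len(frames), num_skip):
        window = frames[start:start + num_stack]
        window = window + [frames[-1]] * (num_stack - len(window))
        stacked.append([value for frame in window for value in frame])
    return stacked


def splice_frames(frames, context=1):
    """Concatenate each frame with `context` neighbours on each side.

    The frame rate is unchanged; indices are clamped at the utterance edges.
    """
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window = [frames[min(max(t + offset, 0), last)]
                  for offset in range(-context, context + 1)]
        spliced.append([value for frame in window for value in frame])
    return spliced


# Hypothetical input: 7 frames of 2-dimensional features.
features = [[float(t), float(t) + 0.5] for t in range(7)]
stacked = stack_frames(features)
spliced = splice_frames(features)
print(len(stacked), len(stacked[0]))  # 3 frames, 6 dims each
print(len(spliced), len(spliced[0]))  # 7 frames, 6 dims each
```

Stacking shortens the sequence the encoder has to process, while splicing only widens each input vector with context.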

Attention Mechanism

Decoder
  • Greedy decoder
  • Beam search decoder (under implementation)
Attention type
  • Bahdanau's content-based attention
  • Bahdanau's normed content-based attention (under implementation)
  • location-based attention
  • Hybrid attention
  • Luong's dot attention
  • Luong's scaled dot attention (under implementation)
  • Luong's general attention
  • Luong's concat attention
  • Baidu's attention (under implementation)
Options
  • Sharpening
  • Temperature regularization in the softmax layer (Output posteriors)
  • Joint CTC-Attention [Kim 2016]
  • Coverage (under implementation)
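
As a rough guide to what these options compute, the sketch below assumes the common formulations: sharpening multiplies the attention energies by a factor greater than 1 before the softmax, temperature divides the output logits before the softmax, and joint CTC-Attention training interpolates the two losses. The function names, default values, and toy numbers are assumptions for illustration; the repository may parameterise these options differently.

```python
import math


def softmax(values):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sharpened_attention(energies, beta=2.0):
    """Sharpening: scale energies by beta > 1 so the weights become peakier."""
    return softmax([beta * e for e in energies])


def softmax_with_temperature(logits, temperature=2.0):
    """Temperature > 1 flattens the output posteriors; temperature < 1 sharpens them."""
    return softmax([z / temperature for z in logits])


def joint_ctc_attention_loss(ctc_loss, attention_loss, ctc_weight=0.2):
    """Interpolate the two objectives: w * L_ctc + (1 - w) * L_att, with 0 <= w <= 1."""
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


# Hypothetical energies / logits for three positions.
energies = [1.0, 2.0, 3.0]
plain = softmax(energies)
sharp = sharpened_attention(energies)
smooth = softmax_with_temperature(energies)
print(plain, sharp, smooth)
print(joint_ctc_attention_loss(ctc_loss=10.0, attention_loss=5.0))
```

With these toy values the largest weight grows under sharpening and shrinks under a temperature above 1, which is the intended effect of each option.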

Usage

Please refer to the docs for each corpus:

  • TIMIT
  • LibriSpeech
  • CSJ

License

MIT

Contact

[email protected]
