hirofumi0810 / Tensorflow_end2end_speech_recognition
Licence: mit
End-to-end speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)
Stars: ✭ 305
Programming Languages
python
Projects that are alternatives of or similar to Tensorflow end2end speech recognition
Eesen
The official repository of the Eesen project
Stars: ✭ 738 (+141.97%)
Mutual labels: speech-recognition, speech-to-text, asr, ctc
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+33.77%)
Mutual labels: speech-recognition, attention-mechanism, asr, ctc
speech-recognition-evaluation
Evaluate results from ASR/Speech-to-Text quickly
Stars: ✭ 25 (-91.8%)
Mutual labels: speech-recognition, speech-to-text, asr
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+49.51%)
Mutual labels: end-to-end, speech-recognition, asr
End-to-End-Mandarin-ASR
End-to-end speech recognition on AISHELL dataset.
Stars: ✭ 20 (-93.44%)
Mutual labels: end-to-end, speech-recognition, asr
kosr
Korean speech recognition based on transformer (트랜스포머 기반 한국어 음성 인식)
Stars: ✭ 25 (-91.8%)
Mutual labels: end-to-end, speech-recognition, asr
ASR-Audio-Data-Links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (-41.31%)
Mutual labels: speech-recognition, speech-to-text, asr
rnnt decoder cuda
An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.
Stars: ✭ 60 (-80.33%)
Mutual labels: speech-recognition, beam-search, speech-to-text
megs
A merged version of multiple open-source German speech datasets.
Stars: ✭ 21 (-93.11%)
Mutual labels: speech-recognition, speech-to-text, asr
sova-asr
SOVA ASR (Automatic Speech Recognition)
Stars: ✭ 123 (-59.67%)
Mutual labels: speech-recognition, speech-to-text, asr
kaldi-long-audio-alignment
Long audio alignment using Kaldi
Stars: ✭ 21 (-93.11%)
Mutual labels: speech-recognition, speech-to-text, asr
spokestack-ios
Spokestack: give your iOS app a voice interface!
Stars: ✭ 27 (-91.15%)
Mutual labels: speech-recognition, speech-to-text, asr
speech-recognition
SDKs and docs for Skit's speech to text service
Stars: ✭ 20 (-93.44%)
Mutual labels: speech-recognition, speech-to-text, asr
wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
Stars: ✭ 205 (-32.79%)
Mutual labels: speech-recognition, speech-to-text, asr
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (-82.62%)
Mutual labels: speech-recognition, speech-to-text, asr
leopard
On-device speech-to-text engine powered by deep learning
Stars: ✭ 354 (+16.07%)
Mutual labels: speech-recognition, speech-to-text, asr
ctc-asr
End-to-end trained speech recognition system, based on RNNs and the connectionist temporal classification (CTC) cost function.
Stars: ✭ 112 (-63.28%)
Mutual labels: speech-recognition, asr, ctc
Rus-SpeechRecognition-LSTM-CTC-VoxForge
Russian speech recognition using TensorFlow, trained on the VoxForge corpus
Stars: ✭ 50 (-83.61%)
Mutual labels: end-to-end, speech-recognition, ctc
Rnn ctc
Recurrent Neural Network and Long Short Term Memory (LSTM) with Connectionist Temporal Classification implemented in Theano. Includes a Toy training example.
Stars: ✭ 220 (-27.87%)
Mutual labels: speech-recognition, speech-to-text, ctc
Ctcdecoder
Connectionist Temporal Classification (CTC) decoding algorithms: best path, prefix search, beam search and token passing. Implemented in Python.
Stars: ✭ 529 (+73.44%)
Mutual labels: beam-search, speech-recognition, ctc
TensorFlow Implementation of End-to-End Speech Recognition
Requirements
- TensorFlow >= 1.3.0
- tqdm >= 4.14.0
- python-Levenshtein >= 0.12.0
- setproctitle >= 1.1.10
- seaborn >= 0.7.1
Corpus
TIMIT
- Phone (39, 48, 61 phones)
- Character
LibriSpeech
- Phone (under implementation)
- Character
- Word
CSJ (Corpus of Spontaneous Japanese)
- Phone (under implementation)
- Japanese kana character (about 150 classes)
- Japanese kanji characters (about 3000 classes)
These corpora will be added in the future:
- Switchboard
- WSJ
- AMI
This repository doesn't include pre-processing; pre-processing is based on a separate repo. If you want to run pre-processing, please refer to that repo.
Model
Encoder
- BLSTM
- LSTM
- BGRU
- GRU
- VGG-BLSTM
- VGG-LSTM
- Multi-task BLSTM
- You can attach another CTC layer to an arbitrary layer.
- Multi-task LSTM
- VGG
Connectionist Temporal Classification (CTC) [Graves+ 2006]
- Greedy decoder
- Beam Search decoder
- Beam Search decoder w/ CharLM (under implementation)
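As a rough illustration of the greedy (best-path) decoder listed above, the sketch below picks the most likely label per frame, collapses consecutive repeats, and drops blanks. It is a minimal pure-Python sketch, not this repository's code: the function name `ctc_greedy_decode` and the choice of index 0 for the blank are assumptions made for the example (TensorFlow's own CTC ops reserve the last class index for the blank).

```python
def ctc_greedy_decode(posteriors, blank=0):
    """Best-path CTC decoding (illustrative sketch).

    posteriors: list of per-frame probability lists, shape (T, num_classes).
    blank: index of the CTC blank label (assumed to be 0 in this example).
    """
    # Step 1: most likely label at every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in posteriors]

    # Step 2: collapse consecutive repeats, then remove blanks.
    decoded = []
    prev = None
    for label in best_path:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


# Hypothetical 7-frame input over 3 classes (0 = blank).
example = [
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.10, 0.20, 0.70],
    [0.10, 0.10, 0.80],
    [0.90, 0.05, 0.05],
]
result = ctc_greedy_decode(example)
```

The blank between the second and fourth frames is what lets the same label appear twice in a row in the output; without it the two runs of label 1 would collapse into one.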
Options
- Frame-stacking [Sak+ 2015]
- Multi-GPUs training (synchronous)
- Splicing
- Down sampling (under implementation)
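Frame-stacking and splicing are both ways of widening the input context, and the difference is easy to miss: stacking concatenates consecutive frames and skips ahead, which lowers the frame rate, while splicing adds neighbouring frames to every frame and keeps the frame rate. The sketch below shows one possible reading of each. The function names, default values, and edge padding (repeating the first or last frame) are assumptions for illustration and may differ from what this repository does.

```python
def stack_frames(frames, num_stack=3, num_skip=3):
    """Concatenate num_stack consecutive frames, advancing num_skip frames per step."""
    stacked = []
    for start in range(0, len(frames), num_skip):
        window = frames[start:start + num_stack]
        # Assumption: pad the tail by repeating the last frame.
        while len(window) < num_stack:
            window = window + [frames[-1]]
        stacked.append([x for frame in window for x in frame])
    return stacked


def splice_frames(frames, context=1):
    """Attach `context` frames on each side of every frame (frame rate unchanged)."""
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        # Assumption: clamp indices at the edges, i.e. repeat the boundary frame.
        window = [frames[min(max(t + offset, 0), last)]
                  for offset in range(-context, context + 1)]
        spliced.append([x for frame in window for x in frame])
    return spliced


# Five hypothetical one-dimensional feature frames.
frames = [[1], [2], [3], [4], [5]]
stacked = stack_frames(frames, num_stack=3, num_skip=3)
spliced = splice_frames(frames, context=1)
```

With these inputs, stacking turns five frames into two, while splicing still returns five frames, each three times as wide.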
Attention Mechanism
Decoder
- Greedy decoder
- Beam search decoder (under implementation)
Attention type
- Bahdanau's content-based attention
- Bahdanau's normed content-based attention (under implementation)
- location-based attention
- Hybrid attention
- Luong's dot attention
- Luong's scaled dot attention (under implementation)
- Luong's general attention
- Luong's concat attention
- Baidu's attention (under implementation)
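The attention types above differ mainly in how a score is computed between the decoder state and each encoder state. The following pure-Python sketch writes out the standard textbook forms of four of them. It is not taken from this repository: all function names, weight matrices (`W`, `W_s`, `W_h`) and the vector `v` are hypothetical placeholders that would be learned parameters in a real model.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(W, x):
    return [dot(row, x) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def luong_dot(s, h):
    """score = s . h"""
    return dot(s, h)


def luong_general(s, h, W):
    """score = s . (W h)"""
    return dot(s, matvec(W, h))


def luong_concat(s, h, W, v):
    """score = v . tanh(W [s; h])"""
    return dot(v, [math.tanh(x) for x in matvec(W, s + h)])


def bahdanau_content(s, h, W_s, W_h, v):
    """score = v . tanh(W_s s + W_h h)"""
    summed = [a + b for a, b in zip(matvec(W_s, s), matvec(W_h, h))]
    return dot(v, [math.tanh(x) for x in summed])


def attend(s, encoder_states, score_fn):
    """Normalize the scores into weights and return the weighted context vector."""
    weights = softmax([score_fn(s, h) for h in encoder_states])
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


# Hypothetical decoder state and two encoder states.
s = [1.0, 0.0]
H = [[1.0, 0.0], [0.0, 1.0]]
weights, context = attend(s, H, luong_dot)
```

Score functions that need parameters can be passed to `attend` through a closure, for example `attend(s, H, lambda s, h: luong_general(s, h, W))`.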
Options
- Sharpening
- Temperature regularization in the softmax layer (Output posteriors)
- Joint CTC-Attention [Kim 2016]
- Coverage (under implementation)
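Three of the options above reduce to short formulas, sketched here under common definitions that may not match this repository's exact implementation: sharpening multiplies the attention energies by a factor greater than 1 before the softmax, temperature divides the output logits by a temperature before the softmax, and joint CTC-attention training interpolates the two losses with a weight. The names `beta`, `temperature`, and `ctc_weight`, and their default values, are assumptions chosen for the example.

```python
import math


def softmax(logits, scale=1.0):
    scaled = [scale * x for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sharpened_attention(energies, beta=2.0):
    """beta > 1 concentrates the attention weights on the highest-scoring frames."""
    return softmax(energies, scale=beta)


def tempered_posteriors(logits, temperature=2.0):
    """temperature > 1 flattens the output distribution."""
    return softmax(logits, scale=1.0 / temperature)


def joint_loss(ctc_loss, attention_loss, ctc_weight=0.2):
    """Interpolate the two objectives: w * L_ctc + (1 - w) * L_att."""
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


# Hypothetical energies/logits and loss values.
plain = softmax([1.0, 2.0])
sharp = sharpened_attention([1.0, 2.0], beta=2.0)
flat = tempered_posteriors([1.0, 2.0], temperature=2.0)
total = joint_loss(10.0, 5.0, ctc_weight=0.2)
```

Sharpening and temperature are the same operation with opposite intent: one scales the softmax input up to make it peakier, the other scales it down to make it smoother.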
Usage
Please refer to the docs for each corpus:
- TIMIT
- LibriSpeech
- CSJ
License
MIT
Contact