
sooftware / speech-transformer

License: MIT
Transformer implementation specialized in speech recognition tasks, using PyTorch.


Projects that are alternatives of or similar to speech-transformer

End2end Asr Pytorch
End-to-End Automatic Speech Recognition on PyTorch
Stars: ✭ 175 (+337.5%)
Mutual labels:  end-to-end, speech, transformer, asr
Speech Transformer
A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.
Stars: ✭ 565 (+1312.5%)
Mutual labels:  end-to-end, transformer, asr, attention-is-all-you-need
Kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition.
Stars: ✭ 190 (+375%)
Mutual labels:  end-to-end, transformer, asr, attention-is-all-you-need
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+1040%)
Mutual labels:  end-to-end, transformer, asr, attention-is-all-you-need
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+920%)
Mutual labels:  speech, transformer, asr
Openasr
A PyTorch-based end-to-end speech recognition system.
Stars: ✭ 69 (+72.5%)
Mutual labels:  speech, transformer, asr
kosr
Korean speech recognition based on transformer (트랜스포머 기반 한국어 음성 인식)
Stars: ✭ 25 (-37.5%)
Mutual labels:  end-to-end, transformer, asr
Listen Attend Spell
A PyTorch implementation of Listen, Attend and Spell (LAS), an End-to-End ASR framework.
Stars: ✭ 147 (+267.5%)
Mutual labels:  end-to-end, asr
Speech Transformer Tf2.0
Transformer for ASR systems (via TensorFlow 2.0)
Stars: ✭ 90 (+125%)
Mutual labels:  end-to-end, transformer
transformer
A PyTorch Implementation of "Attention Is All You Need"
Stars: ✭ 28 (-30%)
Mutual labels:  transformer, attention-is-all-you-need
cape
Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch
Stars: ✭ 29 (-27.5%)
Mutual labels:  speech, transformer
Rnn Transducer
MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
Stars: ✭ 114 (+185%)
Mutual labels:  end-to-end, asr
E2e Asr
PyTorch Implementations for End-to-End Automatic Speech Recognition
Stars: ✭ 106 (+165%)
Mutual labels:  end-to-end, asr
Espresso
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit
Stars: ✭ 808 (+1920%)
Mutual labels:  end-to-end, asr
wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
Stars: ✭ 205 (+412.5%)
Mutual labels:  speech, asr
opensource-voice-tools
A repo listing known open source voice tools, ordered by where they sit in the voice stack
Stars: ✭ 21 (-47.5%)
Mutual labels:  speech, asr
ASR-Audio-Data-Links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (+347.5%)
Mutual labels:  speech, asr
Zero-Shot-TTS
Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Stars: ✭ 33 (-17.5%)
Mutual labels:  speech, transformer
End-to-End-Mandarin-ASR
End-to-end speech recognition on AISHELL dataset.
Stars: ✭ 20 (-50%)
Mutual labels:  end-to-end, asr
Speech Denoising Wavenet
A neural network for end-to-end speech denoising
Stars: ✭ 516 (+1190%)
Mutual labels:  end-to-end, speech

Speech-Transformer

PyTorch implementation of "The SpeechTransformer for Large-scale Mandarin Chinese Speech Recognition".

Speech Transformer is a transformer framework specialized in speech recognition tasks.
This repository contains only the model code, but you can train a speech recognition model with it.
I appreciate any kind of feedback or contribution.
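
As background, the model builds on the transformer from "Attention Is All You Need", whose sinusoidal positional encoding can be sketched in plain Python. The helper below is purely illustrative and is not part of this repository's API:

```python
import math

def sinusoidal_positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position encodings.

    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)      # even dimensions use sine
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd dimensions use cosine
    return pe

pe = sinusoidal_positional_encoding(max_len=100, d_model=8)
```

Because the encoding depends only on position and dimension, it can be precomputed once and added to the input embeddings.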

Usage

  • Training
import torch
from speech_transformer import SpeechTransformer

BATCH_SIZE, SEQ_LENGTH, DIM, NUM_CLASSES = 3, 12345, 80, 4

cuda = torch.cuda.is_available()
device = torch.device('cuda' if cuda else 'cpu')

inputs = torch.rand(BATCH_SIZE, SEQ_LENGTH, DIM).to(device)
input_lengths = torch.IntTensor([100, 50, 8])
targets = torch.LongTensor([[2, 3, 3, 3, 3, 3, 2, 2, 1, 0],
                            [2, 3, 3, 3, 3, 3, 2, 1, 2, 0],
                            [2, 3, 3, 3, 3, 3, 2, 2, 0, 1]]).to(device)  # 1 means <eos_token>
target_lengths = torch.IntTensor([10, 9, 8])

model = SpeechTransformer(num_classes=NUM_CLASSES, d_model=512, num_heads=8, input_dim=DIM)
predictions, logits = model(inputs, input_lengths, targets, target_lengths)
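The forward pass above returns logits; a training loop would typically minimize cross-entropy against the padded targets while ignoring pad positions. A framework-free sketch of that idea for a single sequence (illustrative only; the repository's actual training loop is not included here, and the pad id of 0 is an assumption matching the zero-padded targets above):

```python
import math

def masked_cross_entropy(logits, targets, pad_id=0):
    """Mean negative log-likelihood over non-pad target positions.

    logits:  list of per-position unnormalized score vectors
    targets: list of target token ids (pad_id positions are ignored)
    """
    total, count = 0.0, 0
    for scores, tok in zip(logits, targets):
        if tok == pad_id:
            continue  # skip padding when averaging the loss
        log_z = math.log(sum(math.exp(s) for s in scores))  # log partition
        total += log_z - scores[tok]                        # -log softmax(tok)
        count += 1
    return total / max(count, 1)

loss = masked_cross_entropy(
    logits=[[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.0, 0.0, 0.0]],
    targets=[0, 1, 2],  # the first position is pad (id 0) and is skipped
)
```

With uniform logits the per-token loss is log(V), which is a handy sanity check for an implementation like this.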
  • Beam Search Decoding
import torch
from speech_transformer import SpeechTransformer

BATCH_SIZE, SEQ_LENGTH, DIM, NUM_CLASSES = 3, 12345, 80, 10

cuda = torch.cuda.is_available()
device = torch.device('cuda' if cuda else 'cpu')

inputs = torch.rand(BATCH_SIZE, SEQ_LENGTH, DIM).to(device)  # BxTxD
input_lengths = torch.LongTensor([SEQ_LENGTH, SEQ_LENGTH - 10, SEQ_LENGTH - 20]).to(device)

model = SpeechTransformer(num_classes=NUM_CLASSES, d_model=512, num_heads=8, input_dim=DIM)
model.set_beam_decoder(batch_size=BATCH_SIZE, beam_size=3)
predictions, _ = model(inputs, input_lengths)
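
For intuition, beam search keeps the `beam_size` highest-scoring partial hypotheses at each decoding step instead of committing to a single greedy choice. The toy sketch below runs over a fixed per-step log-probability table and is purely illustrative; it is unrelated to this repository's decoder internals:

```python
import math

def beam_search(step_log_probs, beam_size):
    """Toy beam search over a T x V table of per-step log-probabilities.

    step_log_probs[t][v] is the log-probability of emitting token v at
    step t (independent across steps, to keep the illustration simple).
    Returns the highest-scoring token sequence and its total log-probability.
    """
    beams = [([], 0.0)]  # (token sequence, cumulative log-prob)
    for log_probs in step_log_probs:
        candidates = [
            (seq + [tok], score + lp)
            for seq, score in beams
            for tok, lp in enumerate(log_probs)
        ]
        # Keep only the beam_size best partial hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0]

# Two steps over a 3-token vocabulary.
table = [
    [math.log(0.6), math.log(0.3), math.log(0.1)],
    [math.log(0.2), math.log(0.7), math.log(0.1)],
]
best_seq, best_score = beam_search(table, beam_size=2)
```

With `beam_size=1` this reduces to greedy decoding; a real decoder would also condition each step on the tokens chosen so far and stop at the end-of-sequence token.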

Troubleshooting and Contributing

If you have any questions, bug reports, or feature requests, please open an issue on GitHub or
contact [email protected].

Feel free to proceed with small issues like bug fixes or documentation improvements. For major contributions and new features, please discuss them with the collaborators in the corresponding issues.

Code Style

I follow PEP 8 for code style. In particular, docstring style matters, since the documentation is generated from docstrings.

Reference

Author
