
upskyy / Transformer-Transducer

License: Apache-2.0
PyTorch implementation of "Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss" (ICASSP 2020)

Programming Languages

python

Projects that are alternatives to or similar to Transformer-Transducer

kosr
Korean speech recognition based on transformer (트랜스포머 기반 한국어 음성 인식)
Stars: ✭ 25 (-59.02%)
Mutual labels:  end-to-end, transformer, speech-recognition, transformer-transducer
Speech Transformer Tf2.0
Transformer for an ASR system (via TensorFlow 2.0)
Stars: ✭ 90 (+47.54%)
Mutual labels:  end-to-end, transformer, speech-recognition
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+568.85%)
Mutual labels:  transformer, speech-recognition, sequence-to-sequence
End2end Asr Pytorch
End-to-End Automatic Speech Recognition on PyTorch
Stars: ✭ 175 (+186.89%)
Mutual labels:  end-to-end, transformer, speech-recognition
Kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition.
Stars: ✭ 190 (+211.48%)
Mutual labels:  end-to-end, transformer, speech-recognition
Athena
An open-source implementation of a sequence-to-sequence based speech processing engine
Stars: ✭ 542 (+788.52%)
Mutual labels:  transformer, speech-recognition, sequence-to-sequence
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+647.54%)
Mutual labels:  end-to-end, transformer, speech-recognition
Tensorflow end2end speech recognition
End-to-End speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)
Stars: ✭ 305 (+400%)
Mutual labels:  end-to-end, speech-recognition
Espnet
End-to-End Speech Processing Toolkit
Stars: ✭ 4,533 (+7331.15%)
Mutual labels:  end-to-end, speech-recognition
Wav2letter
Facebook AI Research's Automatic Speech Recognition Toolkit
Stars: ✭ 5,907 (+9583.61%)
Mutual labels:  end-to-end, speech-recognition
E2e Asr
PyTorch Implementations for End-to-End Automatic Speech Recognition
Stars: ✭ 106 (+73.77%)
Mutual labels:  end-to-end, speech-recognition
Rus-SpeechRecognition-LSTM-CTC-VoxForge
Russian speech recognition using TensorFlow, trained on the VoxForge corpus
Stars: ✭ 50 (-18.03%)
Mutual labels:  end-to-end, speech-recognition
Espresso
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit
Stars: ✭ 808 (+1224.59%)
Mutual labels:  end-to-end, speech-recognition
Rnn Transducer
MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
Stars: ✭ 114 (+86.89%)
Mutual labels:  end-to-end, speech-recognition
SOLQ
"SOLQ: Segmenting Objects by Learning Queries", SOLQ is an end-to-end instance segmentation framework with Transformer.
Stars: ✭ 159 (+160.66%)
Mutual labels:  end-to-end, transformer
Speech Transformer
A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.
Stars: ✭ 565 (+826.23%)
Mutual labels:  end-to-end, transformer
wav2letter
Facebook AI Research's Automatic Speech Recognition Toolkit
Stars: ✭ 6,026 (+9778.69%)
Mutual labels:  end-to-end, speech-recognition
Automatic speech recognition
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Stars: ✭ 2,751 (+4409.84%)
Mutual labels:  end-to-end, speech-recognition
seq2seq-pytorch
Sequence to Sequence Models in PyTorch
Stars: ✭ 41 (-32.79%)
Mutual labels:  transformer, sequence-to-sequence
speech-transformer
Transformer implementation specialized for speech recognition tasks, using PyTorch.
Stars: ✭ 40 (-34.43%)
Mutual labels:  end-to-end, transformer

Transformer-Transducer

Transformer-Transducer replaces the LSTM encoders of the RNN-T architecture with Transformer encoders, and every layer is identical for both the audio and label encoders. Unlike the basic Transformer structure, the audio encoder and the label encoder are kept separate; alignment between them is handled by the forward-backward process of the RNN-T loss.

This repository contains only the model code; to train a Transformer Transducer, use openspeech.
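
As a rough, hedged illustration of this layout (not the repository's actual modules; the layer sizes and the joint network below are assumptions for illustration only), a minimal PyTorch sketch of the two identical-layer encoders and the joint network could look like this:

import torch
import torch.nn as nn


def make_encoder(d_model: int, num_heads: int, num_layers: int) -> nn.TransformerEncoder:
    # Both encoders are stacks of identical Transformer layers.
    layer = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers)


class TransformerTransducerSketch(nn.Module):
    # Illustrative sketch only: hyperparameters and the joint network are assumptions,
    # not the defaults used by this repository.
    def __init__(self, num_vocabs: int, input_size: int = 80, d_model: int = 256,
                 num_heads: int = 4, num_layers: int = 2):
        super().__init__()
        self.audio_proj = nn.Linear(input_size, d_model)
        self.audio_encoder = make_encoder(d_model, num_heads, num_layers)  # encodes acoustic frames
        self.label_embedding = nn.Embedding(num_vocabs, d_model)
        self.label_encoder = make_encoder(d_model, num_heads, num_layers)  # encodes the label history
        self.joint = nn.Sequential(  # combines both encodings for every (frame, label) pair
            nn.Linear(2 * d_model, d_model),
            nn.Tanh(),
            nn.Linear(d_model, num_vocabs),
        )

    def forward(self, inputs: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # inputs: (batch, time, input_size), targets: (batch, label_len)
        enc = self.audio_encoder(self.audio_proj(inputs))        # (B, T, D)
        dec = self.label_encoder(self.label_embedding(targets))  # (B, U, D)
        # Pair every audio frame with every label position; the RNN-T loss then
        # marginalizes over all monotonic alignments between the two sequences.
        enc = enc.unsqueeze(2).expand(-1, -1, dec.size(1), -1)   # (B, T, U, D)
        dec = dec.unsqueeze(1).expand(-1, enc.size(1), -1, -1)   # (B, T, U, D)
        return self.joint(torch.cat([enc, dec], dim=-1)).log_softmax(dim=-1)

Streaming-related details such as positional encoding and limited-context attention masks are omitted from this sketch.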

Installation

pip install -e .   
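
The editable install above assumes the repository has already been cloned locally and that the command is run from the repository root. Assuming the project is hosted at github.com/upskyy/Transformer-Transducer (inferred from the project name, not stated here), the full sequence would look like:

git clone https://github.com/upskyy/Transformer-Transducer.git
cd Transformer-Transducer
pip install -e .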

Usage

from transformer_transducer.model_builder import build_transformer_transducer
import torch

BATCH_SIZE, SEQ_LENGTH, INPUT_SIZE, NUM_VOCABS = 3, 500, 80, 10

cuda = torch.cuda.is_available()
device = torch.device('cuda' if cuda else 'cpu')

model = build_transformer_transducer(
        device,
        num_vocabs=NUM_VOCABS,
        input_size=INPUT_SIZE,
)

inputs = torch.FloatTensor(BATCH_SIZE, INPUT_SIZE, SEQ_LENGTH).to(device)  # dummy (uninitialized) input features
input_lengths = torch.IntTensor([500, 450, 350])  # valid frame count of each utterance
targets = torch.LongTensor([[1, 3, 3, 3, 3, 3, 4, 5, 6, 2],  # dummy label sequences, zero-padded
                            [1, 3, 3, 3, 3, 3, 4, 5, 2, 0],
                            [1, 3, 3, 3, 3, 3, 4, 2, 0, 0]]).to(device)
target_lengths = torch.LongTensor([9, 8, 7])

# Forward propagate
outputs = model(inputs, input_lengths, targets, target_lengths)

# Recognize input speech
outputs = model.recognize(inputs, input_lengths)
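
This README covers only the forward pass and recognition; the loss and training loop live in openspeech. Purely as a hedged sketch, if the forward output follows the usual RNN-T joint shape of (batch, T, U, num_vocabs) and the length and blank-token conventions line up, it could be plugged into torchaudio's RNN-T loss roughly as follows (the blank id, the dropped leading token, and the assumption that input_lengths match the output's time dimension are guesses, not this repository's documented interface):

import torchaudio.functional as audio_F

# Assumption: `outputs` is (batch, T, U, num_vocabs). If the audio encoder
# subsamples frames, logit_lengths must be the reduced lengths instead of
# the raw input_lengths used here.
loss = audio_F.rnnt_loss(
    logits=outputs.float(),
    targets=targets[:, 1:].int().contiguous(),  # assumed: drop the leading special token
    logit_lengths=input_lengths.int().to(outputs.device),
    target_lengths=target_lengths.int().to(outputs.device),
    blank=0,                                    # assumed blank id
    reduction="mean",
)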

Reference

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss (Qian Zhang et al., ICASSP 2020)

License

Copyright 2021 Sangchun Ha.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.