
sooftware / speech-transformer

License: MIT
Transformer implementation specialized in speech recognition tasks, using PyTorch.


Projects that are alternatives of or similar to speech-transformer

End2end Asr Pytorch
End-to-End Automatic Speech Recognition on PyTorch
Stars: ✭ 175 (+337.5%)
Mutual labels:  end-to-end, speech, transformer, asr
Speech Transformer
A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.
Stars: ✭ 565 (+1312.5%)
Mutual labels:  end-to-end, transformer, asr, attention-is-all-you-need
Kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition.
Stars: ✭ 190 (+375%)
Mutual labels:  end-to-end, transformer, asr, attention-is-all-you-need
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+1040%)
Mutual labels:  end-to-end, transformer, asr, attention-is-all-you-need
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+920%)
Mutual labels:  speech, transformer, asr
Openasr
A PyTorch-based end-to-end speech recognition system.
Stars: ✭ 69 (+72.5%)
Mutual labels:  speech, transformer, asr
kosr
Korean speech recognition based on transformer (트랜스포머 기반 한국어 음성 인식)
Stars: ✭ 25 (-37.5%)
Mutual labels:  end-to-end, transformer, asr
Listen Attend Spell
A PyTorch implementation of Listen, Attend and Spell (LAS), an End-to-End ASR framework.
Stars: ✭ 147 (+267.5%)
Mutual labels:  end-to-end, asr
Speech Transformer Tf2.0
Transformer for ASR systems (via TensorFlow 2.0)
Stars: ✭ 90 (+125%)
Mutual labels:  end-to-end, transformer
transformer
A PyTorch Implementation of "Attention Is All You Need"
Stars: ✭ 28 (-30%)
Mutual labels:  transformer, attention-is-all-you-need
cape
Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch
Stars: ✭ 29 (-27.5%)
Mutual labels:  speech, transformer
Rnn Transducer
MXNet implementation of RNN Transducer (Graves 2012): Sequence Transduction with Recurrent Neural Networks
Stars: ✭ 114 (+185%)
Mutual labels:  end-to-end, asr
E2e Asr
PyTorch Implementations for End-to-End Automatic Speech Recognition
Stars: ✭ 106 (+165%)
Mutual labels:  end-to-end, asr
Espresso
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit
Stars: ✭ 808 (+1920%)
Mutual labels:  end-to-end, asr
wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
Stars: ✭ 205 (+412.5%)
Mutual labels:  speech, asr
opensource-voice-tools
A repo listing known open source voice tools, ordered by where they sit in the voice stack
Stars: ✭ 21 (-47.5%)
Mutual labels:  speech, asr
ASR-Audio-Data-Links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (+347.5%)
Mutual labels:  speech, asr
Zero-Shot-TTS
Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Stars: ✭ 33 (-17.5%)
Mutual labels:  speech, transformer
End-to-End-Mandarin-ASR
End-to-end speech recognition on AISHELL dataset.
Stars: ✭ 20 (-50%)
Mutual labels:  end-to-end, asr
Speech Denoising Wavenet
A neural network for end-to-end speech denoising
Stars: ✭ 516 (+1190%)
Mutual labels:  end-to-end, speech

Speech-Transformer

PyTorch implementation of "The SpeechTransformer for Large-scale Mandarin Chinese Speech Recognition".

Speech Transformer is a transformer framework specialized in speech recognition tasks.
This repository contains only the model code, but you can train a speech recognition model with it.
I appreciate any kind of feedback or contribution.
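
As background, the model builds on the transformer from "Attention Is All You Need", whose sinusoidal positional encoding can be sketched in plain Python. The helper below is purely illustrative and is not part of this repository's API:

```python
import math

def sinusoidal_positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position encodings.

    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)      # even dimensions use sine
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd dimensions use cosine
    return pe

pe = sinusoidal_positional_encoding(max_len=100, d_model=8)
```

Because the encoding depends only on position and dimension, it can be precomputed once and added to the input embeddings.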

Usage

  • Training
import torch
from speech_transformer import SpeechTransformer

BATCH_SIZE, SEQ_LENGTH, DIM, NUM_CLASSES = 3, 12345, 80, 4

cuda = torch.cuda.is_available()
device = torch.device('cuda' if cuda else 'cpu')

inputs = torch.rand(BATCH_SIZE, SEQ_LENGTH, DIM).to(device)
input_lengths = torch.IntTensor([100, 50, 8])
targets = torch.LongTensor([[2, 3, 3, 3, 3, 3, 2, 2, 1, 0],
                            [2, 3, 3, 3, 3, 3, 2, 1, 2, 0],
                            [2, 3, 3, 3, 3, 3, 2, 2, 0, 1]]).to(device)  # 1 means <eos_token>
target_lengths = torch.IntTensor([10, 9, 8])

model = SpeechTransformer(num_classes=NUM_CLASSES, d_model=512, num_heads=8, input_dim=DIM)
predictions, logits = model(inputs, input_lengths, targets, target_lengths)
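The forward pass above returns logits; a training loop would typically minimize cross-entropy against the padded targets while ignoring pad positions. A framework-free sketch of that idea for a single sequence (illustrative only; the repository's actual training loop is not included here, and the pad id of 0 is an assumption matching the zero-padded targets above):

```python
import math

def masked_cross_entropy(logits, targets, pad_id=0):
    """Mean negative log-likelihood over non-pad target positions.

    logits:  list of per-position unnormalized score vectors
    targets: list of target token ids (pad_id positions are ignored)
    """
    total, count = 0.0, 0
    for scores, tok in zip(logits, targets):
        if tok == pad_id:
            continue  # skip padding when averaging the loss
        log_z = math.log(sum(math.exp(s) for s in scores))  # log partition
        total += log_z - scores[tok]                        # -log softmax(tok)
        count += 1
    return total / max(count, 1)

loss = masked_cross_entropy(
    logits=[[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.0, 0.0, 0.0]],
    targets=[0, 1, 2],  # the first position is pad (id 0) and is skipped
)
```

With uniform logits the per-token loss is log(V), which is a handy sanity check for an implementation like this.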
  • Beam Search Decoding
import torch
from speech_transformer import SpeechTransformer

BATCH_SIZE, SEQ_LENGTH, DIM, NUM_CLASSES = 3, 12345, 80, 10

cuda = torch.cuda.is_available()
device = torch.device('cuda' if cuda else 'cpu')

inputs = torch.rand(BATCH_SIZE, SEQ_LENGTH, DIM).to(device)  # BxTxD
input_lengths = torch.LongTensor([SEQ_LENGTH, SEQ_LENGTH - 10, SEQ_LENGTH - 20]).to(device)

model = SpeechTransformer(num_classes=NUM_CLASSES, d_model=512, num_heads=8, input_dim=DIM)
model.set_beam_decoder(batch_size=BATCH_SIZE, beam_size=3)
predictions, _ = model(inputs, input_lengths)
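
For intuition, beam search keeps the `beam_size` highest-scoring partial hypotheses at each decoding step instead of committing to a single greedy choice. The toy sketch below runs over a fixed per-step log-probability table and is purely illustrative; it is unrelated to this repository's decoder internals:

```python
import math

def beam_search(step_log_probs, beam_size):
    """Toy beam search over a T x V table of per-step log-probabilities.

    step_log_probs[t][v] is the log-probability of emitting token v at
    step t (independent across steps, to keep the illustration simple).
    Returns the highest-scoring token sequence and its total log-probability.
    """
    beams = [([], 0.0)]  # (token sequence, cumulative log-prob)
    for log_probs in step_log_probs:
        candidates = [
            (seq + [tok], score + lp)
            for seq, score in beams
            for tok, lp in enumerate(log_probs)
        ]
        # Keep only the beam_size best partial hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams[0]

# Two steps over a 3-token vocabulary.
table = [
    [math.log(0.6), math.log(0.3), math.log(0.1)],
    [math.log(0.2), math.log(0.7), math.log(0.1)],
]
best_seq, best_score = beam_search(table, beam_size=2)
```

With `beam_size=1` this reduces to greedy decoding; a real decoder would also condition each step on the tokens chosen so far and stop at the end-of-sequence token.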

Troubleshooting and Contributing

If you have any questions, bug reports, or feature requests, please open an issue on GitHub or
contact [email protected].

Feel free to proceed with small issues like bug fixes or documentation improvements. For major contributions and new features, please discuss them with the collaborators in the corresponding issues.

Code Style

I follow PEP 8 for code style. In particular, docstring style matters, since the documentation is generated from docstrings.

Reference

Author
