KerasdeepspeechA Keras CTC implementation of Baidu's DeepSpeech for model experimentation
Stars: ✭ 245 (+512.5%)
torch-asgAuto Segmentation Criterion (ASG) implemented in pytorch
Stars: ✭ 42 (+5%)
Neural spEnd-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+920%)
Multimodal-Gesture-Recognition-with-LSTMs-and-CTCAn end-to-end system that performs temporal recognition of gesture sequences using speech and skeletal input. The model combines three networks with a CTC output layer that recognises gestures from continuous stream.
Stars: ✭ 25 (-37.5%)
CRNN.tf2Convolutional Recurrent Neural Network(CRNN) for End-to-End Text Recognition - TensorFlow 2
Stars: ✭ 131 (+227.5%)
Zero-Shot-TTSUnofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Stars: ✭ 33 (-17.5%)
ShifterPitch shifter using WSOLA and resampling implemented by Python3
Stars: ✭ 22 (-45%)
nlp-classA Natural Language Processing course taught by Professor Ghassemi
Stars: ✭ 95 (+137.5%)
VAD-LTSDEfficient voice activity detection algorithm using long-term speech information
Stars: ✭ 37 (-7.5%)
room-impulse-responsesA list of publicly available room impulse response datasets and scripts to download them.
Stars: ✭ 143 (+257.5%)
AdaSpeechAdaSpeech: Adaptive Text to Speech for Custom Voice
Stars: ✭ 108 (+170%)
gtranscribeSoftware for interview transcription
Stars: ✭ 12 (-70%)
PhomemeSimple sentence mixing tool (work in progress)
Stars: ✭ 18 (-55%)
opensnipsOpen source projects related to Snips https://snips.ai/.
Stars: ✭ 50 (+25%)
ctc-asrEnd-to-end trained speech recognition system, based on RNNs and the connectionist temporal classification (CTC) cost function.
Stars: ✭ 112 (+180%)
speech-transformerTransformer implementation speciaized in speech recognition tasks using Pytorch.
Stars: ✭ 40 (+0%)
TFGANTFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis
Stars: ✭ 65 (+62.5%)
nabaztag-phpa simple php implementation of a Nabaztag server
Stars: ✭ 14 (-65%)
lidboxEnd-to-end spoken language identification out of the box.
Stars: ✭ 39 (-2.5%)
JD-NMFJoint Dictionary Learning-based Non-Negative Matrix Factorization for Voice Conversion (TBME 2016)
Stars: ✭ 20 (-50%)
SignDetectThis application is developed to help speechless people interact with others with ease. It detects voice and converts the input speech into a sign language based video.
Stars: ✭ 21 (-47.5%)
Voice2MeshCVPR 2022: Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?
Stars: ✭ 67 (+67.5%)
D-TDNNPyTorch implementation of Densely Connected Time Delay Neural Network
Stars: ✭ 60 (+50%)
capeContinuous Augmented Positional Embeddings (CAPE) implementation for PyTorch
Stars: ✭ 29 (-27.5%)
web-speech-demoLearn how to build a simple text-to-speech voice app for the web using the Web Speech API.
Stars: ✭ 19 (-52.5%)
CVCCVC: Contrastive Learning for Non-parallel Voice Conversion (INTERSPEECH 2021, in PyTorch)
Stars: ✭ 45 (+12.5%)
Speech Feature ExtractionFeature extraction of speech signal is the initial stage of any speech recognition system.
Stars: ✭ 78 (+95%)
kaldi ag trainingDocker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-grammar.
Stars: ✭ 14 (-65%)
MelNet-SpeechGenerationImplementation of MelNet in PyTorch to generate high-fidelity audio samples
Stars: ✭ 19 (-52.5%)
StyleSpeechOfficial implementation of Meta-StyleSpeech and StyleSpeech
Stars: ✭ 161 (+302.5%)
linear16Converts an audio file to LINEAR16 Google-speech compatible file.
Stars: ✭ 14 (-65%)
audio noise clusteringhttps://dodiku.github.io/audio_noise_clustering/results/ ==> An experiment with a variety of clustering (and clustering-like) techniques to reduce noise on an audio speech recording.
Stars: ✭ 24 (-40%)
speech to texthow to use the Google Cloud Speech API to transcribe audio/video files.
Stars: ✭ 35 (-12.5%)
simple-obs-sttSpeech-to-text and keyboard input captions for OBS.
Stars: ✭ 89 (+122.5%)
DeepSegmentorSequence Segmentation using Joint RNN and Structured Prediction Models (ICASSP 2017)
Stars: ✭ 17 (-57.5%)
KeenASR-Android-PoCA proof-of-concept app using KeenASR SDK on Android. WE ARE HIRING: https://keenresearch.com/careers.html
Stars: ✭ 21 (-47.5%)
ttslearnttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)
Stars: ✭ 158 (+295%)
opensource-voice-toolsA repo listing known open source voice tools, ordered by where they sit in the voice stack
Stars: ✭ 21 (-47.5%)
datasets🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Stars: ✭ 13,870 (+34575%)
FAST-RIRThis is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.
Stars: ✭ 90 (+125%)
TASNETTime-domain Audio Separation Network (IN PYTORCH)
Stars: ✭ 18 (-55%)
rnn benchmarksRNN benchmarks of pytorch, tensorflow and theano
Stars: ✭ 85 (+112.5%)
deepspeech.mxnetA MXNet implementation of Baidu's DeepSpeech architecture
Stars: ✭ 82 (+105%)
NBSSThe official repo of "Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training", "Multichannel Speech Separation with Narrow-band Conformer" and "NBC2: Multichannel Speech Separation with Revised Narrow-band Conformer".
Stars: ✭ 77 (+92.5%)
HTKThe Hidden Markov Model Toolkit (HTK) from University of Cambridge, with fixed issues.
Stars: ✭ 23 (-42.5%)
kaldi helpers🙊 A set of scripts to use in preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library.
Stars: ✭ 13 (-67.5%)
ASR-Audio-Data-LinksA list of publically available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (+347.5%)
ventib📈 Ventib records your voice, transcribes it in realtime, and performs speech pattern analysis to give you objective statistics about how you speak.
Stars: ✭ 43 (+7.5%)
melganMelGAN implementation with Multi-Band and Full Band supports...
Stars: ✭ 54 (+35%)
pytorch-pcenPyTorch reimplementation of per-channel energy normalization for audio.
Stars: ✭ 80 (+100%)
wav2vec2-liveA live speech recognition using Facebooks wav2vec 2.0 model.
Stars: ✭ 205 (+412.5%)
AESRC2020a deep accent recognition network
Stars: ✭ 35 (-12.5%)