fade - A Simulation Framework for Auditory Discrimination Experiments
nabaztag-php - A simple PHP implementation of a Nabaztag server
HTK - The Hidden Markov Model Toolkit (HTK) from University of Cambridge, with fixed issues.
KAREN - KAREN: Unifying Hatespeech Detection and Benchmarking
opensnips - Open source projects related to Snips https://snips.ai/.
speech to text - How to use the Google Cloud Speech API to transcribe audio/video files.
nlp-class - A Natural Language Processing course taught by Professor Ghassemi
TASNET - Time-domain Audio Separation Network (in PyTorch)
Voice2Mesh - CVPR 2022: Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?
UniSpeech - UniSpeech - Large Scale Self-Supervised Learning for Speech
web-speech-demo - Learn how to build a simple text-to-speech voice app for the web using the Web Speech API.
linear16 - Converts an audio file to a LINEAR16 file compatible with Google Speech.
speech-transformer - Transformer implementation specialized in speech recognition tasks using PyTorch.
DeepSegmentor - Sequence Segmentation using Joint RNN and Structured Prediction Models (ICASSP 2017)
VAD-LTSD - Efficient voice activity detection algorithm using long-term speech information
datasets - 🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
JD-NMF - Joint Dictionary Learning-based Non-Negative Matrix Factorization for Voice Conversion (TBME 2016)
D-TDNN - PyTorch implementation of Densely Connected Time Delay Neural Network
kaldi helpers - 🙊 A set of scripts to use in preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library.
melgan - MelGAN implementation with Multi-Band and Full Band support...
data-at-hand-mobile - Mobile application for exploring fitness data using both speech and touch interaction.
AdaSpeech - AdaSpeech: Adaptive Text to Speech for Custom Voice
CVC - CVC: Contrastive Learning for Non-parallel Voice Conversion (INTERSPEECH 2021, in PyTorch)
Phomeme - Simple sentence mixing tool (work in progress)
kaldi ag training - Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-grammar.
Zero-Shot-TTS - Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
StyleSpeech - Official implementation of Meta-StyleSpeech and StyleSpeech
audio noise clustering - https://dodiku.github.io/audio_noise_clustering/results/ ==> An experiment with a variety of clustering (and clustering-like) techniques to reduce noise on an audio speech recording.
Shifter - Pitch shifter using WSOLA and resampling, implemented in Python 3
TFGAN - TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis
KeenASR-Android-PoC - A proof-of-concept app using KeenASR SDK on Android. WE ARE HIRING: https://keenresearch.com/careers.html
room-impulse-responses - A list of publicly available room impulse response datasets and scripts to download them.
opensource-voice-tools - A repo listing known open source voice tools, ordered by where they sit in the voice stack
lidbox - End-to-end spoken language identification out of the box.
FAST-RIR - This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.
SignDetect - This application is developed to help speechless people interact with others with ease. It detects voice and converts the input speech into a sign-language-based video.
NBSS - The official repo of "Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training", "Multichannel Speech Separation with Narrow-band Conformer" and "NBC2: Multichannel Speech Separation with Revised Narrow-band Conformer".
cape - Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch
ASR-Audio-Data-Links - A list of publicly available audio data that anyone can download for ASR or other speech activities
ventib - 📈 Ventib records your voice, transcribes it in realtime, and performs speech pattern analysis to give you objective statistics about how you speak.
pytorch-pcen - PyTorch reimplementation of per-channel energy normalization for audio.
wav2vec2-live - Live speech recognition using Facebook's wav2vec 2.0 model.
txt2speech - Convert text to speech using Google Translate API
anycontrol - Voice control for your websites and applications
TF-Speech-Recognition-Challenge-Solution - Source code of the model used in the TensorFlow Speech Recognition Challenge (https://www.kaggle.com/c/tensorflow-speech-recognition-challenge). The solution ranked in the top 5% on the private leaderboard.
Multimodal-Gesture-Recognition-with-LSTMs-and-CTC - An end-to-end system that performs temporal recognition of gesture sequences using speech and skeletal input. The model combines three networks with a CTC output layer that recognises gestures from a continuous stream.
IMS-Toucan - Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
VQMIVC - Official implementation of VQMIVC: One-shot (any-to-any) Voice Conversion @ Interspeech 2021 + Online playing demo!