dropclass speakerDropClass and DropAdapt - repository for the paper accepted to Speaker Odyssey 2020
Stars: ✭ 20 (-66.67%)
GE2E-LossPytorch implementation of Generalized End-to-End Loss for speaker verification
Stars: ✭ 72 (+20%)
StreamingSpeakerDiarizationOfficial open source implementation of the paper "Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation"
Stars: ✭ 79 (+31.67%)
LIUMScripts for LIUM SpkDiarization tools
Stars: ✭ 28 (-53.33%)
Speaker-IdentificationA program for automatic speaker identification using deep learning techniques.
Stars: ✭ 84 (+40%)
MiniVoxCode for our ACML and INTERSPEECH papers: "Speaker Diarization as a Fully Online Bandit Learning Problem in MiniVox".
Stars: ✭ 15 (-75%)
Speaker-RecognitionThis repo contains my attempt to create a Speaker Recognition and Verification system using SideKit-1.3.1
Stars: ✭ 94 (+56.67%)
bobBob is a free signal-processing and machine learning toolbox originally developed by the Biometrics group at Idiap Research Institute, in Switzerland. - Mirrored from https://gitlab.idiap.ch/bob/bob
Stars: ✭ 38 (-36.67%)
kaldi-timit-sre-ivectorDevelop speaker recognition model based on i-vector using TIMIT database
Stars: ✭ 17 (-71.67%)
UniSpeechUniSpeech - Large Scale Self-Supervised Learning for Speech
Stars: ✭ 224 (+273.33%)
wavenet-classifierKeras Implementation of Deepmind's WaveNet for Supervised Learning Tasks
Stars: ✭ 54 (-10%)
DeltaDELTA is a deep learning based natural language and speech processing platform.
Stars: ✭ 1,479 (+2365%)
meta-SRPytorch implementation of Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs (Interspeech, 2020)
Stars: ✭ 58 (-3.33%)
minutes🔭 Speaker diarization via transfer learning
Stars: ✭ 25 (-58.33%)
Kaldikaldi-asr/kaldi is the official location of the Kaldi project.
Stars: ✭ 11,151 (+18485%)
spatio-temporal-brainA Deep Graph Neural Network Architecture for Modelling Spatio-temporal Dynamics in rs-fMRI Data
Stars: ✭ 22 (-63.33%)
Zero-Shot-TTSUnofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Stars: ✭ 33 (-45%)
capeContinuous Augmented Positional Embeddings (CAPE) implementation for PyTorch
Stars: ✭ 29 (-51.67%)
CVCCVC: Contrastive Learning for Non-parallel Voice Conversion (INTERSPEECH 2021, in PyTorch)
Stars: ✭ 45 (-25%)
StyleSpeechOfficial implementation of Meta-StyleSpeech and StyleSpeech
Stars: ✭ 161 (+168.33%)
SignDetectThis application is developed to help speechless people interact with others with ease. It detects voice and converts the input speech into a sign language based video.
Stars: ✭ 21 (-65%)
kaldi ag trainingDocker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-grammar.
Stars: ✭ 14 (-76.67%)
ventib📈 Ventib records your voice, transcribes it in realtime, and performs speech pattern analysis to give you objective statistics about how you speak.
Stars: ✭ 43 (-28.33%)
AdaSpeechAdaSpeech: Adaptive Text to Speech for Custom Voice
Stars: ✭ 108 (+80%)
NBSSThe official repo of "Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training", "Multichannel Speech Separation with Narrow-band Conformer" and "NBC2: Multichannel Speech Separation with Revised Narrow-band Conformer".
Stars: ✭ 77 (+28.33%)
AESRC2020a deep accent recognition network
Stars: ✭ 35 (-41.67%)
Voice-MLMobileNet trained with VoxCeleb dataset and used for voice verification
Stars: ✭ 15 (-75%)
pytorch-pcenPyTorch reimplementation of per-channel energy normalization for audio.
Stars: ✭ 80 (+33.33%)
ASR-Audio-Data-LinksA list of publically available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (+198.33%)
audio noise clusteringhttps://dodiku.github.io/audio_noise_clustering/results/ ==> An experiment with a variety of clustering (and clustering-like) techniques to reduce noise on an audio speech recording.
Stars: ✭ 24 (-60%)
ShifterPitch shifter using WSOLA and resampling implemented by Python3
Stars: ✭ 22 (-63.33%)
wav2vec2-liveA live speech recognition using Facebooks wav2vec 2.0 model.
Stars: ✭ 205 (+241.67%)
txt2speechConvert text to speech using Google Translate API
Stars: ✭ 38 (-36.67%)
speaker extractiontarget speaker extraction and verification for multi-talker speech
Stars: ✭ 85 (+41.67%)
data-at-hand-mobileMobile application for exploring fitness data using both speech and touch interaction.
Stars: ✭ 50 (-16.67%)
simple-obs-sttSpeech-to-text and keyboard input captions for OBS.
Stars: ✭ 89 (+48.33%)
anycontrolVoice control for your websites and applications
Stars: ✭ 53 (-11.67%)
TF-Speech-Recognition-Challenge-SolutionSource code of the model used in Tensorflow Speech Recognition Challenge (https://www.kaggle.com/c/tensorflow-speech-recognition-challenge). The solution ranked in top 5% in private leaderboard.
Stars: ✭ 58 (-3.33%)
TFGANTFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis
Stars: ✭ 65 (+8.33%)
Multimodal-Gesture-Recognition-with-LSTMs-and-CTCAn end-to-end system that performs temporal recognition of gesture sequences using speech and skeletal input. The model combines three networks with a CTC output layer that recognises gestures from continuous stream.
Stars: ✭ 25 (-58.33%)
IMS-ToucanText-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
Stars: ✭ 295 (+391.67%)
meta-embeddingsMeta-embeddings are a probabilistic generalization of embeddings in machine learning.
Stars: ✭ 22 (-63.33%)
KeenASR-Android-PoCA proof-of-concept app using KeenASR SDK on Android. WE ARE HIRING: https://keenresearch.com/careers.html
Stars: ✭ 21 (-65%)
UHV-OTS-SpeechA data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.
Stars: ✭ 94 (+56.67%)
room-impulse-responsesA list of publicly available room impulse response datasets and scripts to download them.
Stars: ✭ 143 (+138.33%)
VQMIVCOfficial implementation of VQMIVC: One-shot (any-to-any) Voice Conversion @ Interspeech 2021 + Online playing demo!
Stars: ✭ 278 (+363.33%)
temporal-depth-segmentationSource code (train/test) accompanying the paper entitled "Veritatem Dies Aperit - Temporally Consistent Depth Prediction Enabled by a Multi-Task Geometric and Semantic Scene Understanding Approach" in CVPR 2019 (https://arxiv.org/abs/1903.10764).
Stars: ✭ 20 (-66.67%)
idear🎙️ Handsfree Audio Development Interface
Stars: ✭ 84 (+40%)
RE-VERBspeaker diarization system using an LSTM
Stars: ✭ 22 (-63.33%)