kaldi ag trainingDocker image and scripts for training fine-tuned or completely personal Kaldi speech models, particularly for use with kaldi-active-grammar.
Stars: ✭ 14 (-76.67%)
VoluteRaspberry Pi + Node.js = Speech Robot
Stars: ✭ 224 (+273.33%)
AdaSpeechAdaSpeech: Adaptive Text to Speech for Custom Voice
Stars: ✭ 108 (+80%)
NBSSThe official repo of "Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training", "Multichannel Speech Separation with Narrow-band Conformer" and "NBC2: Multichannel Speech Separation with Revised Narrow-band Conformer".
Stars: ✭ 77 (+28.33%)
TimitThe DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Stars: ✭ 202 (+236.67%)
LingvoLingvo
Stars: ✭ 2,361 (+3835%)
AESRC2020a deep accent recognition network
Stars: ✭ 35 (-41.67%)
VQMIVCOfficial implementation of VQMIVC: One-shot (any-to-any) Voice Conversion @ Interspeech 2021 + Online playing demo!
Stars: ✭ 278 (+363.33%)
DurianImplementation of "Duration Informed Attention Network for Multimodal Synthesis" (https://arxiv.org/pdf/1909.01700.pdf) paper.
Stars: ✭ 111 (+85%)
Vq Vae SpeechPyTorch implementation of VQ-VAE + WaveNet by [Chorowski et al., 2019] and VQ-VAE on speech signals by [van den Oord et al., 2017]
Stars: ✭ 187 (+211.67%)
ASR-Audio-Data-LinksA list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (+198.33%)
audio noise clusteringhttps://dodiku.github.io/audio_noise_clustering/results/ ==> An experiment with a variety of clustering (and clustering-like) techniques to reduce noise on an audio speech recording.
Stars: ✭ 24 (-60%)
Voice-MLMobileNet trained with VoxCeleb dataset and used for voice verification
Stars: ✭ 15 (-75%)
Tts Papers🐸 collection of TTS papers
Stars: ✭ 160 (+166.67%)
wav2vec2-liveLive speech recognition using Facebook's wav2vec 2.0 model.
Stars: ✭ 205 (+241.67%)
ShifterPitch shifter using WSOLA and resampling, implemented in Python 3
Stars: ✭ 22 (-63.33%)
WavegradA fast, high-quality neural vocoder.
Stars: ✭ 138 (+130%)
AllosaurusAllosaurus is a pretrained universal phone recognizer for more than 2000 languages
Stars: ✭ 135 (+125%)
speaker extractiontarget speaker extraction and verification for multi-talker speech
Stars: ✭ 85 (+41.67%)
Avpian open source voice command macro software
Stars: ✭ 130 (+116.67%)
TF-Speech-Recognition-Challenge-SolutionSource code of the model used in the TensorFlow Speech Recognition Challenge (https://www.kaggle.com/c/tensorflow-speech-recognition-challenge). The solution ranked in the top 5% on the private leaderboard.
Stars: ✭ 58 (-3.33%)
Asr audio data linksA list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 128 (+113.33%)
TFGANTFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis
Stars: ✭ 65 (+8.33%)
AudiomatePython library for handling audio datasets.
Stars: ✭ 99 (+65%)
RE-VERBspeaker diarization system using an LSTM
Stars: ✭ 22 (-63.33%)
WikipronMassively multilingual pronunciation mining
Stars: ✭ 99 (+65%)
IMS-ToucanText-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
Stars: ✭ 295 (+391.67%)
Code Switching PapersA curated list of research papers and resources on code-switching
Stars: ✭ 122 (+103.33%)
meta-embeddingsMeta-embeddings are a probabilistic generalization of embeddings in machine learning.
Stars: ✭ 22 (-63.33%)
UHV-OTS-SpeechA data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.
Stars: ✭ 94 (+56.67%)
HolobotHoloBot is a reusable 3D interface that allows HoloLens & VR users to interact with any bot using Mixed Reality & Speech.
Stars: ✭ 114 (+90%)
room-impulse-responsesA list of publicly available room impulse response datasets and scripts to download them.
Stars: ✭ 143 (+138.33%)
idear🎙️ Handsfree Audio Development Interface
Stars: ✭ 84 (+40%)
temporal-depth-segmentationSource code (train/test) accompanying the paper entitled "Veritatem Dies Aperit - Temporally Consistent Depth Prediction Enabled by a Multi-Task Geometric and Semantic Scene Understanding Approach" in CVPR 2019 (https://arxiv.org/abs/1903.10764).
Stars: ✭ 20 (-66.67%)
GttsPython library and CLI tool to interface with Google Translate's text-to-speech API
Stars: ✭ 1,303 (+2071.67%)
browser-apis🦄 Cool & Fun Browser Web APIs 🥳
Stars: ✭ 21 (-65%)
AudioData manipulation and transformation for audio signal processing, powered by PyTorch
Stars: ✭ 1,262 (+2003.33%)
opensource-voice-toolsA repo listing known open source voice tools, ordered by where they sit in the voice stack
Stars: ✭ 21 (-65%)
lectures-allCentral repository for all lectures on deep learning at UPC ETSETB TelecomBCN.
Stars: ✭ 46 (-23.33%)
JuliusOpen-Source Large Vocabulary Continuous Speech Recognition Engine
Stars: ✭ 1,258 (+1996.67%)
TtsTools to convert text to speech 📚💬
Stars: ✭ 84 (+40%)
AutoSpeech[InterSpeech 2020] "AutoSpeech: Neural Architecture Search for Speaker Recognition" by Shaojin Ding*, Tianlong Chen*, Xinyu Gong, Weiwei Zha, Zhangyang Wang
Stars: ✭ 195 (+225%)
DeepspeechA PaddlePaddle implementation of ASR.
Stars: ✭ 1,219 (+1931.67%)
OpenasrA PyTorch-based end-to-end speech recognition system.
Stars: ✭ 69 (+15%)
lidboxEnd-to-end spoken language identification out of the box.
Stars: ✭ 39 (-35%)
Voice GenderGender recognition by voice and speech analysis
Stars: ✭ 248 (+313.33%)
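Nlp PaperCollected papers on dialogue and speech within natural language processing (with reading notes), plus model reproductions, data processing, and more (code provided in both TensorFlow and PyTorch versions)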
Stars: ✭ 67 (+11.67%)