Implementation of Google Brain's WaveGrad high-fidelity vocoder (paper: https://arxiv.org/pdf/2009.00713.pdf). First implementation on GitHub.
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain, users can easily create speech processing systems for speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many other tasks.
A Keras CTC implementation of Baidu's DeepSpeech for model experimentation
Real-time GCC-NMF Blind Speech Separation and Enhancement
Tools for Speech Enhancement integrated with Kaldi
Raspberry Pi + Node.js = Speech Robot
End-to-end speech synthesis with recurrent neural networks
Working online speech recognition based on RNN Transducer. (A trained model is available under Releases.)
The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Speech synthesis for ESP8266 using S.A.M. port
Predicting depression from acoustic features of speech using a Convolutional Neural Network.
VQ-VAE Speech
PyTorch implementation of VQ-VAE + WaveNet by [Chorowski et al., 2019] and VQ-VAE on speech signals by [van den Oord et al., 2017]
Chatbot Watson Android
An Android ChatBot powered by Watson Services - Assistant, Speech-to-Text and Text-to-Speech on IBM Cloud.
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit.
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
A fast, high-quality neural vocoder.
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
Open-source voice command macro software
A physical model of the human vocal tract using literate programming, based on Pink Trombone.
ASR audio data links
A list of publicly available audio data that anyone can download for ASR or other speech activities
kaldi-asr/kaldi is the official location of the Kaldi project.
Text-to-Speech for Arduino
HoloBot is a reusable 3D interface that allows HoloLens & VR users to interact with any bot using Mixed Reality & Speech.
Implementation of "Duration Informed Attention Network for Multimodal Synthesis" (https://arxiv.org/pdf/1909.01700.pdf) paper.
DELTA is a deep learning based natural language and speech processing platform.
Python library for handling audio datasets.
Massively multilingual pronunciation mining
Python library and CLI tool to interface with Google Translate's text-to-speech API
Data manipulation and transformation for audio signal processing, powered by PyTorch
Open-Source Large Vocabulary Continuous Speech Recognition Engine
Tools to convert text to speech 📚💬
A PyTorch-based end-to-end speech recognition system.