Top 184 speech open source projects

fade
A Simulation Framework for Auditory Discrimination Experiments
MelNet-SpeechGeneration
Implementation of MelNet in PyTorch to generate high-fidelity audio samples
nabaztag-php
A simple PHP implementation of a Nabaztag server
HTK
The Hidden Markov Model Toolkit (HTK) from University of Cambridge, with fixed issues.
TASNET
Time-domain Audio Separation Network (in PyTorch)
web-speech-demo
Learn how to build a simple text-to-speech voice app for the web using the Web Speech API.
gtranscribe
Software for interview transcription
linear16
Converts an audio file to LINEAR16 Google-speech compatible file.
speech-transformer
Transformer implementation specialized in speech recognition tasks using PyTorch.
VAD-LTSD
Efficient voice activity detection algorithm using long-term speech information
datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
JD-NMF
Joint Dictionary Learning-based Non-Negative Matrix Factorization for Voice Conversion (TBME 2016)
kaldi helpers
🙊 A set of scripts to use in preparing a corpus for speech-to-text processing with the Kaldi Automatic Speech Recognition Library.
voice-based-email-for-blind
Emailing System for visually impaired persons
CVC
CVC: Contrastive Learning for Non-parallel Voice Conversion (INTERSPEECH 2021, in PyTorch)
MajorDomo-Scenarios
Scenarios for the MajorDomo home automation system
aframe-speech-controls-component
An alternative form of input for in-VR interaction with the content of a scene
kaldi ag training
Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-grammar.
Zero-Shot-TTS
Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
audio noise clustering
An experiment with a variety of clustering (and clustering-like) techniques to reduce noise in an audio speech recording. Results: https://dodiku.github.io/audio_noise_clustering/results/
Shifter
Pitch shifter using WSOLA and resampling, implemented in Python 3
TFGAN
TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis
KeenASR-Android-PoC
A proof-of-concept app using KeenASR SDK on Android. WE ARE HIRING: https://keenresearch.com/careers.html
room-impulse-responses
A list of publicly available room impulse response datasets and scripts to download them.
opensource-voice-tools
A repo listing known open source voice tools, ordered by where they sit in the voice stack
FAST-RIR
This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.
SignDetect
This application is designed to help speechless people interact with others with ease. It detects voice input and converts the speech into a sign-language video.
NBSS
The official repo of "Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training", "Multichannel Speech Separation with Narrow-band Conformer" and "NBC2: Multichannel Speech Separation with Revised Narrow-band Conformer".
ASR-Audio-Data-Links
A list of publicly available audio data that anyone can download for ASR or other speech activities
ventib
📈 Ventib records your voice, transcribes it in realtime, and performs speech pattern analysis to give you objective statistics about how you speak.
pytorch-pcen
PyTorch reimplementation of per-channel energy normalization for audio.
wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
txt2speech
Convert text to speech using Google Translate API
TF-Speech-Recognition-Challenge-Solution
Source code of the model used in the TensorFlow Speech Recognition Challenge (https://www.kaggle.com/c/tensorflow-speech-recognition-challenge). The solution ranked in the top 5% on the private leaderboard.
Multimodal-Gesture-Recognition-with-LSTMs-and-CTC
An end-to-end system that performs temporal recognition of gesture sequences using speech and skeletal input. The model combines three networks with a CTC output layer that recognises gestures from a continuous stream.
IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
VQMIVC
Official implementation of VQMIVC: One-shot (any-to-any) Voice Conversion @ Interspeech 2021 + Online playing demo!