VAD-LTSD: Efficient voice activity detection algorithm using long-term speech information
Stars: ✭ 37 (-79.67%)
IMS-Toucan: Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability, and multilinguality.
Stars: ✭ 295 (+62.09%)
Speechbrain.github.io: The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain, users can easily create speech processing systems for speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many other tasks.
Stars: ✭ 242 (+32.97%)
Openstt (RETIRED): OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:
Stars: ✭ 146 (-19.78%)
Shifter: Pitch shifter using WSOLA and resampling, implemented in Python 3
Stars: ✭ 22 (-87.91%)
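The resampling half of such a pitch shifter is simple to sketch: WSOLA stretches the signal in time without changing pitch, and resampling then changes both duration and pitch, so the combination shifts pitch at constant duration. Below is a minimal pure-Python linear-interpolation resampler; the function name and interpolation scheme are illustrative, not Shifter's actual code:

```python
def resample(signal, ratio):
    """Resample a sequence by linear interpolation.

    ratio > 1.0 shortens the signal (raising pitch when played back at
    the original rate); ratio < 1.0 lengthens it.
    """
    n_out = int(len(signal) / ratio)
    out = []
    for i in range(n_out):
        pos = i * ratio          # fractional read position in the input
        j = int(pos)
        frac = pos - j
        if j + 1 < len(signal):  # interpolate between neighbouring samples
            out.append(signal[j] * (1.0 - frac) + signal[j + 1] * frac)
        else:                    # clamp at the final sample
            out.append(signal[j])
    return out

# One octave up: WSOLA first stretches the audio to twice its duration,
# then resampling with ratio 2.0 restores the original length.
shifted = resample([0.0, 1.0, 2.0, 3.0], 2.0)
```

A production shifter would use a windowed-sinc or polyphase resampler instead of linear interpolation, but the structure is the same.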
Mlkit: A collection of sample apps demonstrating how to use Google's ML Kit APIs on Android and iOS
Stars: ✭ 949 (+421.43%)
Pocketsphinx Python: Python interface to the CMU Sphinxbase and Pocketsphinx libraries
Stars: ✭ 298 (+63.74%)
web-speech-demo: Learn how to build a simple text-to-speech voice app for the web using the Web Speech API.
Stars: ✭ 19 (-89.56%)
spokestack-android: Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (-71.43%)
hifigan-denoiser: HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Stars: ✭ 88 (-51.65%)
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-Speech with Python)
Stars: ✭ 158 (-13.19%)
opensource-voice-tools: A repo listing known open source voice tools, ordered by where they sit in the voice stack
Stars: ✭ 21 (-88.46%)
Voice Gender: Gender recognition by voice and speech analysis
Stars: ✭ 248 (+36.26%)
SignDetect: This application was developed to help speech-impaired people interact with others with ease. It detects voice and converts the input speech into a sign-language video.
Stars: ✭ 21 (-88.46%)
Gcc Nmf: Real-time GCC-NMF blind speech separation and enhancement
Stars: ✭ 231 (+26.92%)
Vq Vae Speech: PyTorch implementation of VQ-VAE + WaveNet by [Chorowski et al., 2019] and VQ-VAE on speech signals by [van den Oord et al., 2017]
Stars: ✭ 187 (+2.75%)
Phomeme: Simple sentence-mixing tool (work in progress)
Stars: ✭ 18 (-90.11%)
LIUM: Scripts for the LIUM SpkDiarization tools
Stars: ✭ 28 (-84.62%)
UniSpeech: Large-scale self-supervised learning for speech
Stars: ✭ 224 (+23.08%)
Speech Emotion Analyzer: A neural network model capable of detecting five different male/female emotions from speech audio. (Deep Learning, NLP, Python)
Stars: ✭ 633 (+247.8%)
Annyang: 💬 Speech recognition for your site
Stars: ✭ 6,216 (+3315.38%)
spafe: 🔉 Simplified Python audio features extraction
Stars: ✭ 310 (+70.33%)
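As an illustration of the kind of frame-level feature such a toolkit computes (a hypothetical minimal example, not spafe's API), the zero-crossing rate of a frame takes only a few lines of Python:

```python
def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose sign flips.

    A cheap voiced/unvoiced cue: noisy or unvoiced frames cross zero
    far more often than voiced, quasi-periodic ones.
    """
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

# An alternating frame crosses zero at every step; a monotone one never does.
zcr_high = zero_crossing_rate([1.0, -1.0, 1.0, -1.0, 1.0])
zcr_low = zero_crossing_rate([1.0, 2.0, 3.0])
```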
Java Speech Api: The J.A.R.V.I.S. Speech API is designed to be simple and efficient, using speech engines created by Google. It is an API written in Java, including a recognizer, a synthesizer, and a microphone capture utility. The project uses Google services for both the synthesizer and the recognizer; while this requires an Internet connection, it provides a complete, modern, and fully functional speech API in Java.
Stars: ✭ 490 (+169.23%)
anycontrol: Voice control for your websites and applications
Stars: ✭ 53 (-70.88%)
Pysptk: A Python wrapper for the Speech Signal Processing Toolkit (SPTK).
Stars: ✭ 297 (+63.19%)
Voicer: AGI-server voice recognizer for #Asterisk
Stars: ✭ 73 (-59.89%)
Avpi: Open source voice-command macro software
Stars: ✭ 130 (-28.57%)
Ot Br Posix: OpenThread Border Router, a Thread border router for POSIX-based platforms.
Stars: ✭ 161 (-11.54%)
Text Detector: Tool which allows you to detect and translate text.
Stars: ✭ 173 (-4.95%)
Tts Papers: 🐸 A collection of TTS papers
Stars: ✭ 160 (-12.09%)
Youtubeshop: YouTube auto-like and auto-subscribe script
Stars: ✭ 177 (-2.75%)
Magicalexoplayer: The easiest way to play/stream video and audio in your Android application using Google ExoPlayer
Stars: ✭ 171 (-6.04%)
Jaicf Kotlin: Kotlin framework for developing conversational voice assistants and chatbots
Stars: ✭ 160 (-12.09%)
Covid19 mobility: COVID-19 mobility data aggregator; a scraper of the Google, Apple, Waze, and TomTom COVID-19 mobility reports 🚶🚘🚉
Stars: ✭ 156 (-14.29%)
Naomi: The Naomi Project is an open source, technology-agnostic platform for developing always-on, voice-controlled applications!
Stars: ✭ 171 (-6.04%)
Vocgan: VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
Stars: ✭ 158 (-13.19%)
Yapf: A formatter for Python files
Stars: ✭ 12,203 (+6604.95%)
Chatbot Watson Android: An Android chatbot powered by Watson Services (Assistant, Speech-to-Text, and Text-to-Speech) on IBM Cloud.
Stars: ✭ 169 (-7.14%)
Frvsr: Frame-Recurrent Video Super-Resolution (official repository)
Stars: ✭ 157 (-13.74%)
Speech signal processing and classification: Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, i.e., developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the human speech production system suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope; similarly, the perceptual linear prediction coefficients (PLPs) can be derived. These, so to speak, traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian mixture model classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as Kaldi, will be used toward achieving our goal. Comparisons will be made against [6-8].
Stars: ✭ 155 (-14.84%)
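The front-end step the description opens with, cutting an utterance into overlapping short-term frames before LPC/MFCC extraction, can be sketched in plain Python. The pre-emphasis coefficient and the frame/hop sizes below are common illustrative defaults (e.g. 25 ms frames with a 10 ms hop, in samples), not values taken from the project:

```python
def preemphasis(signal, alpha=0.97):
    # First-order high-pass filter conventionally applied before
    # LPC/MFCC analysis to boost the higher formants.
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_signal(signal, frame_len, hop):
    # Split a waveform into overlapping short-term frames;
    # the trailing partial frame is dropped for simplicity.
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

# 100 samples, 25-sample frames, 10-sample hop -> 8 overlapping frames.
frames = frame_signal(preemphasis([0.1 * n for n in range(100)]), 25, 10)
```

Each frame would then be windowed (e.g. with a Hamming window) and fed to the LPC or MFCC pipeline the description outlines.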
Pytorch Kaldi: pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit.
Stars: ✭ 2,097 (+1052.2%)
Google Translate: 🈯 A Node.js library to consume the Google Translate API for free.
Stars: ✭ 152 (-16.48%)
Aeneas: aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (a.k.a. forced alignment)
Stars: ✭ 1,942 (+967.03%)
Santa Tracker Android: The Google Santa Tracker app for Android is an educational and entertaining tradition that brings joy to millions of children (and children at heart) across the world over the December holiday period.
Stars: ✭ 2,062 (+1032.97%)
Plexus: Remove the fear of Android app compatibility on de-Googled devices.
Stars: ✭ 152 (-16.48%)
Tutorial separation: This repo summarizes the tutorials, datasets, papers, code, and tools for the speech separation and speaker extraction tasks. Pull requests are welcome.
Stars: ✭ 151 (-17.03%)
Googleauthr: Google API Client Library for R. Easy authentication and help building Google API R libraries with OAuth2. Shiny-compatible.
Stars: ✭ 150 (-17.58%)