
KathyReid / opensource-voice-tools

Licence: GPL-3.0 license
A repo listing known open source voice tools, ordered by where they sit in the voice stack


Projects that are alternatives of or similar to opensource-voice-tools

spokestack-android
Extensible Android mobile voice framework: wakeword, ASR, NLU, and TTS. Easily add voice to any Android app!
Stars: ✭ 52 (+147.62%)
Mutual labels:  voice, speech, tts, speech-recognition, asr
Lingvo
Lingvo
Stars: ✭ 2,361 (+11142.86%)
Mutual labels:  speech, tts, speech-recognition, asr
simple-obs-stt
Speech-to-text and keyboard input captions for OBS.
Stars: ✭ 89 (+323.81%)
Mutual labels:  speech, tts, speech-recognition, stt
sova-asr
SOVA ASR (Automatic Speech Recognition)
Stars: ✭ 123 (+485.71%)
Mutual labels:  speech, speech-recognition, stt, asr
Edgedict
Working online speech recognition based on RNN Transducer (a trained model is available in the releases).
Stars: ✭ 205 (+876.19%)
Mutual labels:  speech, speech-recognition, asr
Voice Overlay Android
🗣 An overlay that gets your user’s voice permission and input as text in a customizable UI
Stars: ✭ 189 (+800%)
Mutual labels:  voice, conversational-ui, speech-recognition
spokestack-tray-android
A UI component that makes it easy to add voice interaction to your app.
Stars: ✭ 13 (-38.1%)
Mutual labels:  voice, tts, asr
Speech-Corpus-Collection
A Collection of Speech Corpus for ASR and TTS
Stars: ✭ 113 (+438.1%)
Mutual labels:  corpus, tts, asr
Asr audio data links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 128 (+509.52%)
Mutual labels:  speech, speech-recognition, asr
Pocketsphinx Python
Python interface to CMU Sphinxbase and Pocketsphinx libraries
Stars: ✭ 298 (+1319.05%)
Mutual labels:  voice, speech, speech-recognition
megs
A merged version of multiple open-source German speech datasets.
Stars: ✭ 21 (+0%)
Mutual labels:  corpus, speech-recognition, asr
End2end Asr Pytorch
End-to-End Automatic Speech Recognition on PyTorch
Stars: ✭ 175 (+733.33%)
Mutual labels:  speech, speech-recognition, asr
leopard
On-device speech-to-text engine powered by deep learning
Stars: ✭ 354 (+1585.71%)
Mutual labels:  speech-recognition, stt, asr
Pytorch Kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Stars: ✭ 2,097 (+9885.71%)
Mutual labels:  speech, speech-recognition, asr
Annyang
💬 Speech recognition for your site
Stars: ✭ 6,216 (+29500%)
Mutual labels:  voice, speech, speech-recognition
wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
Stars: ✭ 205 (+876.19%)
Mutual labels:  speech, speech-recognition, asr
Voice Overlay Ios
🗣 An overlay that gets your user’s voice permission and input as text in a customizable UI
Stars: ✭ 440 (+1995.24%)
Mutual labels:  voice, conversational-ui, speech-recognition
ASR-Audio-Data-Links
A list of publicly available audio data that anyone can download for ASR or other speech activities
Stars: ✭ 179 (+752.38%)
Mutual labels:  speech, speech-recognition, asr
Delta
DELTA is a deep learning based natural language and speech processing platform.
Stars: ✭ 1,479 (+6942.86%)
Mutual labels:  speech, speech-recognition, asr
Pytorch Asr
ASR with PyTorch
Stars: ✭ 124 (+490.48%)
Mutual labels:  speech, speech-recognition, asr

A listing of open source voice tools

Introduction

Voice technology is taking off in a big way. For organisations, businesses and individuals trying to make sense of voice and where it sits in their technical architectures, it can be really confusing to understand the open source offerings that are out there.

This repo is a listing of known open source voice tools, structured by where those tools sit in the voice stack.

Transcription

Wake words

Speech to text

| Website | Tool name | License | Description |
|---|---|---|---|
| openslr.org | Open Speech and Language Resources | N/A | Run by @danpovey, who is also a key maintainer of the Kaldi-ASR speech to text tool. |
| kaldi-asr.org | Kaldi Automatic Speech Recognition toolkit | Apache 2 | One of the first open source speech recognition toolkits. Academic reference: Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., ... & Silovsky, J. (2011). The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society. |

Intent parsing

Intent resolution

Text to speech

| Website | Tool name | License | Description |
|---|---|---|---|
| Flowtron by Nvidia | A Tacotron-based speech synthesis tool that can be tweaked for pitch and prosody, setting it apart from other Tacotron-based TTS implementations | Apache 2 | First released at the GTC 2020 conference in May 2020. The academic paper is available on arXiv. Citation: Valle, R., Shih, K., Prenger, R., & Catanzaro, B. (2020). Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis. arXiv preprint arXiv:2005.05957. |

This is a great article explaining the differences between the generations of text to speech: from concatenative, to statistical parametric, to generative. More modern TTS approaches, such as Tacotron and WaveNet, are generative.

Chatbots and Conversational UI tools

| Website | Tool name | License | Description |
|---|---|---|---|
| MindMeld by Cisco | MindMeld Conversational AI platform | Apache 2 | The MindMeld Conversational AI platform is among the most advanced AI platforms for building production-quality conversational applications. It is a Python-based machine learning framework which encompasses all of the algorithms and utilities required for this purpose. Evolved over several years of building and deploying dozens of the most advanced conversational experiences achievable, MindMeld is optimized for building advanced conversational assistants which demonstrate deep understanding of a particular use case or domain while providing highly useful and versatile conversational experiences. Academic reference: Raghuvanshi, A., Carroll, L., & Raghunathan, K. (2018, November). Developing Production-Level Conversational Interfaces with Shallow Semantic Parsing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 157-162). |

Voice assistant wrappers

  • Mycroft.AI - an open source, layered voice assistant that runs on a range of Linux-compatible hardware, from x86 machines to ARM devices such as the Raspberry Pi. Supported by a strong community of open source developers.

  • OVAL / Genie project at Stanford - Funded by the Alfred P. Sloan Foundation and by a NIST grant, Stanford's OVAL project aims to provide an open source alternative to commercial voice assistants. The project is currently in its infancy and is attempting to build an open source community.

Natural language processing (NLP)

  • Python Natural Language Toolkit (NLTK) - NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.

  • ECCO - ECCO is a Python library that provides explainability for NLP models through interactive visualisations.

  • DeText - DeText is a Deep Text understanding framework for NLP-related ranking, classification, and language generation tasks. It uses semantic matching with deep neural networks to understand member intents in search and recommender systems. As a general NLP framework, DeText can currently be applied to many tasks, including search and recommendation ranking, multi-class classification and query understanding. Published by the AI team at LinkedIn.

  • pglex - First presented at the ICLDC 7 conference in 2021, pglex is a 'pretty good' lexical service designed to facilitate the construction of dictionary websites and other applications that incorporate lexical data. With pglex, researchers can provide lexical entries in JSON format to an instance of the pglex API and get 'pretty good' search results without requiring language-specific configuration. Built on Elasticsearch.

Bias in voice assistants and NLP

  • Artie Bias Corpus - A corpus and set of tools for detecting demographic bias in ASR systems.

  • Blodgett, S. L., Barocas, S., Daumé III, H., & Wallach, H. (2020). Language (Technology) is Power: A Critical Survey of "Bias" in NLP. arXiv preprint arXiv:2005.14050. https://arxiv.org/pdf/2005.14050.pdf
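One common way to look for demographic bias in an ASR system is to compare its word error rate (WER) across groups of speakers. The Python below is a minimal sketch of that comparison under stated assumptions. It is not the Artie Bias Corpus tooling. The group labels and transcripts are hypothetical. WER is computed as word-level edit distance divided by the number of reference words, pooled per group.

```python
from collections import defaultdict


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # reference word deleted
                curr[j - 1] + 1,         # extra word inserted in the hypothesis
                prev[j - 1] + (r != h),  # substitution (costs 0 on a match)
            )
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """samples: iterable of (group, reference, hypothesis) strings.

    Returns {group: total word errors / total reference words}.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, ref, hyp in samples:
        ref_words, hyp_words = ref.lower().split(), hyp.lower().split()
        errors[group] += word_edit_distance(ref_words, hyp_words)
        words[group] += len(ref_words)
    return {g: errors[g] / words[g] for g in errors}


if __name__ == "__main__":
    # Hypothetical transcripts; a real evaluation uses a labelled corpus.
    samples = [
        ("group_a", "turn on the kitchen lights", "turn on the kitchen lights"),
        ("group_a", "what is the weather today", "what is the weather today"),
        ("group_b", "turn on the kitchen lights", "turn on the chicken lights"),
        ("group_b", "what is the weather today", "what is weather to day"),
    ]
    for group, wer in sorted(wer_by_group(samples).items()):
        print(f"{group}: WER = {wer:.2f}")
```

Bias tools look for this kind of gap between groups on comparable utterances. A real comparison would also normalise casing and punctuation consistently and use far more data than this toy example.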

Speaker recognition

Forced aligners

Forced aligners align audio recordings with their orthographic transcriptions, working out when each part of the text is spoken.

  • aeneas - aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment).
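aeneas first synthesizes the text with a TTS engine. It then aligns the synthesized audio with the recording using dynamic time warping (DTW) over MFCC features. The Python below is a hedged sketch of the DTW step only. It is not aeneas code, and it uses made-up one-dimensional feature values in place of real MFCC frames.

```python
def dtw_path(a, b):
    """Align sequences a and b with dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    from (0, 0) to (len(a) - 1, len(b) - 1). Local cost is |a[i] - b[j]|.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # both sequences advance
                cost[i - 1][j],      # a advances while b repeats a frame
                cost[i][j - 1],      # b advances while a repeats a frame
            )
    # Backtrack from the bottom-right cell to recover the warping path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)  # ties prefer the diagonal step
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    synthesized = [0.0, 1.0, 2.0, 3.0]         # hypothetical TTS feature frames
    recorded = [0.0, 0.0, 1.0, 2.0, 2.0, 3.0]  # hypothetical recording frames
    total, path = dtw_path(synthesized, recorded)
    print(total, path)
```

The returned path maps each synthesized frame to one or more recorded frames. Here synthesized frame 2 lines up with recorded frames 3 and 4. Boundaries known in the synthesized audio can therefore be carried over to timestamps in the recording. The full cost matrix needs O(n*m) memory, so practical aligners usually restrict the search to a band around the diagonal.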

Voice and language corpora

  • Berlin Database of Emotional Speech - A corpus of German (Deutsch) speech tagged with emotions.
  • The Pile - The Pile is an 825 GiB diverse, open source language modelling dataset that combines 22 smaller, high-quality datasets.

Data cleaning and repair tools

  • ActiveClean - ActiveClean is an iterative cleaning framework that can correctly retrain a machine learning model as data is cleaned, and provides a set of optimizations to select the best data to clean next. As a result, you only need to clean a small subset of the data to produce a model similar to one trained on the fully cleaned dataset. Written in Python.

  • DataLinter - The Data Linter identifies potential issues (lints) in your ML training data.

  • HoloClean - A machine learning system for data enrichment.

There's also BoostClean from Columbia University, but I can't find a code reference anywhere on the web.

Machine translation

  • No Language Left Behind - Released by Meta, the NLLB project aims to make low-resource languages more accessible by providing a machine translation model that can translate between 200 languages. The model is evaluated on FLORES-200, a human-translated benchmark, and performs 44% better than the previous state of the art as measured by BLEU.
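BLEU, the metric behind the NLLB comparison, combines two parts. The first is the clipped n-gram precision of a candidate translation against a reference. The second is a brevity penalty for candidates shorter than the reference. The Python below is a minimal sentence-level sketch under stated assumptions: whitespace tokenization, a single reference and no smoothing. Published results are computed at corpus level with standard tooling, so scores from this function are not comparable to them.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU = BP * exp(mean of log p_n for n = 1..max_n).

    BP = 1 if the candidate is longer than the reference, else exp(1 - r/c).
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by how often it occurs in the reference.
        overlap = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precisions.append(math.log(overlap / total))
    c, r = len(cand), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)  # brevity penalty
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    print(sentence_bleu("the cat sat on the mat", "the cat sat on the red mat"))
```

For example, scoring "the cat sat on the mat" against "the cat sat on the red mat" gives unigram to 4-gram precisions of 1, 4/5, 3/4 and 2/3. The brevity penalty is exp(1 - 7/6), for a score of roughly 0.67.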

Paper listings

Glossary

There are a lot of terms and acronyms in open source voice technology. This section provides explanations for each of them.

  • Cognitive arbitration: The process a voice assistant uses to understand what services and skills are available to it, depending on its context - such as being online or offline.

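  • CRF: Conditional random field. A statistical modelling method that takes context into account: when labelling each item in a sequence (such as each word in an utterance), it considers the labels of neighbouring items. Used in some neural-network based intent-parsing and semantic extraction software.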

  • LSTM: Long short-term memory. A type of unit used within recurrent neural networks to process sequences of data, such as audio or speech. To predict what is likely to come next, an LSTM keeps a memory of what came previously.

  • LVCSR: Large vocabulary continuous speech recognition. Used in speech recognition tools to denote two things. First, the recognizer's vocabulary has not been restricted or constrained (vocabularies are often restricted when a recognizer is deployed on embedded or low-powered hardware that cannot handle the memory or compute requirements of a large vocabulary). Second, the recognizer works continuously, in contrast to a wake word or keyword spotter, which cedes control to the STT engine once a wake word is detected.
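In a linear-chain CRF, the context mentioned in the CRF entry comes from transition scores between neighbouring labels. The best label sequence is found with the Viterbi algorithm. The Python below is a minimal decoding sketch. It assumes the emission and transition scores have already been produced by a trained model. The slot labels, tokens and numbers are hypothetical.

```python
def viterbi(emissions, transitions, labels):
    """Best label sequence for a linear-chain CRF.

    emissions[t][k]: score of label k at token t (e.g. from a neural encoder).
    transitions[j][k]: score of moving from label j to label k.
    Returns (best_score, best_labels).
    """
    num_labels = len(labels)
    # score[k]: best score of any label path ending in label k at the current token
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in range(num_labels):
            best_prev = max(range(num_labels),
                            key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_prev] + transitions[best_prev][k]
                             + emissions[t][k])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    best_last = max(range(num_labels), key=lambda k: score[k])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[best_last], [labels[k] for k in path]


if __name__ == "__main__":
    labels = ["O", "B-ROOM", "I-ROOM"]
    # Hypothetical scores for the tokens "in", "living", "room".
    emissions = [
        [2.0, 0.0, 0.0],
        [0.5, 1.0, 1.2],  # on its own, "living" slightly prefers I-ROOM
        [0.0, 0.2, 1.5],
    ]
    # Rows: previous label; columns: next label. O -> I-ROOM is heavily penalised.
    transitions = [
        [0.0, 0.0, -10.0],
        [0.0, -1.0, 1.0],
        [0.0, -1.0, 0.5],
    ]
    print(viterbi(emissions, transitions, labels))
```

Picking the best label for each token independently would tag "living" as I-ROOM, which cannot follow O. The transition scores steer the decoder to O, B-ROOM, I-ROOM instead, with a total score of 5.5.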
