german-asr / megs

Licence: MIT license
A merged version of multiple open-source German speech datasets.

MEGS - Merged German Speech

This repository contains scripts to reproduce a merged version of multiple open-source German speech datasets. German has no single large speech corpus for automatic speech recognition comparable to, for example, LibriSpeech for English. This repository therefore combines multiple German speech corpora into a single one. If you want to use the data for a specific purpose, check the licenses in the list below or on the sites of the individual datasets.

Recreate

To recreate the same corpus as in this repository, execute the commands in the script recreate.sh. The script performs the following steps:

  1. Download all corpora to data/download. Only the Common Voice corpus has to be downloaded manually and placed in data/download/common_voice.

  2. Merge all corpora into a single one and create the train/dev/test subsets.

  3. Check that the created corpus matches the state recorded in the repository. This is done by comparing the hash values of the corpus against the hash values in the file data/state.json.

  4. Optionally, convert the corpus to wave files only. This ensures that every utterance is stored in a separate wave file with a sampling rate of 16 kHz.
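The hash comparison in step 3 can be illustrated with a minimal sketch. The actual layout of data/state.json and the hash algorithm used by the repository are not described here, so the flat mapping from relative file path to SHA-256 hex digest and the helper names `file_sha256` and `find_mismatches` are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(root, expected):
    """Return the relative paths that are missing or whose hash differs.

    `expected` is a hypothetical mapping {relative_path: sha256_hex};
    the real data/state.json may be structured differently.
    """
    root = Path(root)
    mismatches = []
    for rel_path, expected_hash in sorted(expected.items()):
        file_path = root / rel_path
        if not file_path.is_file() or file_sha256(file_path) != expected_hash:
            mismatches.append(rel_path)
    return mismatches
```

An empty result from `find_mismatches` would mean that every listed file exists and has the recorded hash.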

Corpus usage

The final corpus is stored in data/full. It uses the default format of the audiomate library, which is described in the audiomate documentation on the default format.

Audiomate can also be used to read the corpus:

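import audiomate

# Load the merged corpus (audiomate default format)
corpus = audiomate.Corpus.load('data/full')
# Look up a single utterance by its id ('utt-idx' is a placeholder)
utt = corpus.utterances['utt-idx']
# Join the word-level labels into one transcript string
transcript = utt.label_lists[audiomate.corpus.LL_WORD_TRANSCRIPT].join()
# Read the audio samples at a sampling rate of 16 kHz
samples = utt.read_samples(sr=16000)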

Check out https://github.com/ynop/audiomate for more information.
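Building on the snippet above, the following sketch collects the transcripts of all utterances rather than a single one. It only relies on the calls already shown (`Corpus.load`, `utterances`, `label_lists`, `join`); the helper names `collect_transcripts` and `load_transcripts` are hypothetical, and audiomate is imported lazily so that the first helper can be tried without the library installed.

```python
def collect_transcripts(utterances, label_list_idx):
    """Map utterance id -> transcript for utterances that have the label list.

    `utterances` is expected to behave like `corpus.utterances`:
    a dict of objects with a `label_lists` dict whose values offer `join()`.
    """
    transcripts = {}
    for utt_idx, utt in utterances.items():
        label_list = utt.label_lists.get(label_list_idx)
        if label_list is not None:
            transcripts[utt_idx] = label_list.join()
    return transcripts


def load_transcripts(corpus_path="data/full"):
    """Load the corpus with audiomate and return all word transcripts."""
    import audiomate  # lazy import: only needed when reading a real corpus

    corpus = audiomate.Corpus.load(corpus_path)
    return collect_transcripts(
        corpus.utterances, audiomate.corpus.LL_WORD_TRANSCRIPT
    )
```

Utterances without a word transcript are skipped rather than raising an error, which is a design choice of this sketch and not necessarily how the repository handles them.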

Corpus Statistics

| Part | Duration (h) | Speakers |
|---|---|---|
| unfiltered | 1021.31 | not known due to the absence of info in M-AILabs |
| train | 536.90 | not known due to the absence of info in M-AILabs |
| dev | 17.75 | 1151 |
| test | 18.22 | 2037 |
| full_common_voice | 324.19 | 4852 |
| train_common_voice | 10.20 | 552 |
| dev_common_voice | 7.04 | 1010 |
| test_common_voice | 7.71 | 1901 |
| full_mailabs | 233.66 | - |
| train_mailabs | 233.50 | - |
| dev_mailabs | 0.00 | 0 |
| test_mailabs | 0.00 | 0 |
| full_swc | 248.47 | 569 |
| train_swc | 238.01 | 527 |
| dev_swc | 4.26 | 26 |
| test_swc | 4.18 | 16 |
| full_tuda | 183.30 | 179 |
| train_tuda | 31.49 | 146 |
| dev_tuda | 2.41 | 16 |
| test_tuda | 2.38 | 17 |
| full_voxforge | 31.69 | 328 |
| train_voxforge | 23.70 | 126 |
| dev_voxforge | 4.04 | 99 |
| test_voxforge | 3.96 | 103 |
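As a quick sanity check on the table, the per-source durations can be summed and compared with the merged rows. The sketch below copies the hour values from the table; treating the "unfiltered" row as the sum of the full_* rows is an interpretation, and the names `HOURS`, `MERGED`, `summed_hours`, and `max_deviation` are chosen for illustration.

```python
# Hours per source and part, copied from the statistics table.
HOURS = {
    "common_voice": {"full": 324.19, "train": 10.20, "dev": 7.04, "test": 7.71},
    "mailabs": {"full": 233.66, "train": 233.50, "dev": 0.00, "test": 0.00},
    "swc": {"full": 248.47, "train": 238.01, "dev": 4.26, "test": 4.18},
    "tuda": {"full": 183.30, "train": 31.49, "dev": 2.41, "test": 2.38},
    "voxforge": {"full": 31.69, "train": 23.70, "dev": 4.04, "test": 3.96},
}

# Merged rows; "full" here is the row labeled "unfiltered" in the table.
MERGED = {"full": 1021.31, "train": 536.90, "dev": 17.75, "test": 18.22}


def summed_hours(part):
    """Sum the hours of one part (full/train/dev/test) over all sources."""
    return sum(source[part] for source in HOURS.values())


def max_deviation():
    """Largest absolute gap between a merged row and the per-source sum."""
    return max(abs(summed_hours(part) - MERGED[part]) for part in MERGED)


if __name__ == "__main__":
    for part in MERGED:
        print(f"{part}: summed={summed_hours(part):.2f} merged={MERGED[part]:.2f}")
```

Small gaps are expected, because each table entry is rounded to two decimals before being summed.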

Corpus sources

| Name | URL | License |
|---|---|---|
| Common-Voice | https://voice.mozilla.org/en/datasets | CC-0 |
| TuDa | https://www.inf.uni-hamburg.de/en/inst/ab/lt/resources/data/acoustic-models.html | CC-BY |
| M-AILabs | https://www.caito.de/2019/01/the-m-ailabs-speech-dataset/ | See Page |
| VoxForge | http://www.voxforge.org/de | GPL |
| SWC | https://nats.gitlab.io/swc/ | CC BY-SA 4.0 |

Create a new version

The script create.sh contains the commands to create a new version of the corpus.

Changelog

| Version | Changes |
|---|---|
| v1 | Initial version |
| v2 | Smaller test sets, Filter long utterances (> 25s) |
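The long-utterance filter introduced in v2 can be pictured with a small sketch. How the repository implements the filter is not shown here; the plain dict of durations, the helper name `split_by_duration`, and the choice to keep utterances of exactly 25 seconds are assumptions.

```python
MAX_DURATION_SECONDS = 25.0  # threshold named in the changelog for v2


def split_by_duration(durations, max_seconds=MAX_DURATION_SECONDS):
    """Split utterances into kept ones and dropped (too long) ids.

    `durations` is a hypothetical mapping {utterance_id: duration_in_seconds}.
    Utterances longer than `max_seconds` are dropped.
    """
    kept = {idx: sec for idx, sec in durations.items() if sec <= max_seconds}
    dropped = sorted(idx for idx in durations if idx not in kept)
    return kept, dropped
```

Returning the dropped ids alongside the kept utterances makes it easy to log what the filter removed.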