Jakobovski / Free Spoken Digit Dataset

A free audio dataset of spoken digits. Think MNIST for audio.

Free Spoken Digit Dataset (FSDD)

A simple audio/speech dataset consisting of recordings of spoken digits in WAV files sampled at 8 kHz. The recordings are trimmed so that they have near-minimal silence at the beginning and end.

FSDD is an open dataset, which means it will grow over time as data is contributed. To enable reproducibility and accurate citation, the dataset is versioned using a Zenodo DOI as well as git tags.

Current status

  • 6 speakers
  • 3,000 recordings (50 of each digit per speaker)
  • English pronunciations

Organization

Files are named in the following format: {digitLabel}_{speakerName}_{index}.wav

Example: 7_jackson_32.wav
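
The naming format above can be split back into its three fields with a few lines of Python. This is a minimal sketch, not part of the repository: the Recording type and the parse_fsdd_filename function are hypothetical names introduced for illustration.

```python
from pathlib import Path
from typing import NamedTuple


class Recording(NamedTuple):
    """Hypothetical container for the three fields in an FSDD filename."""
    digit: int
    speaker: str
    index: int


def parse_fsdd_filename(path: str) -> Recording:
    """Split '{digitLabel}_{speakerName}_{index}.wav' into its three fields."""
    stem = Path(path).stem  # e.g. '7_jackson_32'
    # Split from both ends so a speaker name containing '_' would still parse.
    digit, rest = stem.split("_", 1)
    speaker, index = rest.rsplit("_", 1)
    return Recording(int(digit), speaker, int(index))


print(parse_fsdd_filename("7_jackson_32.wav"))
# Recording(digit=7, speaker='jackson', index=32)
```

Splitting from both ends is a defensive choice; the speaker names shown in this README contain no underscores.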

Contributions

Please contribute your homemade recordings. All recordings should be mono 8kHz wav files and be trimmed to have minimal silence. Don't forget to update metadata.py with the speaker meta-data.

To add your data, follow the recording instructions in acquire_data/say_numbers_prompt.py and then run split_and_label_numbers.py to make your files.
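
Before submitting, it may help to confirm that each file meets the mono 8 kHz requirement. The sketch below uses only Python's standard wave module; check_fsdd_format is a hypothetical helper and is not one of the repository's scripts. It does not check whether silence has been trimmed.

```python
import wave
from typing import List


def check_fsdd_format(path: str) -> List[str]:
    """Return a list of problems; an empty list means the file looks compliant."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        if channels != 1:
            problems.append(f"expected mono, found {channels} channels")
        if rate != 8000:
            problems.append(f"expected 8000 Hz, found {rate} Hz")
    return problems
```

For example, check_fsdd_format("7_jackson_32.wav") should return an empty list for a compliant recording.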

Metadata

metadata.py contains metadata about the speakers' genders and accents.

Included utilities

trimmer.py: Trims silence from the beginning and end of an audio file, and splits an audio file into multiple files at periods of silence.

fsdd.py: A simple class that provides an easy-to-use API for accessing the data.

spectogramer.py: Creates spectrograms of the audio data. Spectrograms are often a useful pre-processing step.
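
To illustrate the idea behind silence trimming, here is a minimal amplitude-threshold sketch. It is not the algorithm used by trimmer.py, which this README does not describe; the trim_silence function, the threshold of 500, and the assumption of signed 16-bit integer samples are all hypothetical choices made for the example.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[int], threshold: int = 500) -> List[int]:
    """Drop leading and trailing samples whose absolute amplitude is below threshold."""
    # Indices of every sample loud enough to count as signal.
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # the whole clip is below the threshold
    # Keep everything between the first and last loud sample, quiet gaps included.
    return list(samples[loud[0]:loud[-1] + 1])


clip = [0, 3, -2, 900, -1200, 40, 800, 1, 0]
print(trim_silence(clip))  # [900, -1200, 40, 800]
```

Quiet samples between loud ones are kept, so short pauses inside a spoken digit are not cut out.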

Usage

The official test set consists of the first 10% of the recordings. Recordings numbered 0-4 (inclusive) are in the test set, and recordings numbered 5-49 are in the training set.
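
With 50 recordings of each digit per speaker, indices 0-4 are 5 of 50, or 10%. The split can be reproduced from filenames alone, as in this sketch. The split_train_test function is a hypothetical helper, not the API of fsdd.py.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def split_train_test(filenames: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Indices 0-4 go to the test set; all higher indices go to the training set."""
    train: List[str] = []
    test: List[str] = []
    for name in filenames:
        # The recording index is the last '_'-separated field of the file stem.
        index = int(Path(name).stem.rsplit("_", 1)[1])
        (test if index <= 4 else train).append(name)
    return train, test


train, test = split_train_test(["7_jackson_4.wav", "7_jackson_5.wav", "0_jackson_0.wav"])
print(train)  # ['7_jackson_5.wav']
print(test)   # ['7_jackson_4.wav', '0_jackson_0.wav']
```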

Made with FSDD

Did you use FSDD in a paper, project or app? Add it here!

External tools

License

Creative Commons Attribution-ShareAlike 4.0 International
