Free Spoken Digit Dataset (FSDD)

A simple audio/speech dataset consisting of recordings of spoken digits in wav files at 8kHz. The recordings are trimmed so that they have near minimal silence at the beginnings and ends.

FSDD is an open dataset, which means it will grow over time as data is contributed. To enable reproducibility and accurate citation, the dataset is versioned using a Zenodo DOI as well as git tags.

Current status

6 speakers
3,000 recordings (50 of each digit per speaker)
English pronunciations

Organization

Files are named in the following format: {digitLabel}_{speakerName}_{index}.wav

Example: 7_jackson_32.wav
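The naming scheme above can be taken apart with a few string operations. The following is a minimal sketch, not part of the FSDD utilities; the `Recording` type and the `parse_fsdd_filename` function are hypothetical names introduced here for illustration.

```python
from pathlib import Path
from typing import NamedTuple


class Recording(NamedTuple):
    """Hypothetical container for the three fields encoded in a file name."""
    digit: int
    speaker: str
    index: int


def parse_fsdd_filename(path: str) -> Recording:
    """Split '{digitLabel}_{speakerName}_{index}.wav' into its three fields."""
    stem = Path(path).stem  # drops any directory part and the '.wav' suffix
    # Split from both ends so a speaker name containing '_' would still parse.
    digit, rest = stem.split("_", 1)
    speaker, index = rest.rsplit("_", 1)
    return Recording(int(digit), speaker, int(index))


print(parse_fsdd_filename("7_jackson_32.wav"))
# → Recording(digit=7, speaker='jackson', index=32)
```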

Contributions

Please contribute your homemade recordings. All recordings should be mono 8kHz wav files and be trimmed to have minimal silence. Don't forget to update metadata.py with the speaker meta-data.

To add your data, follow the recording instructions in acquire_data/say_numbers_prompt.py and then run split_and_label_numbers.py to make your files.
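Before submitting, it may help to confirm that a recording meets the mono 8kHz requirement. The sketch below uses only Python's standard `wave` module; `check_wav_format` is a hypothetical helper, not something shipped with FSDD, and it does not check how well the silence was trimmed.

```python
import wave
from typing import List


def check_wav_format(path: str) -> List[str]:
    """Return a list of format problems; an empty list means none were found."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        if channels != 1:
            problems.append(f"expected mono, found {channels} channels")
        if rate != 8000:
            problems.append(f"expected 8000 Hz, found {rate} Hz")
    return problems
```

Calling `check_wav_format` on a path to one of your own recordings returns an empty list when the file is mono at 8kHz, and a list of human-readable messages otherwise.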

Metadata

metadata.py contains meta-data regarding the speakers' gender and accents.

Included utilities

trimmer.py: Trims silences at beginning and end of an audio file. Splits an audio file into multiple audio files by periods of silence.

fsdd.py: A simple class that provides an easy to use API to access the data.

spectogramer.py: Used for creating spectrograms of the audio data. Spectrograms are often a useful pre-processing step.
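To illustrate what trimming silence at the beginning and end of a recording means, here is a minimal amplitude-threshold sketch on a plain list of integer samples. It is not how trimmer.py is implemented; the function name `trim_silence` and the default threshold of 500 are assumptions chosen for the example.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[int], threshold: int = 500) -> List[int]:
    """Drop leading and trailing samples whose magnitude is at or below threshold."""
    # Indices of every sample considered "loud" under the assumed threshold.
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        return []  # the whole clip is below the threshold
    # Keep everything between the first and last loud sample, quiet gaps included.
    return list(samples[loud[0]:loud[-1] + 1])


print(trim_silence([0, 3, -2, 900, -1200, 40, 700, 1, 0]))
# → [900, -1200, 40, 700]
```

Quiet samples in the middle of the clip are kept on purpose, since only the edges are trimmed.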

Usage

The test set officially consists of the first 10% of the recordings. Recordings numbered 0-4 (inclusive) are in the test set, and recordings numbered 5-49 are in the training set.
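The split rule above depends only on the index at the end of each file name, so it can be sketched in a few lines. The helper names `is_test_recording` and `split_train_test` are hypothetical and are not part of fsdd.py.

```python
from typing import Iterable, List, Tuple


def is_test_recording(filename: str) -> bool:
    """True when the recording index (last '_' field before '.wav') is 0-4."""
    stem = filename.rsplit(".", 1)[0]
    index = int(stem.rsplit("_", 1)[1])
    return index <= 4


def split_train_test(filenames: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition file names into (train, test) lists using the index rule."""
    train, test = [], []
    for name in filenames:
        (test if is_test_recording(name) else train).append(name)
    return train, test


names = [f"7_jackson_{i}.wav" for i in range(50)]
train, test = split_train_test(names)
print(len(train), len(test))
# → 45 5
```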

Made with FSDD

Did you use FSDD in a paper, project or app? Add it here!

External tools

C#/.NET. The FSDD dataset can be used in .NET applications using the FreeSpokenDigitsDataset class included within the Accord.NET Framework. A basic example on how to perform spoken digits classification using audio MFCC features can be found here.

License

Creative Commons Attribution-ShareAlike 4.0 International
