
Kyubyong / Css10

Licence: apache-2.0

CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages


CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages

Abstract

We describe our development of CSS10, a collection of single speaker speech datasets for ten languages. It is composed of short audio clips from LibriVox audiobooks and their aligned texts. To validate its quality we train two neural text-to-speech models on each dataset. Subsequently, we conduct Mean Opinion Score tests on the synthesized speech samples. We make our datasets, pretrained models, and test resources publicly available. We hope they will be used for future speech tasks.

For details, check our paper. Kyubyong gave a talk on this paper at the 2018 workshop of the Korean Society of Speech Sciences.
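The abstract mentions Mean Opinion Score (MOS) tests on the synthesized samples. As a rough illustration of how such a score is commonly summarized, the sketch below averages listener ratings and reports a normal-approximation 95% confidence half-width. The function name `mean_opinion_score` and the ratings are hypothetical; they are not taken from the paper or its test resources, and the paper's exact procedure may differ.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% confidence half-width) for a list of 1-5 ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate an interval")
    mos = statistics.mean(ratings)
    # Normal approximation: 1.96 * sample standard deviation / sqrt(n).
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for one synthesized clip; NOT data from the paper.
example_ratings = [4, 5, 3, 4, 4]
mos, half_width = mean_opinion_score(example_ratings)
print("MOS = {:.2f} +/- {:.2f}".format(mos, half_width))
```

With only a handful of raters the normal approximation is loose; a t-distribution interval would be the more careful choice.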

Environments & Dependencies

Linux
Python 2.X or 3.X
TensorFlow == 1.3
NumPy
Librosa
Matplotlib
tqdm
scipy
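
A quick way to see which of the packages listed above are importable in the current environment is sketched below. It assumes Python 3 and that each package is importable under the lowercase module name shown; the helper `missing_modules` is a hypothetical name, and the sketch does not check the TensorFlow version.

```python
import importlib.util

# Import names assumed for the dependencies listed above.
REQUIRED_MODULES = ["tensorflow", "numpy", "librosa", "matplotlib", "tqdm", "scipy"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages: " + ", ".join(missing))
else:
    print("All listed packages are importable.")
```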

Audiobooks & Datasets

Code	Language	Audiobook	Running Time	Reader	Dataset
de	German	1. Meister Floh 2. Die acht Gesichter am Biwasee 3. Auswahl aus Die Serapionsbrüder	16:42:45	Hokuspokus	CSS German
el	Greek	Παραμύθι χωρίς όνομα (Tale Without Name)	04:08:14	Rapunzelina	CSS Greek
es	Spanish	1. Bailén 2. El 19 de Marzo y el 2 de Mayo 3. La Batalla de los Arapiles	23:49:49	Tux	CSS Spanish
fi	Finnish	1. Gulliverin matkat kaukaisilla mailla 2. Ensimmäiset novellit 3. Kaleri-orja 4. Salmelan heinätalkoot	10:32:03	Harri Tapani Ylilammi	CSS Finnish
fr	French	1. Les Misérables - tome 5 2. Arsène Lupin contre Herlock Sholmès	19:09:03	Gilles G. Le Blanc	CSS French
hu	Hungarian	Egri csillagok	10:00:25	Diana Majlinger	CSS Hungarian
ja	Japanese	明暗 (Meian)	14:55:36	ekzemplaro	CSS Japanese
nl	Dutch	20.000 Mijlen onder Zee	14:06:40	Bart de Leeuw	CSS Dutch
ru	Russian	1. Ice March - Ледяной поход 2. Early Short Stories 3. Short Stories for Children and Adults	21:22:10	Mark Chulsky	CSS Russian
zh	Chinese	1. 朝花夕拾 (Chao Hua Si She) 2. 呐喊 (Call to Arms)	06:27:04	Jing Li	CSS Chinese
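
To get a sense of the overall size of the collection, the running times in the table above can be summed. The sketch below copies the HH:MM:SS values from the table; the helper names `to_seconds` and `format_hms` are illustrative only.

```python
# Running times copied from the table above, keyed by language code.
RUNNING_TIMES = {
    "de": "16:42:45",
    "el": "04:08:14",
    "es": "23:49:49",
    "fi": "10:32:03",
    "fr": "19:09:03",
    "hu": "10:00:25",
    "ja": "14:55:36",
    "nl": "14:06:40",
    "ru": "21:22:10",
    "zh": "06:27:04",
}


def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds):
    """Format a number of seconds as 'H:MM:SS' (hours may exceed 24)."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return "{}:{:02d}:{:02d}".format(hours, minutes, seconds)


total_seconds = sum(to_seconds(value) for value in RUNNING_TIMES.values())
print(format_hms(total_seconds))  # prints 141:13:49
```

By this count the ten datasets together cover a little over 141 hours of audio, with Spanish the largest and Greek the smallest.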

Pretrained Models & Audio Samples

Code	Language	Pretrained Models	Audio Samples
de	German	DCTTS | TACOTRON	DCTTS | TACOTRON
el	Greek	DCTTS	DCTTS
es	Spanish	DCTTS | TACOTRON	DCTTS | TACOTRON
fi	Finnish	DCTTS | TACOTRON	DCTTS | TACOTRON
fr	French	DCTTS | TACOTRON	DCTTS | TACOTRON
hu	Hungarian	DCTTS | TACOTRON	DCTTS | TACOTRON
ja	Japanese	DCTTS | TACOTRON	DCTTS | TACOTRON
nl	Dutch	DCTTS | TACOTRON	DCTTS | TACOTRON
ru	Russian	DCTTS | TACOTRON	DCTTS | TACOTRON
zh	Chinese	DCTTS | TACOTRON	DCTTS | TACOTRON

Cite

@article{park2019css10,
  title={CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages},
  author={Park, Kyubyong and Mulc, Thomas},
  journal={Interspeech},
  year={2019}
}

By Kyubyong Park, Tommy Mulc
