EgorLakomkin / Ktspeechcrawler

Licence: MIT
Automatically constructs a corpus for automatic speech recognition from YouTube videos

Programming Languages

python

KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos

Google Colab

https://colab.research.google.com/drive/1JVKzB9N2FIcxlib1kXuGlfeIuudkM9Vr

Installation

git clone https://github.com/EgorLakomkin/KTSpeechCrawler
cd KTSpeechCrawler
pip install -r requirements.txt

Running crawler

chmod a+x ./crawler/en_corpus.sh
./crawler/en_corpus.sh <dir_with_intermediate_results> <dir_for_resulting_samples>
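The same invocation can be driven from Python, for example to make sure both directories exist before the crawl starts. This is a minimal sketch under stated assumptions: it is run from the repository root, `en_corpus.sh` is a bash script that takes exactly the two positional arguments shown above, and creating the directories in advance is harmless. `build_crawl_command` and `run_crawler` are hypothetical helper names, not part of the project.

```python
import subprocess
from pathlib import Path

# Script path taken from the command above, relative to the repository root.
SCRIPT = Path("crawler") / "en_corpus.sh"


def build_crawl_command(intermediate_dir, output_dir, script=SCRIPT):
    """Return the argument list for the crawler script (hypothetical helper).

    Assumption: the script is a bash script, so it is launched through
    `bash` and does not depend on the executable bit being set.
    """
    return ["bash", str(script), str(Path(intermediate_dir)), str(Path(output_dir))]


def run_crawler(intermediate_dir, output_dir, script=SCRIPT):
    """Create both directories, then run the crawler script."""
    for directory in (intermediate_dir, output_dir):
        Path(directory).mkdir(parents=True, exist_ok=True)
    command = build_crawl_command(intermediate_dir, output_dir, script)
    # check=True raises CalledProcessError if the script exits non-zero.
    return subprocess.run(command, check=True)
```

Calling `run_crawler("tmp_results", "samples")` would then correspond to `./crawler/en_corpus.sh tmp_results samples`; the directory names here are placeholders.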

Browsing samples

python server.py --corpus <dir_for_resulting_samples>
Then open http://localhost:8888/ in a browser.
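When the server is started from a script, it can help to wait until the page actually responds before opening it. The sketch below polls the URL with the standard library only; `wait_for_server` is a hypothetical helper, and the default address is simply the one given above.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:8888/", timeout=30.0, interval=0.5):
    """Poll `url` until it answers or `timeout` seconds pass.

    Returns True as soon as any HTTP response arrives, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status.
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet (or the request timed out); retry.
            time.sleep(interval)
    return False
```

A typical use is to launch `python server.py --corpus <dir_for_resulting_samples>` in the background and call `wait_for_server()` before pointing a browser at the page.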

Citation

@article{lakomkin2018kt,
  title={KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos},
  author={Lakomkin, Egor and Magg, Sven and Weber, Cornelius and Wermter, Stefan},
  journal={EMNLP 2018},
  pages={90},
  year={2018}
}
