
liangstein / Chinese Speech To Text

Licence: apache-2.0
Chinese Speech To Text Using Wavenet

Programming Languages

python

Projects that are alternatives to, or similar to, Chinese Speech To Text

Deep Learning Drizzle
Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!
Stars: ✭ 9,717 (+7736.29%)
Mutual labels:  deep-neural-networks, speech-recognition
Voice activity detection
Voice Activity Detection based on Deep Learning & TensorFlow
Stars: ✭ 132 (+6.45%)
Mutual labels:  deep-neural-networks, speech-recognition
Keras Kaldi
Keras Interface for Kaldi ASR
Stars: ✭ 124 (+0%)
Mutual labels:  deep-neural-networks, speech-recognition
Pytorch Kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
Stars: ✭ 2,097 (+1591.13%)
Mutual labels:  deep-neural-networks, speech-recognition
Speech Emotion Analyzer
The neural network model is capable of detecting five different male/female emotions from audio speeches. (Deep Learning, NLP, Python)
Stars: ✭ 633 (+410.48%)
Mutual labels:  deep-neural-networks, speech-recognition
Speech To Text Benchmark
speech to text benchmark framework
Stars: ✭ 481 (+287.9%)
Mutual labels:  deep-neural-networks, speech-recognition
Hey Jetson
Deep Learning based Automatic Speech Recognition with attention for the Nvidia Jetson.
Stars: ✭ 161 (+29.84%)
Mutual labels:  deep-neural-networks, speech-recognition
Kur
Descriptive Deep Learning
Stars: ✭ 811 (+554.03%)
Mutual labels:  deep-neural-networks, speech-recognition
Vosk Api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Stars: ✭ 1,357 (+994.35%)
Mutual labels:  deep-neural-networks, speech-recognition
Ml Fraud Detection
Credit card fraud detection through logistic regression, k-means, and deep learning.
Stars: ✭ 117 (-5.65%)
Mutual labels:  deep-neural-networks
Nlp Pretrained Model
A collection of Natural language processing pre-trained models.
Stars: ✭ 122 (-1.61%)
Mutual labels:  deep-neural-networks
Tenginekit
TengineKit - Free, Fast, Easy, Real-Time Face Detection & Face Landmarks & Face Attributes & Hand Detection & Hand Landmarks & Body Detection & Body Landmarks & Iris Landmarks & Yolov5 SDK On Mobile.
Stars: ✭ 2,103 (+1595.97%)
Mutual labels:  deep-neural-networks
Nonautoreggenprogress
Tracking the progress in non-autoregressive generation (translation, transcription, etc.)
Stars: ✭ 118 (-4.84%)
Mutual labels:  speech-recognition
Lenet 5
PyTorch implementation of LeNet-5 with live visualization
Stars: ✭ 122 (-1.61%)
Mutual labels:  deep-neural-networks
Onnx
Open standard for machine learning interoperability
Stars: ✭ 11,829 (+9439.52%)
Mutual labels:  deep-neural-networks
Pointwise
Code for Pointwise Convolutional Neural Networks, CVPR 2018
Stars: ✭ 123 (-0.81%)
Mutual labels:  deep-neural-networks
Tfg Voice Conversion
Deep Learning-based Voice Conversion system
Stars: ✭ 115 (-7.26%)
Mutual labels:  deep-neural-networks
Holobot
HoloBot is a reusable 3D interface that allows HoloLens & VR users to interact with any bot using Mixed Reality & Speech.
Stars: ✭ 114 (-8.06%)
Mutual labels:  speech-recognition
Hyperdensenet
This repository contains the code of HyperDenseNet, a hyper-densely connected CNN to segment medical images in multi-modal image scenarios.
Stars: ✭ 124 (+0%)
Mutual labels:  deep-neural-networks
Perceptualsimilarity
LPIPS metric. pip install lpips
Stars: ✭ 2,037 (+1542.74%)
Mutual labels:  deep-neural-networks

Chinese-speech-to-text

Speech recognition trained on the THCHS30 open Chinese speech database.

Dependencies

  • Python 3.6 (numpy, scipy, pickle, h5py, librosa)
  • Keras 2.0.2
  • TensorFlow v1.1 backend (not tested with the Theano backend)
  • CUDA 8.0, cuDNN 6.0 (if a GPU is used)

Neural Network Implementation

The neural network used is WaveNet, first proposed in DeepMind's paper. Recognition is done at the character level (so there is no need to vectorize a 10,000-word vocabulary), which keeps the output dimension much smaller than that of a word-level recurrent neural network. The network structure is here.
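The core of WaveNet is a stack of causal, dilated 1-D convolutions wrapped in gated residual blocks, so each output frame sees a long window of past input frames without recurrence. The NumPy sketch below illustrates that idea; it is not the repository's Keras code, and the function names, channel sizes and dilation schedule are hypothetical choices for illustration only.

```python
import numpy as np


def causal_dilated_conv1d(x, w, dilation):
    """Causal dilated 1-D convolution.

    x: (time, in_channels); w: (kernel_size, in_channels, out_channels).
    Output at step t only depends on x[t], x[t - d], x[t - 2d], ...
    """
    k = w.shape[0]
    pad = (k - 1) * dilation
    # Left-pad with zeros so no output depends on future frames.
    x_pad = np.pad(x, ((pad, 0), (0, 0)))
    out = np.zeros((x.shape[0], w.shape[2]))
    for t in range(x.shape[0]):
        for i in range(k):
            # Tap i reads the frame (k - 1 - i) * dilation steps before t.
            out[t] += x_pad[t + i * dilation] @ w[i]
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def residual_block(x, w_filter, w_gate, w_res, w_skip, dilation):
    """Gated activation tanh(filter) * sigmoid(gate), then 1x1 projections.

    Returns (residual output fed to the next block, skip output).
    """
    z = np.tanh(causal_dilated_conv1d(x, w_filter, dilation)) * sigmoid(
        causal_dilated_conv1d(x, w_gate, dilation)
    )
    return x + z @ w_res, z @ w_skip


def receptive_field(kernel_size, dilations):
    """Number of input frames visible to one output of the stacked convs."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


rng = np.random.default_rng(0)
channels, skip_channels, kernel = 8, 4, 2
dilations = [1, 2, 4, 8, 16]             # hypothetical dilation schedule
h = rng.standard_normal((50, channels))  # e.g. 50 frames of audio features
skip_sum = np.zeros((50, skip_channels))
for d in dilations:
    h, skip = residual_block(
        h,
        0.1 * rng.standard_normal((kernel, channels, channels)),
        0.1 * rng.standard_normal((kernel, channels, channels)),
        0.1 * rng.standard_normal((channels, channels)),
        0.1 * rng.standard_normal((channels, skip_channels)),
        d,
    )
    skip_sum += skip
# In a recognizer, skip_sum would feed a per-frame softmax over characters.
print(skip_sum.shape, receptive_field(kernel, dilations))  # (50, 4) 32
```

Doubling the dilation per layer makes the receptive field grow exponentially with depth (32 frames from 5 layers here), which is why a convolutional stack can replace a recurrent network for this task.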

The training dataset is small (only 10,000 samples). After 124 epochs the CTC loss decreased to 0.2768; training took 15 hours on a GTX 1080.
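A network trained with CTC loss outputs, for every frame, a probability distribution over the characters plus a special "blank" class; text is recovered by decoding that sequence. The README does not say which decoder is used, so the sketch below assumes the simplest one, greedy (best-path) decoding: take the most likely class per frame, collapse repeated labels, then drop blanks. It also assumes the blank is the last class, as in TensorFlow's CTC ops; the function name and the tiny character list are hypothetical.

```python
import numpy as np


def ctc_greedy_decode(probs, blank=None):
    """Best-path CTC decoding.

    probs: (time, num_classes) per-frame class probabilities.
    blank: index of the CTC blank; defaults to the last class.
    Returns the list of decoded label indices.
    """
    if blank is None:
        blank = probs.shape[1] - 1
    best = np.argmax(probs, axis=1)
    decoded = []
    prev = None
    for label in best:
        # Keep a label only when it starts a new run and is not blank.
        if label != prev and label != blank:
            decoded.append(int(label))
        prev = label
    return decoded


chars = ["一", "九", "三"]  # hypothetical character vocabulary; index 3 = blank
frames = [0, 0, 3, 1, 1, 3, 1, 2]
probs = np.eye(4)[frames]  # one-hot stand-in for network output
print("".join(chars[i] for i in ctc_greedy_decode(probs)))  # 一九九三
```

The blank between the two runs of 九 is what lets CTC emit a genuinely repeated character; without it the run would collapse to a single 九. Beam-search decoding usually gives somewhat better transcripts at higher cost.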

Results

audio: [1.wav]
recognized: 一九九山年二二十的上午务四穿声看月显安人向武村碰加工嫂五人进城都体一服
ground truth: 一九九三年二月二十三日上午四川省安岳县岳源乡五村彭家姑嫂五人进城购置衣服

audio: [2.wav]
recognized: 看亚够考前跑后你直惊准的山却也得他王欧尧起声王的在山进回道
ground truth: 看羊狗跑前跑后一只惊飞的山雀惹得它汪汪汪咬几声嗡嗡嗡的在山间回荡

audio: [3.wav]
recognized: 北积穿过云层易下一片鱼海又时头过喜过的运物一些可件然国冲绿的群山大底
ground truth: 飞机穿过云层眼下一片云海有时透过稀薄的云雾依稀可见南国葱绿的群山大地

audio: [4.wav]
recognized: 王宁看被墙颠后不范云燕孙场起来及自卫不均为抓活
ground truth: 王英汉被枪毙后部分余孽深藏起来几次围捕均未抓获

audio: [5.wav]
recognized: 其书有现人原本于穷取来观邪不凑找他了莫银杷要求看四损面看富面
ground truth: 其中有些人原本与陈曲澜关系不错找他软磨硬泡要求不看僧面看佛面
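The results above are shown qualitatively. A common way to quantify how far each recognized line is from its ground truth is the character error rate (CER): the Levenshtein edit distance (substitutions, insertions, deletions) divided by the reference length. This metric is not part of the original repository; the sketch below is a plain-Python illustration with hypothetical function names, applied to the 4.wav pair.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (e.g. strings of characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r from the reference
                cur[j - 1] + 1,          # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (cost 1) or match (cost 0)
            )
        prev = cur
    return prev[-1]


def character_error_rate(ref, hyp):
    """Edit distance normalized by reference length (ref must be non-empty)."""
    return edit_distance(ref, hyp) / len(ref)


ref = "王英汉被枪毙后部分余孽深藏起来几次围捕均未抓获"
hyp = "王宁看被墙颠后不范云燕孙场起来及自卫不均为抓活"
print(f"CER: {character_error_rate(ref, hyp):.2f}")  # CER: 0.70
```

Because Chinese is written without spaces, a character-level metric like this fits a character-level recognizer better than word error rate, which would first require word segmentation.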

We tested recognition using audio files from the test set. Although the training dataset is small (10,000 samples), the model can already recognize key words. The model is not yet trained for recognition in noisy environments; a larger and more varied training dataset should give better recognition results.

Authors

liangstein ([email protected], [email protected]). Contact me if needed.
