huckiyang / QuantumSpeech-QCNN

Licence: other
IEEE ICASSP 21 - Quantum Convolution Neural Networks for Speech Processing and Automatic Speech Recognition

Programming Languages

  • Jupyter Notebook
  • Python

Projects that are alternatives to or similar to QuantumSpeech-QCNN

Formant Analyzer
iOS application for finding formants in spoken sounds
Stars: ✭ 43 (-39.44%)
Mutual labels:  speech-recognition, speech-processing
Speechbrain.github.io
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain, users can easily create speech processing systems for speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and more.
Stars: ✭ 242 (+240.85%)
Mutual labels:  speech-recognition, speech-processing
Keras Sincnet
Keras (tensorflow) implementation of SincNet (Mirco Ravanelli, Yoshua Bengio - https://github.com/mravanelli/SincNet)
Stars: ✭ 47 (-33.8%)
Mutual labels:  speech-recognition, speech-processing
Awesome Diarization
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
Stars: ✭ 673 (+847.89%)
Mutual labels:  speech-recognition, speech-processing
react-native-spokestack
Spokestack: give your React Native app a voice interface!
Stars: ✭ 53 (-25.35%)
Mutual labels:  speech-recognition, speech-processing
Sincnet
SincNet is a neural architecture for efficiently processing raw audio samples.
Stars: ✭ 764 (+976.06%)
Mutual labels:  speech-recognition, speech-processing
Zzz Retired openstt
RETIRED - OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:
Stars: ✭ 146 (+105.63%)
Mutual labels:  speech-recognition, speech-processing
spokestack-ios
Spokestack: give your iOS app a voice interface!
Stars: ✭ 27 (-61.97%)
Mutual labels:  speech-recognition, speech-processing
torchsubband
Pytorch implementation of subband decomposition
Stars: ✭ 63 (-11.27%)
Mutual labels:  speech-recognition, speech-processing
UHV-OTS-Speech
A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.
Stars: ✭ 94 (+32.39%)
Mutual labels:  speech-recognition, speech-processing
Uspeech
Speech recognition toolkit for the Arduino
Stars: ✭ 448 (+530.99%)
Mutual labels:  speech-recognition, speech-processing
Adaptive-Gradient-Clipping
Minimal implementation of adaptive gradient clipping (https://arxiv.org/abs/2102.06171) in TensorFlow 2.
Stars: ✭ 74 (+4.23%)
Mutual labels:  colab-notebook, tensorflow2
scim
[wip] Speech recognition toolbox written in Nim, based on Arraymancer.
Stars: ✭ 17 (-76.06%)
Mutual labels:  speech-recognition, speech-processing
Pncc
An implementation of Power Normalized Cepstral Coefficients (PNCC)
Stars: ✭ 40 (-43.66%)
Mutual labels:  speech-recognition, speech-processing
UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
Stars: ✭ 224 (+215.49%)
Mutual labels:  speech-recognition, speech-processing
Nonautoreggenprogress
Tracking the progress in non-autoregressive generation (translation, transcription, etc.)
Stars: ✭ 118 (+66.2%)
Mutual labels:  speech-recognition, speech-processing
open-speech-corpora
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Stars: ✭ 841 (+1084.51%)
Mutual labels:  speech-recognition, speech-processing
Speech-Backbones
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
Stars: ✭ 205 (+188.73%)
Mutual labels:  speech-recognition, speech-processing
TFLite-ModelMaker-EfficientDet-Colab-Hands-On
Hands-on material for object detection with TensorFlow Lite Model Maker
Stars: ✭ 15 (-78.87%)
Mutual labels:  colab-notebook, tensorflow2
awesome-keyword-spotting
This repository is a curated list of awesome Speech Keyword Spotting (Wake-Up Word Detection).
Stars: ✭ 150 (+111.27%)
Mutual labels:  speech-recognition, speech-processing

Quantum Deep Learning for Speech

Quantum Machine Learning for Automatic Spoken-Term Recognition.

  • NEW Our paper has been accepted to the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2021.

We would like to thank the reviewers and committee members in the Speech Processing and Quantum Signals community.

The quantum speech processing code was released in December 2020! A Colab demo is also provided. ICASSP Video | Slides

  • ICASSP 21 Paper | Arxiv "Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition"

1. Environment

TensorFlow

  • Option 1: install via conda and pip
conda install -c anaconda tensorflow-gpu=2.0
conda install -c conda-forge scikit-learn 
conda install -c conda-forge librosa 
pip install pennylane --upgrade 
  • Option 2: create the environment from environment.yml (for a 2080 Ti with CUDA 10.0)
conda env create -f environment.yml

Originally developed with TensorFlow 2.0 and CUDA 10.0.

2. Dataset

We use the Google Speech Commands Dataset V1 for limited-vocabulary speech recognition.

mkdir ../dataset
cd ../dataset
wget http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz
tar -xf speech_commands_v0.01.tar.gz

2.1. Pre-processed Features

We provide 2,000 pre-processed features in ./data_quantum, which include both Mel features and (2,2) quanvolution features, split into 1,500 training and 500 test examples. You can reach 90.6% test accuracy with the provided data.

You can use np.load to load these features and train your own quantum speech processing model as in 3.1, for example as sketched below.
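
A minimal loading sketch (the file names here are hypothetical; list ./data_quantum for the actual ones):

import numpy as np

# Hypothetical file names -- check ./data_quantum for the real ones.
x_train = np.load("data_quantum/quanv_train.npy")    # (1500, H, W, C) quanvolution features
y_train = np.load("data_quantum/train_labels.npy")
x_test  = np.load("data_quantum/quanv_test.npy")     # (500, H, W, C)
y_test  = np.load("data_quantum/test_labels.npy")
print(x_train.shape, x_test.shape)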

2.2. Audio Features Extraction (optional)

Please set the sampling rate sr and the data ratio (--port N uses 1/N of the data; --port 1 uses all of it) when extracting Mel features; a librosa sketch follows the command.

python main_qsr.py --sr 16000 --port 100 --mel 1 --quanv 1
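
For reference, a minimal sketch of the Mel step using librosa; the exact parameters live in main_qsr.py, and n_mels=60 with hop_length=128 are assumptions inferred from the "Shape 60 126" message in 2.3 (a 1 s clip at 16 kHz with hop 128 yields 126 frames):

import librosa
import numpy as np

def extract_mel(path, sr=16000, n_mels=60, hop_length=128):
    # Load one Speech Commands clip and pad/trim it to exactly 1 second.
    y, _ = librosa.load(path, sr=sr)
    y = librosa.util.fix_length(y, size=sr)
    # Log-Mel spectrogram: 16000 samples / 128 hop -> 126 frames.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                         hop_length=hop_length)
    return librosa.power_to_db(mel, ref=np.max)      # shape (n_mels, 126)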

2.3. Quanvolution Encoding (optional)

If you have pre-loaded audio features from 2.2, you can set the quantum convolution kernel size in the quanv function of helper_q_tool.py. We provide an example for kernel size = 3 at line 57.

You will see a message like the one below during quanvolution encoding when running the feature-extraction command from 2.2. A PennyLane sketch of the quanvolution step follows the log.

===== Shape 60 126
Kernal =  2
Quantum pre-processing of train Speech:
2/175
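
For intuition, a minimal (2,2) quanvolution sketch in PennyLane, following the standard quanvolutional-layer recipe; the actual circuit and encoding in helper_q_tool.py may differ:

import numpy as np
import pennylane as qml

n_qubits = 4                        # a 2x2 kernel covers 4 values, one qubit each
dev = qml.device("default.qubit", wires=n_qubits)
rand_params = np.random.uniform(0, 2 * np.pi, size=(1, n_qubits))

@qml.qnode(dev)
def circuit(patch):
    # Angle-encode each value of the flattened 2x2 patch.
    for j in range(n_qubits):
        qml.RY(np.pi * patch[j], wires=j)
    # A random entangling layer, as in the quanvolution literature.
    qml.templates.RandomLayers(rand_params, wires=list(range(n_qubits)))
    # One expectation value per qubit -> 4 output channels.
    return [qml.expval(qml.PauliZ(j)) for j in range(n_qubits)]

def quanv(feature_map, kernel=2):
    # Slide the circuit over the 2-D feature map with stride = kernel size.
    h, w = feature_map.shape
    out = np.zeros((h // kernel, w // kernel, n_qubits))
    for i in range(0, h - kernel + 1, kernel):
        for j in range(0, w - kernel + 1, kernel):
            patch = feature_map[i:i + kernel, j:j + kernel].reshape(-1)
            out[i // kernel, j // kernel] = circuit(patch)
    return out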

3. Training

3.1 QCNN U-Net Bi-LSTM Attention Model

Spoken-term recognition with the additional U-Net encoder discussed in our work.

python main_qsr.py

Training runs for 25 epochs. One way to improve recognition performance is to encode more data for training; refer to 2.2 and 2.3.

1500/1500 [==============================] - 3s 2ms/sample - val_loss: 0.4408 - val_accuracy: 0.9060                              

Please set use_Unet = False in model.py:

def attrnn_Model(x_in, labels, ablation=False):
    # use a plain LSTM as the recurrent layer
    rnn_func = L.LSTM
    # toggle the additional U-Net encoder from Sec. 3.1
    use_Unet = False

3.2 Neural Saliency by Class Activation Mapping (CAM)

python cam_sp.py
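
As a rough sketch of what plain CAM computes (assuming a global-average-pooling head; the layer name below is a placeholder, and cam_sp.py may differ):

import numpy as np
import tensorflow as tf

def class_activation_map(model, x, class_idx, last_conv_name="conv2d_last"):
    # "conv2d_last" is hypothetical -- read the real name from model.summary().
    conv_out = model.get_layer(last_conv_name).output
    feat_model = tf.keras.Model(model.inputs, conv_out)
    fmaps = feat_model.predict(x[np.newaxis, ...])[0]   # (H, W, C)
    w = model.layers[-1].get_weights()[0]               # (C, n_classes); assumes GAP + Dense
    cam = fmaps @ w[:, class_idx]                       # weight feature maps by class weights
    cam = np.maximum(cam, 0)                            # keep positive evidence only
    return cam / (cam.max() + 1e-8)                     # normalize to [0, 1]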

3.3 CTC Model for Automatic Speech Recognition

We also provide a CTC model with word error rate (WER) evaluation for future studies by the community; refer to the discussion.

For example, an output "y-e--a" for the input "yes" collapses to "yea" under the CTC alignment and is counted as an incorrect word, as the sketch below shows.
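
Concretely, CTC decoding merges repeated symbols and then removes blanks ('-' here), so "y-e--a" collapses to "yea", which does not match "yes":

def ctc_collapse(path, blank="-"):
    # Merge repeats, then drop blanks, as in standard CTC best-path decoding.
    out, prev = [], None
    for ch in path:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)

assert ctc_collapse("y-e--a") == "yea"   # "yea" != "yes" -> one word error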

Note that this quantum ASR CTC version only supports tensorflow-gpu==2.3. Please create a new environment to run this experiment.

  • Unzip the features for ASR
cd data_quantum/asr_set
bash unzip.sh
  • Run the CTC model in ./speech_quantum_dl
python qsr_ctc_wer.py

A pre-trained weight file is provided in checkpoints/asr_ctc_demo.hdf5.

Epoch 32/50
107/107 [==============================] - 5s 49ms/step - loss: 0.1191 - val_loss: 0.7115
Epoch 33/50
107/107 [==============================] - 5s 49ms/step - loss: 0.1547 - val_loss: 0.6701
=== WER: 9.895833333333334  % 
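
The WER above follows the standard definition, word-level edit distance over the number of reference words; a self-contained sketch (qsr_ctc_wer.py may implement it differently):

def wer(ref, hyp):
    # Word-level Levenshtein distance divided by the reference length, in percent.
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                                  # deletions
    for j in range(len(h) + 1):
        d[0][j] = j                                  # insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1  # substitution cost
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return 100.0 * d[-1][-1] / len(r)

print(wer("yes no up", "yea no up"))                 # 33.33 (1 error in 3 words)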

Tutorial Link.

  • For academic purposes only. Feel free to contact the author for other uses.

Reference

If this work helps your research or you use the code, please consider citing our paper. Thank you!

@inproceedings{yang2021decentralizing,
  title={Decentralizing feature extraction with quantum convolutional neural network for automatic speech recognition},
  author={Yang, Chao-Han Huck and Qi, Jun and Chen, Samuel Yen-Chi and Chen, Pin-Yu and Siniscalchi, Sabato Marco and Ma, Xiaoli and Lee, Chin-Hui},
  booktitle={2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={6523--6527},
  year={2021},
  organization={IEEE}
}

Federated Learning and Virtualization

See PySyft and PyVertical for a vertical federated learning setup. Please refer to a vertical learning example for virtualization.

Acknowledgment

We would like to thank Xanadu AI for providing PennyLane and IBM Research for providing Qiskit and quantum hardware to the community. There is no conflict of interest.

FAQ

Since the area between speech and quantum ML is still quite new, please feel free to open an issue for discussion.

Feel free to use this implementation for other speech processing or sequence modeling tasks (e.g., speaker recognition, speech separation, event detection, ...), given the quantum advantages discussed in the paper.
