filippogiruzzi / Voice_activity_detection

Voice Activity Detection based on Deep Learning & TensorFlow

Voice Activity Detection project

Keywords: Python, TensorFlow, Deep Learning, Time Series classification

Table of contents

  1. Installation
    1.1 Basic installation
    1.2 Virtual environment installation
    1.3 Docker installation
  2. Introduction
    2.1 Goal
    2.2 Results
  3. Project structure
  4. Dataset
  5. Project usage
    5.1 Dataset automatic labeling
    5.2 Record raw data to .tfrecord format
    5.3 Train a CNN to classify Speech & Noise signals
    5.4 Export trained model & run inference on Test set
  6. Todo
  7. Resources

1. Installation

This project was designed for:

  • Ubuntu 20.04
  • Python 3.7.3
  • TensorFlow 1.15.4

$ cd /path/to/project/
$ git clone https://github.com/filippogiruzzi/voice_activity_detection.git
$ cd voice_activity_detection/

1.1 Basic installation

$ pip3 install -r requirements.txt
$ pip3 install -e .

1.2 Virtual environment installation

1.3 Docker installation

Build the docker image:

$ sudo make build

(This might take a while.)

Run the docker image:

$ sudo make local

(Update scripts/docker_local.sh with your personal paths.)

2. Introduction

2.1 Goal

The purpose of this project is to design and implement a real-time Voice Activity Detection algorithm based on Deep Learning.

The solution is based on MFCC feature extraction and a 1D-Resnet model that classifies whether an audio signal is speech or noise.
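To make the "1D-Resnet" idea concrete, here is a minimal, dependency-free sketch of a single 1D residual block applied to one feature track. It is an illustration only, not the project's TensorFlow model: the function names, the kernel values and the two-convolution layout are assumptions chosen for readability.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding, so output length equals input length.

    Assumes an odd-length kernel.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def residual_block(signal, kernel_a, kernel_b):
    """Compute relu(signal + conv(relu(conv(signal)))).

    The skip connection adds the block input back onto the convolution branch.
    """
    hidden = relu(conv1d_same(signal, kernel_a))
    branch = conv1d_same(hidden, kernel_b)
    return relu([s + b for s, b in zip(signal, branch)])


# Hypothetical values standing in for one MFCC coefficient over three frames.
feature_track = [1.0, -2.0, 3.0]
identity_kernel = [0.0, 1.0, 0.0]
print(residual_block(feature_track, identity_kernel, identity_kernel))
```

With all-zero kernels the branch contributes nothing and the block reduces to a ReLU of its input, which is the property that makes deep residual stacks easier to train.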

2.2 Results

Model       Train acc.   Val acc.   Test acc.
1D-Resnet   99 %         98 %       97 %

Raw and post-processed inference results on a test audio signal are shown below.

[Figures: raw inference results and post-processed inference results on a test audio signal]
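One common way to post-process frame-level predictions is a sliding-window majority vote, sketched below. This is an assumption about what such post-processing could look like, not a description of the smoothing this project applies; the function name and the default window size are hypothetical.

```python
def smooth_predictions(frame_labels, window=5):
    """Sliding-window majority vote over binary frame labels (1 = speech, 0 = noise).

    Each frame takes the majority label of its neighbourhood; ties resolve to noise.
    Windows are truncated at the start and end of the sequence.
    """
    half = window // 2
    smoothed = []
    for i in range(len(frame_labels)):
        neighbourhood = frame_labels[max(0, i - half): i + half + 1]
        is_speech = 2 * sum(neighbourhood) > len(neighbourhood)
        smoothed.append(1 if is_speech else 0)
    return smoothed


# Hypothetical raw predictions with one isolated spike and one short dropout.
raw = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0]
print(smooth_predictions(raw, window=3))
```

A larger window removes more spurious switches but also blurs the true speech boundaries, so the window size is a trade-off worth tuning on validation data.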

3. Project structure

The project voice_activity_detection/ has the following structure:

  • vad/data_processing/: raw data labeling, processing, recording & visualization
  • vad/training/: data, input pipeline, model & training / evaluation / prediction
  • vad/inference/: exporting trained model & inference

4. Dataset

Please download the LibriSpeech ASR corpus dataset from https://openslr.org/12/, and extract all files to /path/to/LibriSpeech/.

The dataset contains approximately 1,000 hours of 16 kHz read English speech from audiobooks, and is well suited for Voice Activity Detection.

I automatically annotated the test-clean set of the dataset with a pretrained VAD model.

Please feel free to use the labels/ folder and the pre-trained VAD model (only for inference) from this link.
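After extraction, each LibriSpeech subset (such as test-clean/) holds one directory per speaker and chapter, with one .flac file per utterance. The sketch below lists those utterances with the standard library only; the helper name list_utterances is hypothetical and is not part of this project.

```python
from pathlib import Path


def list_utterances(subset_dir):
    """Return sorted (utterance_id, path) pairs for every .flac file under subset_dir."""
    return sorted((path.stem, path) for path in Path(subset_dir).rglob("*.flac"))


if __name__ == "__main__":
    # Placeholder path from this README; replace it with your own location.
    for utterance_id, path in list_utterances("/path/to/LibriSpeech/test-clean/"):
        print(utterance_id, path)
```

Running it on a subset folder is a quick check that the archive was extracted to the expected location before labeling or recording the data.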

5. Project usage

$ cd /path/to/project/voice_activity_detection/vad/

5.1 Dataset automatic labeling

Skip this subsection if you already have the labels/ folder, which contains annotations from a different pre-trained model.

$ python3 data_processing/librispeech_label_data.py --data_dir /path/to/LibriSpeech/test-clean/ \
                                                    --exported_model /path/to/pretrained/model/ \
                                                    --out_dir /path/to/LibriSpeech/labels/

This will record the annotations into /path/to/LibriSpeech/labels/ as .json files.
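To show how segment-level annotations can become per-frame training targets, here is a minimal sketch. The JSON schema is hypothetical: the key speech_segments and the start/end fields (sample indices, end exclusive) are assumptions for illustration, so check them against the actual .json files before reusing this.

```python
import json


def frame_labels_from_annotation(annotation_json, n_samples, frame_size):
    """Label a frame 1 (speech) if it overlaps any annotated segment, else 0 (noise).

    Assumed (hypothetical) schema:
        {"speech_segments": [{"start": <sample index>, "end": <sample index>}, ...]}
    """
    annotation = json.loads(annotation_json)
    n_frames = n_samples // frame_size
    labels = [0] * n_frames
    for segment in annotation["speech_segments"]:
        first = segment["start"] // frame_size
        # Ceiling division, so a partially covered last frame still counts as speech.
        last = min(n_frames, -(-segment["end"] // frame_size))
        for i in range(first, last):
            labels[i] = 1
    return labels


# Hypothetical annotation: speech from sample 1000 up to sample 2500.
example = '{"speech_segments": [{"start": 1000, "end": 2500}]}'
print(frame_labels_from_annotation(example, n_samples=4000, frame_size=1000))
```

Treating any overlap as speech is one design choice; requiring a minimum overlap fraction would yield cleaner labels near segment boundaries.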

5.2 Record raw data to .tfrecord format

$ python3 data_processing/data_to_tfrecords.py --data_dir /path/to/LibriSpeech/

This will record the split data in .tfrecord format in /path/to/LibriSpeech/tfrecords/.

5.3 Train a CNN to classify Speech & Noise signals

$ python3 training/train.py --data-dir /path/to/LibriSpeech/tfrecords/

5.4 Export trained model & run inference on Test set

$ python3 inference/export_model.py --model-dir /path/to/trained/model/dir/ \
                                    --ckpt /path/to/trained/model/dir/
$ python3 inference/inference.py --data_dir /path/to/LibriSpeech/ \
                                 --exported_model /path/to/exported/model/ \
                                 --smoothing

The trained model will be recorded in /path/to/LibriSpeech/tfrecords/models/resnet1d/. The exported model will be recorded inside this directory.

6. Todo

  • [ ] Compare Deep Learning model to a simple baseline
  • [ ] Train on full dataset
  • [ ] Improve data balancing
  • [ ] Add time series data augmentation
  • [ ] Study ROC curve & classification threshold
  • [ ] Add online inference
  • [ ] Evaluate quantitatively post-processing methods on the Test set
  • [ ] Add model description & training graphs
  • [ ] Add Google Colab demo

7. Resources

  • Voice Activity Detection for Voice User Interface, Medium
  • Deep learning for time series classification: a review, Fawaz et al., 2018, arXiv
  • Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline, Wang et al., 2016, arXiv