
ksanjeevan / Crnn Audio Classification

License: MIT
UrbanSound classification using Convolutional Recurrent Networks in PyTorch

Programming Languages

Python

Projects that are alternatives to or similar to Crnn Audio Classification

Automatic speech recognition
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Stars: ✭ 2,751 (+1070.64%)
Mutual labels:  lstm, rnn, audio
Pytorch Kaldi
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit.
Stars: ✭ 2,097 (+792.34%)
Mutual labels:  lstm, rnn
Lstm Music Genre Classification
Music genre classification with LSTM Recurrent Neural Nets in Keras & PyTorch
Stars: ✭ 166 (-29.36%)
Mutual labels:  lstm, rnn
Stylenet
A cute multi-layer LSTM that can perform like a human 🎶
Stars: ✭ 187 (-20.43%)
Mutual labels:  lstm, rnn
Natural Language Processing With Tensorflow
Natural Language Processing with TensorFlow, published by Packt
Stars: ✭ 222 (-5.53%)
Mutual labels:  lstm, rnn
Load forecasting
Load forecasting on the Delhi-area electric power load using ARIMA, RNN, LSTM, and GRU models
Stars: ✭ 160 (-31.91%)
Mutual labels:  lstm, rnn
Rnn For Joint Nlu
Pytorch implementation of "Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling" (https://arxiv.org/abs/1609.01454)
Stars: ✭ 176 (-25.11%)
Mutual labels:  lstm, rnn
Rnnoise
Recurrent neural network for audio noise reduction
Stars: ✭ 2,266 (+864.26%)
Mutual labels:  rnn, audio
Chameleon recsys
Source code of CHAMELEON - A Deep Learning Meta-Architecture for News Recommender Systems
Stars: ✭ 202 (-14.04%)
Mutual labels:  lstm, rnn
Haste
Haste: a fast, simple, and open RNN library
Stars: ✭ 214 (-8.94%)
Mutual labels:  lstm, rnn
Sign Language Gesture Recognition
Sign Language Gesture Recognition From Video Sequences Using RNN And CNN
Stars: ✭ 214 (-8.94%)
Mutual labels:  lstm, rnn
Poetry Seq2seq
Chinese Poetry Generation
Stars: ✭ 159 (-32.34%)
Mutual labels:  lstm, rnn
Tensorflow Bitcoin Robot
A Bitcoin trading robot based on a TensorFlow LSTM model. Just for fun.
Stars: ✭ 155 (-34.04%)
Mutual labels:  lstm, rnn
Rnnvis
A visualization tool for understanding and debugging RNNs
Stars: ✭ 162 (-31.06%)
Mutual labels:  lstm, rnn
Audioowl
Fast and simple music and audio analysis using RNN in Python 🕵️‍♀️ 🥁
Stars: ✭ 151 (-35.74%)
Mutual labels:  rnn, audio
Eeg Dl
A Deep Learning library for EEG Tasks (Signals) Classification, based on TensorFlow.
Stars: ✭ 165 (-29.79%)
Mutual labels:  lstm, rnn
Speech Denoiser
A speech denoising LV2 plugin based on the RNNoise library
Stars: ✭ 220 (-6.38%)
Mutual labels:  rnn, audio
Rnn poetry generator
Classical Chinese poetry generation with an RNN
Stars: ✭ 143 (-39.15%)
Mutual labels:  lstm, rnn
Pytorch Image Comp Rnn
PyTorch implementation of Full Resolution Image Compression with Recurrent Neural Networks
Stars: ✭ 146 (-37.87%)
Mutual labels:  lstm, rnn
Char Rnn Chinese
Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch. Based on the code of https://github.com/karpathy/char-rnn, with support for Chinese and other additions.
Stars: ✭ 192 (-18.3%)
Mutual labels:  lstm, rnn

PyTorch Audio Classification: Urban Sounds

Classification of audio with variable length using a CNN + LSTM architecture on the UrbanSound8K dataset.

Example results:


Dependencies
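
The repository's dependency list isn't reproduced on this page. As a rough sketch based only on the libraries this README mentions (PyTorch, pytorch/audio, torchparse, tensorboardX); check the repo for the authoritative requirements:

pip install torch torchaudio torchparse tensorboardX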

Features

  • Easily define CRNN in .cfg format (a hypothetical sketch follows this list)
  • Spectrogram computation on GPU
  • Audio data augmentation: Cropping, White Noise, Time Stretching (using phase vocoder on GPU!)
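
As a purely hypothetical illustration of the .cfg idea (the exact torchparse syntax may differ; the repo's arch.cfg is authoritative), a config mirroring the model printed below could look roughly like:

[convs_module]
    [conv2d]
        out_channels = 32
        kernel_size = 3
    [batchnorm2d]
    [elu]
    [maxpool2d]
        kernel_size = 3
    [dropout]
        p = 0.1
    (two more conv blocks with 64 channels follow)

[recur_module]
    [lstm]
        hidden_size = 64
        num_layers = 2

[dense_module]
    [dropout]
        p = 0.3
    [batchnorm1d]
    [linear]
        out_features = 10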

Models

CRNN architecture:

Printing the model defined with torchparse:

AudioCRNN(
  (spec): MelspectrogramStretch(num_bands=128, fft_len=2048, norm=spec_whiten, stretch_param=[0.4, 0.4])
  (net): ModuleDict(
    (convs): Sequential(
      (conv2d_0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=[0, 0])
      (batchnorm2d_0): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (elu_0): ELU(alpha=1.0)
      (maxpool2d_0): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
      (dropout_0): Dropout(p=0.1)
      (conv2d_1): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=[0, 0])
      (batchnorm2d_1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (elu_1): ELU(alpha=1.0)
      (maxpool2d_1): MaxPool2d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      (dropout_1): Dropout(p=0.1)
      (conv2d_2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=[0, 0])
      (batchnorm2d_2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (elu_2): ELU(alpha=1.0)
      (maxpool2d_2): MaxPool2d(kernel_size=4, stride=4, padding=0, dilation=1, ceil_mode=False)
      (dropout_2): Dropout(p=0.1)
    )
    (recur): LSTM(128, 64, num_layers=2)
    (dense): Sequential(
      (dropout_3): Dropout(p=0.3)
      (batchnorm1d_0): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (linear_0): Linear(in_features=64, out_features=10, bias=True)
    )
  )
)
Trainable parameters: 139786
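
To make the printout concrete, here is a minimal PyTorch re-implementation sketch of the same topology (an illustration, not the repository's code; the MelspectrogramStretch front end is omitted). It also shows where the LSTM input size of 128 comes from: after the three conv/pool stages the 128 mel bands collapse to 2 frequency bins, and 64 channels x 2 bins = 128 features per time step.

import torch
import torch.nn as nn

class CRNNSketch(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        def block(c_in, c_out, pool):
            # conv -> batchnorm -> ELU -> maxpool -> dropout, as printed above
            return [nn.Conv2d(c_in, c_out, 3), nn.BatchNorm2d(c_out),
                    nn.ELU(), nn.MaxPool2d(pool), nn.Dropout(0.1)]
        self.convs = nn.Sequential(*block(1, 32, 3), *block(32, 64, 4),
                                   *block(64, 64, 4))
        self.recur = nn.LSTM(128, 64, num_layers=2)  # 64 channels * 2 freq bins
        self.dense = nn.Sequential(nn.Dropout(0.3), nn.BatchNorm1d(64),
                                   nn.Linear(64, n_classes))

    def forward(self, x):                  # x: (batch, 1, 128 mel bands, time)
        x = self.convs(x)                  # -> (batch, 64, 2, time')
        b, c, f, t = x.shape
        x = x.permute(3, 0, 1, 2).reshape(t, b, c * f)  # (time', batch, 128)
        out, _ = self.recur(x)             # 2-layer LSTM over the time axis
        return self.dense(out[-1])         # classify from the last time step

logits = CRNNSketch()(torch.randn(4, 1, 128, 400))  # -> shape (4, 10)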

Usage

Inference

Run inference on an audio file:

./run.py /path/to/audio/file.wav -r path/to/saved/model.pth 

Training

./run.py train -c config.json --cfg arch.cfg

Augmentation

Dataset transforms:

Compose(
    ProcessChannels(mode=avg)
    AdditiveNoise(prob=0.3, sig=0.001, dist_type=normal)
    RandomCropLength(prob=0.4, sig=0.25, dist_type=half)
    ToTensorAudio()
)
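
For intuition, a hypothetical re-implementation of the AdditiveNoise step (class name and details assumed, not taken from the repo): with probability prob, zero-mean Gaussian ("normal") noise with standard deviation sig is added to the waveform.

import numpy as np

class AdditiveNoiseSketch:
    # Hypothetical stand-in for the AdditiveNoise transform printed above.
    def __init__(self, prob=0.3, sig=0.001):
        self.prob, self.sig = prob, sig

    def __call__(self, audio):
        # With probability `prob`, add zero-mean Gaussian noise of std `sig`.
        if np.random.rand() < self.prob:
            audio = audio + np.random.normal(0.0, self.sig, audio.shape)
        return audio

noisy = AdditiveNoiseSketch()(np.zeros(16000))  # 1 s of silence at 16 kHz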

As well as time stretching:
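
A minimal sketch of phase-vocoder time stretching with torchaudio (assuming a recent torchaudio where power=None yields a complex spectrogram; the repository's own implementation may differ):

import torch
import torchaudio

spec = torchaudio.transforms.Spectrogram(n_fft=2048, power=None)  # complex STFT
stretch = torchaudio.transforms.TimeStretch(n_freq=1025)          # 2048 // 2 + 1 bins

wave = torch.randn(1, 16000)          # stand-in 1 s waveform
stretched = stretch(spec(wave), 1.2)  # rate > 1 compresses in time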

TensorboardX

Evaluation

./run.py eval -r /path/to/saved/model.pth

Then obtain the defined metrics:

100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 34/34 [00:03<00:00, 12.68it/s]
{'avg_precision': '0.725', 'avg_recall': '0.719', 'accuracy': '0.804'}
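
Here avg_precision and avg_recall are macro averages, i.e. unweighted means over the 10 classes. A scikit-learn sketch of that computation (an illustration; the repo may compute the metrics differently):

from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [0, 1, 2, 2, 1]  # toy labels
y_pred = [0, 2, 2, 2, 1]  # toy predictions
print(accuracy_score(y_true, y_pred),
      precision_score(y_true, y_pred, average='macro'),
      recall_score(y_true, y_pred, average='macro'))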

10-Fold Cross Validation

Arch                          Accuracy  AvgPrecision(macro)  AvgRecall(macro)
CNN                           71.0%     63.4%                63.5%
CRNN                          72.3%     64.3%                65.0%
CRNN(Bidirectional, Dropout)  73.5%     65.5%                65.8%
CRNN(Dropout)                 73.0%     65.5%                65.7%
CRNN(Bidirectional)           72.8%     64.3%                65.2%

Per fold metrics CRNN(Bidirectional, Dropout):

Fold  Accuracy  AvgPrecision(macro)  AvgRecall(macro)
1     73.1%     65.1%                66.1%
2     80.7%     69.2%                68.9%
3     62.8%     57.3%                57.5%
4     73.6%     65.2%                64.9%
5     78.4%     70.3%                71.5%
6     73.5%     65.5%                65.9%
7     74.6%     67.0%                66.6%
8     66.7%     62.3%                61.7%
9     71.7%     60.7%                62.7%
10    79.9%     72.2%                71.8%
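
As a sanity check, the 73.5% accuracy reported for CRNN(Bidirectional, Dropout) in the summary table matches the unweighted mean of these ten fold accuracies:

folds = [73.1, 80.7, 62.8, 73.6, 78.4, 73.5, 74.6, 66.7, 71.7, 79.9]
print(sum(folds) / len(folds))  # 73.5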

To Do

  • [ ] Commit Jupyter notebook with dataset exploration
  • [x] Switch over to using pytorch/audio
  • [x] Use torchaudio-contrib for STFT transforms
  • [x] CRNN entirely defined in .cfg
  • [x] Fix bug in 'infer'
  • [x] Run 10-fold cross validation
  • [x] Switch over to pytorch/audio since the merge
  • [ ] Comment things