sainathadapa / dcase2019-task5-urban-sound-tagging

License: MIT
1st place solution to the DCASE 2019 - Task 5 - Urban Sound Tagging

Programming Languages

Python
Makefile

Projects that are alternatives of or similar to dcase2019-task5-urban-sound-tagging

psla
Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation".
Stars: ✭ 85 (+203.57%)
Mutual labels:  audio-classification
DCASE2017-baseline-system
DCASE 2017 Baseline system
Stars: ✭ 76 (+171.43%)
Mutual labels:  dcase
Tensorflow-Audio-Classification
Audio classification with VGGish as feature extractor in TensorFlow
Stars: ✭ 105 (+275%)
Mutual labels:  audio-classification
Audio-Scene-Classification
Scene Classification using Audio in the nearby Environment.
Stars: ✭ 18 (-35.71%)
Mutual labels:  audio-classification
DCASE-models
Python library for rapid prototyping of environmental sound analysis systems
Stars: ✭ 35 (+25%)
Mutual labels:  audio-classification
CityNet
A neural network classifier for urban soundscapes
Stars: ✭ 21 (-25%)
Mutual labels:  audio-classification
ESC-CNN-microcontroller
Environmental Sound Classification on Microcontrollers using Convolutional Neural Networks
Stars: ✭ 85 (+203.57%)
Mutual labels:  audio-classification
spoken-command-recognition
A large, free audio sample database (10M words pronounced), a test bed for voice activity detection algorithms and for single-syllable word recognition
Stars: ✭ 59 (+110.71%)
Mutual labels:  audio-classification
DCASE2016-baseline-system-python
DCASE 2016 Baseline system, python implementation
Stars: ✭ 51 (+82.14%)
Mutual labels:  dcase
mxnet-audio
Implementation of music genre classification, audio-to-vec, song recommender, and music search in mxnet
Stars: ✭ 42 (+50%)
Mutual labels:  audio-classification
Audio-Classification-using-CNN-MLP
Multi class audio classification using Deep Learning (MLP, CNN): The objective of this project is to build a multi class classifier to identify sound of a bee, cricket or noise.
Stars: ✭ 36 (+28.57%)
Mutual labels:  audio-classification
AudioClassification-PaddlePaddle
Audio classification implemented with PaddlePaddle; blog post:
Stars: ✭ 32 (+14.29%)
Mutual labels:  audio-classification
Text and Audio classification with Bert
Text Classification in Turkish Texts with Bert
Stars: ✭ 34 (+21.43%)
Mutual labels:  audio-classification
audio-classification
Audio Classification - Multilayer Neural Networks using TensorFlow
Stars: ✭ 28 (+0%)
Mutual labels:  audio-classification
MAX-Audio-Classifier
Identify sounds in short audio clips
Stars: ✭ 115 (+310.71%)
Mutual labels:  audio-classification
label-studio-frontend
Data labeling react app that is backend agnostic and can be embedded into your applications — distributed as an NPM package
Stars: ✭ 230 (+721.43%)
Mutual labels:  audio-classification
Audio Classification using LSTM
Classification of Urban Sound Audio Dataset using LSTM-based model.
Stars: ✭ 47 (+67.86%)
Mutual labels:  audio-classification
icassp2019-tutorial
ICASSP2019 Tutorial: Detection and Classification of Acoustic Scenes and Events / Code examples
Stars: ✭ 34 (+21.43%)
Mutual labels:  dcase

DCASE 2019 - Task 5 - Urban Sound Tagging

This repository contains the final solution that I used for DCASE 2019 Task 5 (Urban Sound Tagging). The model achieved 1st place in the prediction of both coarse- and fine-level labels.

Reproducing the results

Prerequisites:

  • Linux-based system
  • Python >= 3.5
  • NVIDIA GPU with at least 8 GB of memory
  • CUDA >= 10.0
  • The virtualenv package installed
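
As a quick sanity check before running the pipeline, the prerequisites above can be probed from Python. This is only a sketch; check_prereqs is a hypothetical helper, not part of this repository:

```python
import shutil
import sys
from importlib import util

def check_prereqs(min_python=(3, 5)):
    """Return a dict mapping each prerequisite to True/False (hypothetical helper)."""
    return {
        # Python version from the running interpreter
        "python": sys.version_info >= min_python,
        # nvidia-smi on PATH is a rough proxy for a working NVIDIA driver
        "cuda_driver": shutil.which("nvidia-smi") is not None,
        # virtualenv must be importable to create the environment
        "virtualenv": util.find_spec("virtualenv") is not None,
    }

if __name__ == "__main__":
    for name, ok in check_prereqs().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```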

Replicating:

Clone this repository. To replicate the entire solution with a single command, run make run_all from the repository directory. This command performs the following steps sequentially:

  • make env: Creates a virtual environment in the current directory
  • make reqs: Installs python packages
  • make pytorch: Installs PyTorch
  • make download: Downloads the Task 5 data from Zenodo
  • make extract: Extracts the zipped files
  • make parse: Parses annotations
  • make logmel: Computes and saves Log-Mel spectrograms for all the files
  • make train_s1: Trains the system 1 model
  • make eval_s1: Conducts local evaluation of the trained model (system 1)
  • make submit_s1: Generates the submission file (system 1)
  • make train_s2: Trains the system 2 model
  • make eval_s2: Conducts local evaluation of the trained model (system 2)
  • make submit_s2: Generates the submission file (system 2)
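
The make logmel step computes log-scaled Mel spectrograms for all files. As a rough illustration of what that computation entails (a NumPy-only sketch with assumed parameter defaults, not the repository's actual implementation):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular mel filters spanning 0 Hz .. sr/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        lo, center, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, center):            # rising slope
            fb[i - 1, k] = (k - lo) / max(center - lo, 1)
        for k in range(center, hi):            # falling slope
            fb[i - 1, k] = (hi - k) / max(hi - center, 1)
    return fb

def logmel(signal, sr=22050, n_fft=1024, hop=512, n_mels=64):
    # Frame the signal, apply a Hann window, take the power spectrum
    n_frames = 1 + max(0, len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return 10.0 * np.log10(np.maximum(mel, 1e-10))  # dB scale
```

In practice a library such as librosa would typically be used for this; the sketch only shows the shape of the transformation (frames x mel bands).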

Artifacts

The weights for both models are available on the releases page.

About the solution

The technical report can be read here, and the workshop paper is available on the DCASE proceedings page.

License

Unless otherwise stated, the contents of this repository are shared under the MIT License.

Citing

@inproceedings{Adapa2019,
    author = "Adapa, Sainath",
    title = "Urban Sound Tagging using Convolutional Neural Networks",
    booktitle = "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)",
    address = "New York University, NY, USA",
    month = "October",
    year = "2019",
    pages = "5--9",
    abstract = "In this paper, we propose a framework for environmental sound classification in a low-data context (less than 100 labeled examples per class). We show that using pre-trained image classification models along with usage of data augmentation techniques results in higher performance over alternative approaches. We applied this system to the task of Urban Sound Tagging, part of the DCASE 2019. The objective was to label different sources of noise from raw audio data. A modified form of MobileNetV2, a convolutional neural network (CNN) model was trained to classify both coarse and fine tags jointly. The proposed model uses log-scaled Mel-spectrogram as the representation format for the audio data. Mixup, Random erasing, scaling, and shifting are used as data augmentation techniques. A second model that uses scaled labels was built to account for human errors in the annotations. The proposed model achieved the first rank on the leaderboard with Micro-AUPRC values of 0.751 and 0.860 on fine and coarse tags, respectively."
}
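
The abstract above lists mixup among the data augmentation techniques used. A minimal NumPy sketch of mixup (illustrative only; the function name and the alpha default are assumptions, not the repository's code):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two examples and their labels with a Beta-distributed weight."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)       # mixing coefficient in (0, 1)
    x = lam * x1 + (1.0 - lam) * x2    # convex combination of inputs
    y = lam * y1 + (1.0 - lam) * y2    # same combination of label vectors
    return x, y
```

Applied to log-mel inputs and multi-label tag vectors, this produces soft training targets that interpolate between the two source clips.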