Licence: other
An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection


An Improved Event-Independent Network (EIN) for Polyphonic Sound Event Localization and Detection (SELD)

from Centre for Vision, Speech and Signal Processing, University of Surrey.

Introduction

This is a PyTorch implementation of Event-Independent Networks for Polyphonic SELD.

Event-Independent Networks for Polyphonic SELD use a trackwise output format and multi-task learning (MTL) with a soft parameter-sharing scheme. For more information, please read the papers here.

The features of this method are:

  • It uses a trackwise output format to detect different sound events of the same type but with different directions of arrival (DoAs).
  • It uses permutation-invariant training (PIT) to solve the track permutation problem introduced by the trackwise output format.
  • It uses multi-head self-attention (MHSA) to separate tracks.
  • It uses multi-task learning (MTL) with a soft parameter-sharing scheme for joint SELD.

Currently, the code is available for the TAU-NIGENS Spatial Sound Events 2020 dataset. Data augmentation methods are not included.

Requirements

We provide two ways to set up the environment. Both are based on Anaconda.

  1. Use the provided prepare_env.sh. Note that you need to set anaconda_dir in prepare_env.sh to your Anaconda directory, then directly run

    bash scripts/prepare_env.sh
  2. Use the provided environment.yml. Note that you also need to set the prefix to your target environment directory, then directly run

    conda env create -f environment.yml

After setting up your environment, don't forget to activate it:

conda activate ein

Download Dataset

Downloading the dataset is easy. Directly run

bash scripts/download_dataset.sh

Preprocessing

The data and meta files need to be preprocessed. .wav files will be saved as .h5 files, and meta files will also be converted to .h5 files. After downloading the data, directly run

bash scripts/preproc.sh

Preprocessing the meta files (labels) separates the labels into different tracks, each with up to one event and its corresponding DoA. The same event is consistently put in the same track. For frame-level permutation-invariant training this may not be necessary, but for chunk-level PIT, or no PIT, consistently arranging the same event in the same track is reasonable.
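The consistent track assignment described above can be sketched as follows. This is an illustrative pure-Python outline, not the repository's preprocessing code; the frame/label representation is a stand-in: each frame is a list of (event_class, doa) pairs, and each event class keeps the track it claimed first for the whole chunk.

```python
def assign_tracks(frame_events, num_tracks=2):
    """Map each frame's active events onto a fixed number of tracks so that
    a given event class always lands on the same track (illustrative sketch).

    frame_events: list of frames, each a list of (event_class, doa) pairs.
    Returns a list of frames, each a list of length num_tracks whose entries
    are (event_class, doa) or None.
    """
    class_to_track = {}  # event class -> claimed track index
    out = []
    for frame in frame_events:
        tracks = [None] * num_tracks
        for event_class, doa in frame:
            if event_class not in class_to_track:
                # First time this class appears: claim the lowest free track.
                used = set(class_to_track.values())
                free = [t for t in range(num_tracks) if t not in used]
                if not free:
                    continue  # more classes than tracks; drop (toy behaviour)
                class_to_track[event_class] = free[0]
            tracks[class_to_track[event_class]] = (event_class, doa)
        out.append(tracks)
    return out

frames = [[("dog", 30)], [("dog", 32), ("car", -90)], [("car", -88)]]
print(assign_tracks(frames))
# [[('dog', 30), None], [('dog', 32), ('car', -90)], [None, ('car', -88)]]
```

Note that "car" stays on track 1 in every frame where it is active, which is exactly the property that chunk-level PIT (or no PIT) relies on.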

Quick Evaluation

We uploaded the pre-trained model here. Download it and unzip it in the code folder (EIN-SELD folder) using

wget 'https://zenodo.org/record/4158864/files/out_train.zip' && unzip out_train.zip

Then directly run

bash scripts/predict.sh && sh scripts/evaluate.sh

Usage

Hyper-parameters are stored in ./configs/ein_seld/seld.yaml. You can change some of them, such as train_chunklen_sec, train_hoplen_sec, test_chunklen_sec, test_hoplen_sec, batch_size, lr and others.
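As a hedged illustration of how the chunk-length and hop-length hyper-parameters interact (the exact semantics in the repository's data loader may differ), a sliding window of train_chunklen_sec seconds advanced by train_hoplen_sec seconds cuts a clip into overlapping chunks like this:

```python
def chunk_starts(clip_len_sec, chunklen_sec, hoplen_sec):
    """Start times (in seconds) of the chunks a sliding window would cut
    from a clip: window length chunklen_sec, advanced by hoplen_sec.
    Illustrative sketch only, not the repository's data loader."""
    starts = []
    t = 0.0
    while t + chunklen_sec <= clip_len_sec:
        starts.append(t)
        t += hoplen_sec
    return starts

# A 60 s clip cut into 4 s chunks with a 2 s hop -> 29 overlapping chunks.
print(len(chunk_starts(60, 4, 2)))  # 29
```

Shrinking the hop below the chunk length increases the number of (overlapping) training chunks per clip, at the cost of more steps per epoch.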

Training

To train a model yourself, setup ./configs/ein_seld/seld.yaml and directly run

bash scripts/train.sh

train_fold and valid_fold in ./configs/ein_seld/seld.yaml specify which folds to use for training and validation. Note that valid_fold can be None, which means no validation is needed; this is usually used when training on folds 1-6.

overlap can be 1, 2, or combined 1&2, which means training on non-overlapping sound events, overlapping sound events, or both.

--seed is set to a random integer by default. You can set it to a fixed number, but results will still not be completely reproducible if an RNN or Transformer is used.

Consider adding the --read_into_mem argument in train.sh to pre-load all of the data into memory and increase the training speed, depending on your resources.

--num_workers also affects the training speed; adjust it according to your resources.

Prediction

Prediction predicts results and saves them to the ./out_infer folder. The saved results are the submission results for the DCASE challenge. Directly run

bash scripts/predict.sh

Prediction runs on the testset_type set, which can be dev or eval. If it is dev, test_fold cannot be None.

Evaluation

Evaluation evaluates the generated submission results. Directly run

bash scripts/evaluate.sh

Results

Note that EINV2-DA is a single model with a plain VGGish architecture, using only channel rotation and SpecAugment as data augmentation methods.
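Channel-rotation augmentation rotates the recorded scene and its DoA labels together. The audio side needs the matching spatial transform of the microphone channels (not shown here, and format-specific); the label side is just an azimuth shift with wrap-around, which can be sketched as follows, assuming azimuths in degrees in (-180, 180]:

```python
def rotate_azimuth(azi_deg, shift_deg):
    """Rotate an azimuth label by shift_deg and wrap back to (-180, 180].
    Label-side sketch of a channel-rotation style augmentation; the audio
    channels would need the matching spatial transform (not shown)."""
    a = (azi_deg + shift_deg) % 360
    if a > 180:
        a -= 360
    return a

print(rotate_azimuth(170, 20))   # -170 (wraps past +180)
print(rotate_azimuth(-90, 90))   # 0
```

Because the scene and labels are transformed consistently, the augmented examples remain valid training data while covering more of the DoA space.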

FAQs

  1. If you have any questions, please email [email protected] or report an issue here.

  2. Currently, pin_memory can only be set to True. For more information, please check the PyTorch docs and the NVIDIA Developer Blog.

  3. After downloading, you can delete the downloaded_packages folder to save some space.

Citing

If you use the code, please consider citing the papers below.

[1]. Yin Cao, Turab Iqbal, Qiuqiang Kong, Fengyan An, Wenwu Wang, Mark D. Plumbley, "An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection", submitted for publication

@article{cao2020anevent,
  title={An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection},
  author={Cao, Yin and Iqbal, Turab and Kong, Qiuqiang and An, Fengyan and Wang, Wenwu and Plumbley, Mark D},
  journal={arXiv preprint arXiv:2010.13092},
  year={2020}
}

[2]. Yin Cao, Turab Iqbal, Qiuqiang Kong, Yue Zhong, Wenwu Wang, Mark D. Plumbley, "Event-Independent Network for Polyphonic Sound Event Localization and Detection", DCASE 2020 Workshop, November 2020

@article{cao2020event,
  title={Event-Independent Network for Polyphonic Sound Event Localization and Detection},
  author={Cao, Yin and Iqbal, Turab and Kong, Qiuqiang and Zhong, Yue and Wang, Wenwu and Plumbley, Mark D},
  journal={arXiv preprint arXiv:2010.00140},
  year={2020}
}

Reference

  1. Archontis Politis, Sharath Adavanne, and Tuomas Virtanen. A dataset of reverberant spatial sound scenes with moving sources for sound event localization and detection. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE2020). November 2020. URL

  2. Annamaria Mesaros, Sharath Adavanne, Archontis Politis, Toni Heittola, and Tuomas Virtanen. Joint measurement of localization and detection of sound events. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, NY, Oct 2019. URL

  3. Sharath Adavanne, Archontis Politis, Joonas Nikunen, and Tuomas Virtanen. Sound event localization and detection of overlapping sources using convolutional recurrent neural networks. IEEE Journal of Selected Topics in Signal Processing, 13(1):34–48, March 2018. URL

  4. https://github.com/yinkalario/DCASE2019-TASK3
