
f90 / AdversarialAudioSeparation

Licence: MIT
Code accompanying the paper "Semi-supervised adversarial audio source separation applied to singing voice extraction"

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to AdversarialAudioSeparation

All About The Gan
All About the GANs (Generative Adversarial Networks) - Summarized lists for GANs
Stars: ✭ 630 (+800%)
Mutual labels:  paper, adversarial-networks
Adversarial Semisupervised Semantic Segmentation
Pytorch Implementation of "Adversarial Learning For Semi-Supervised Semantic Segmentation" for ICLR 2018 Reproducibility Challenge
Stars: ✭ 147 (+110%)
Mutual labels:  semi-supervised-learning, adversarial-networks
Sparsely Grouped Gan
Code for paper "Sparsely Grouped Multi-task Generative Adversarial Networks for Facial Attribute Manipulation"
Stars: ✭ 68 (-2.86%)
Mutual labels:  paper, semi-supervised-learning
Adversarial Autoencoders
Tensorflow implementation of Adversarial Autoencoders
Stars: ✭ 215 (+207.14%)
Mutual labels:  semi-supervised-learning, adversarial-networks
Adversarial-Semisupervised-Semantic-Segmentation
Pytorch Implementation of "Adversarial Learning For Semi-Supervised Semantic Segmentation" for ICLR 2018 Reproducibility Challenge
Stars: ✭ 151 (+115.71%)
Mutual labels:  semi-supervised-learning, adversarial-networks
ghiaseddin
Author's implementation of the paper "Deep Relative Attributes" (ACCV 2016)
Stars: ✭ 41 (-41.43%)
Mutual labels:  paper
LMMS
Language Modelling Makes Sense - WSD (and more) with Contextual Embeddings
Stars: ✭ 79 (+12.86%)
Mutual labels:  paper
sdn-nfv-papers
This is a paper list about Resource Allocation in Network Functions Virtualization (NFV) and Software-Defined Networking (SDN).
Stars: ✭ 40 (-42.86%)
Mutual labels:  paper
php-quill-renderer
Render quill insert deltas to HTML, Markdown and GitHub flavoured Markdown
Stars: ✭ 117 (+67.14%)
Mutual labels:  mit-license
semi-supervised-NFs
Code for the paper Semi-Conditional Normalizing Flows for Semi-Supervised Learning
Stars: ✭ 23 (-67.14%)
Mutual labels:  semi-supervised-learning
geeky-hugo
Geeky is a personal Hugo blog theme focused on high speed. It is fully responsive, superfast, and powered by Bootstrap v5.
Stars: ✭ 44 (-37.14%)
Mutual labels:  mit-license
KeyboardDelimiter
jQuery plugin for delimiting pressed keys on the keyboard
Stars: ✭ 14 (-80%)
Mutual labels:  mit-license
SemiSeg-AEL
Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning, NeurIPS 2021 (Spotlight)
Stars: ✭ 79 (+12.86%)
Mutual labels:  semi-supervised-learning
audioContextEncoder
A context encoder for audio inpainting
Stars: ✭ 18 (-74.29%)
Mutual labels:  paper
Awesome-Lane-Detection
A paper list with code for lane detection.
Stars: ✭ 34 (-51.43%)
Mutual labels:  paper
node-epicgames-fortnite-client
Unofficial JavaScript client for Fortnite.
Stars: ✭ 55 (-21.43%)
Mutual labels:  mit-license
ganbert-pytorch
Enhancing the BERT training with Semi-supervised Generative Adversarial Networks in Pytorch/HuggingFace
Stars: ✭ 60 (-14.29%)
Mutual labels:  semi-supervised-learning
hacktoberfest2021
For beginners, students, and developers, this is a great opportunity to learn and contribute to open source.
Stars: ✭ 79 (+12.86%)
Mutual labels:  mit-license
Leafgem
🌿💎 The humble beginnings of a 2D game engine in Crystal! [in-progress]
Stars: ✭ 72 (+2.86%)
Mutual labels:  mit-license
TiDB-A-Raft-based-HTAP-Database
Unofficial! English original and Chinese translation of the paper.
Stars: ✭ 42 (-40%)
Mutual labels:  paper

AdversarialAudioSeparation

Code accompanying the paper "Semi-supervised adversarial audio source separation applied to singing voice extraction" available on arXiv here:

https://arxiv.org/abs/1711.00048

The idea

Improve existing supervised audio source separation models, which are commonly neural networks, with extra unlabelled mixture recordings as well as unlabelled solo recordings of the sources we want to separate. The network is trained in a normal supervised fashion to minimise its prediction error on fully annotated data (samples with mixture and sources paired up), and at the same time to output source estimates for the extra mixture recordings that are indistinguishable from the solo source recordings.

To achieve this, we use adversarial training: one discriminator network is trained per source to identify whether a source excerpt comes from the real solo source recordings or from the separator evaluated on the extra mixtures.

This can prevent overfitting to the often small annotated dataset and makes use of the much more readily available unlabelled data.
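
In loss terms, this roughly corresponds to the following weighted objective, where alpha and beta are the loss weights discussed under configuration below (see the paper for the exact formulation):

L_total = L_sup + alpha * L_adv_1 + beta * L_adv_2

Here L_sup is the supervised prediction error on the paired data, and each adversarial term measures how distinguishable the separator's estimates on the extra mixtures are from real solo recordings of that source, as judged by the corresponding discriminator.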

Setup

Requirements

To run the code, the following Python packages are needed. We recommend the GPU version of TensorFlow due to the long running times of this model. You can install the packages easily using pip install -r requirements.txt after saving the list below to a text file.

tensorflow-gpu>=1.2.0  
sacred>=0.7.0  
audioread>=2.1.5
imageio>=2.2.0
librosa>=0.5.1
lxml>=3.8.0
mir_eval>=0.4
scikits.audiolab>=0.11.0
soundfile>=0.9.0

Furthermore, ffmpeg needs to be installed and on your PATH if you want to read in mp3 files directly.

Dataset preparation

Before the code is runnable, the datasets need to be prepared and integrated into the data loading process. The simpler way to do this is to use the same datasets as in the experiment in the paper; the alternative is to use your own datasets and split them into custom partitions. Please see below and the code comments in Training.py for guidance.

When the code is run for the first time, it reads in the dataset and creates a dataset.pkl file containing the dataset structure, so that subsequent starts are much faster.

Option 1: Recreate experiment from the paper

If you want to recreate the experiment from the paper, download the datasets DSD100, MedleyDB, CCMixter, and iKala separately. Then edit the corresponding XML files provided in this repository (DSD100.xml etc.), so that the XML entry

<databaseFolderPath>/mnt/daten/Datasets/DSD100</databaseFolderPath>

contains the location of the root folder of the respective dataset. Save the file changes and then execute Training.py.
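
If you prefer to script this step, the short sketch below uses lxml (already listed in the requirements) to rewrite the entry; the file name and tag match the example above, while the target path is a placeholder to adapt:

from lxml import etree

# Point the DSD100 descriptor at the local root folder of the dataset.
tree = etree.parse("DSD100.xml")
for node in tree.getroot().iter("databaseFolderPath"):
    node.text = "/path/to/DSD100"  # replace with your dataset location
tree.write("DSD100.xml")

Repeat the edit for the other dataset XML files accordingly.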

Option 2: Use your own data of choice

To use your own datasets and dataset partitioning into supervised, unsupervised, validation and test sets, you can replace the data loading code in Training.py with a custom dataset loading function.

The only requirement on this function is its output format. The output should be a dictionary that maps the following strings to the respective dataset partitions:

"train_sup" : sample_list
"train_unsup" : [mix_list, source1_list, source2_list]
"train_valid" : sample_list
"train_test" : sample_list

A sample_list is a list whose elements are tuples of three Sample objects, in the order mixture, source 1, source 2. You can initialise Sample objects with the constructor of the Sample class found in Sample.py; each represents an audio signal along with its metadata. The audio should preferably be in .wav format for fast on-the-fly reading, but other formats such as mp3 are also supported.

The entry for "train_unsup" is different since the recordings are not paired: instead, it is a list containing three lists, holding the mixture, source 1 and source 2 Sample objects respectively. These lists can be of different lengths.
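
For illustration, a minimal sketch of such a loading function is shown below. The Sample constructor call is an assumption (check Sample.py for its actual signature); only the structure of the returned dictionary is prescribed by the interface above.

from Sample import Sample  # provided in this repository

def load_custom_dataset():
    # Hypothetical helper: wrap an audio file path in a Sample object.
    # The real constructor arguments are defined in Sample.py.
    def sample(path):
        return Sample(path)

    # Paired partitions: tuples of (mixture, source 1, source 2).
    train_sup = [(sample("mix_001.wav"), sample("voice_001.wav"), sample("accomp_001.wav"))]
    train_valid = [(sample("mix_val.wav"), sample("voice_val.wav"), sample("accomp_val.wav"))]
    train_test = [(sample("mix_test.wav"), sample("voice_test.wav"), sample("accomp_test.wav"))]

    # Unpaired partition: three independent lists that may differ in length.
    mix_list = [sample("extra_mix_001.wav"), sample("extra_mix_002.wav")]
    source1_list = [sample("solo_voice_001.wav")]
    source2_list = [sample("solo_accomp_001.wav")]

    return {"train_sup": train_sup,
            "train_unsup": [mix_list, source1_list, source2_list],
            "train_valid": train_valid,
            "train_test": train_test}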

Configuration and hyperparameters

You can configure settings and hyperparameters by modifying the model_config dictionary defined at the beginning of Training.py, or by using the command-line features of Sacred to set certain values when calling the script (see the Sacred documentation).

Note that alpha and beta (hyperparameters from the paper), which weight the loss terms, are relatively important for good performance; tweaking them might be necessary. They can also be edited in the model_config dictionary.
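
For example, using Sacred's "with" syntax for config updates from the command line (assuming alpha and beta are exposed as entries of the Sacred configuration, which depends on how model_config is registered in Training.py):

python Training.py with alpha=0.01 beta=0.001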

Training

The code is run by executing

python Training.py

It will train the same separator network first in a purely supervised way and then with our semi-supervised adversarial approach. In both cases, validation performance is measured regularly and early stopping is used before the final test set performance is evaluated. For the semi-supervised approach, the additional data from dataset["train_unsup"] is used to improve performance.

Finally, BSS evaluation metrics (SDR, SIR, SAR) are computed on the test dataset. The results are saved in a pickled file along with the name of the dataset, so if you aim to use different datasets, the evaluation function needs to be extended slightly.
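
The metrics themselves are the standard BSS Eval measures as implemented in the mir_eval package from the requirements. As a self-contained illustration on synthetic data (not the repository's own evaluation code):

import numpy as np
import mir_eval

# Shape convention: (number of sources, number of samples).
reference_sources = np.random.randn(2, 44100)
estimated_sources = reference_sources + 0.1 * np.random.randn(2, 44100)

sdr, sir, sar, perm = mir_eval.separation.bss_eval_sources(reference_sources, estimated_sources)
print("SDR:", sdr, "SIR:", sir, "SAR:", sar)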

Logs are written continuously to the logs subfolder, so training can be monitored with TensorBoard. Checkpoint files of the model are created whenever validation performance is tested.
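
For example:

tensorboard --logdir logs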

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].