
gabolsgabs / cunet

Licence: other
Control mechanisms for the U-Net architecture that enable multi-instrument source separation

Programming Languages

python

Projects that are alternatives of or similar to cunet

ismir2019-music-style-translation
The code for the ISMIR 2019 paper “Supervised symbolic music style translation using synthetic data”.
Stars: ✭ 27 (-25%)
Mutual labels:  music-information-retrieval, ismir
dechorder
Automatic chord recognition application powered by machine learning
Stars: ✭ 42 (+16.67%)
Mutual labels:  music-information-retrieval
arranger
An AI for Automatic Instrumentation
Stars: ✭ 37 (+2.78%)
Mutual labels:  music-information-retrieval
Music-Genre-Classification
Genre Classification using Convolutional Neural Networks
Stars: ✭ 27 (-25%)
Mutual labels:  music-information-retrieval
vamp-aubio-plugins
aubio plugins for Vamp
Stars: ✭ 38 (+5.56%)
Mutual labels:  music-information-retrieval
SymbTr
Turkish Makam Music Symbolic Data Collection
Stars: ✭ 55 (+52.78%)
Mutual labels:  music-information-retrieval
essentia-tutorial
A tutorial for using Essentia in Python
Stars: ✭ 16 (-55.56%)
Mutual labels:  music-information-retrieval
TasNet
A PyTorch implementation of Time-domain Audio Separation Network (TasNet) with Permutation Invariant Training (PIT) for speech separation.
Stars: ✭ 81 (+125%)
Mutual labels:  source-separation
sampleCNN-pytorch
Pytorch implementation of "Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms"
Stars: ✭ 45 (+25%)
Mutual labels:  music-information-retrieval
MidiTok
A convenient MIDI / symbolic music tokenizer for Deep Learning networks, with multiple strategies 🎶
Stars: ✭ 180 (+400%)
Mutual labels:  music-information-retrieval
rl singing voice
Unsupervised Representation Learning for Singing Voice Separation
Stars: ✭ 18 (-50%)
Mutual labels:  source-separation
nowplaying-RS-Music-Reco-FM
#nowplaying-RS: Music Recommendation using Factorization Machines
Stars: ✭ 23 (-36.11%)
Mutual labels:  music-information-retrieval
DeepSeparation
Keras Implementation and Experiments with Deep Recurrent Neural Networks for Source Separation
Stars: ✭ 19 (-47.22%)
Mutual labels:  source-separation
audio source separation
An implementation of audio source separation tools.
Stars: ✭ 41 (+13.89%)
Mutual labels:  source-separation
MixingBear
Package for automatic beat-mixing of music files in Python 🐻🎚
Stars: ✭ 73 (+102.78%)
Mutual labels:  music-information-retrieval
audio to midi
A CNN which converts piano audio to a simplified MIDI format
Stars: ✭ 29 (-19.44%)
Mutual labels:  music-information-retrieval
speaker extraction
target speaker extraction and verification for multi-talker speech
Stars: ✭ 85 (+136.11%)
Mutual labels:  source-separation
mtg-jamendo-dataset
Metadata, scripts and baselines for the MTG-Jamendo dataset
Stars: ✭ 140 (+288.89%)
Mutual labels:  music-information-retrieval
midi degradation toolkit
A toolkit for generating datasets of midi files which have been degraded to be 'un-musical'.
Stars: ✭ 29 (-19.44%)
Mutual labels:  ismir
AMSS-Net
A PyTorch implementation of the paper: "AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries" (ACM Multimedia 2021)
Stars: ✭ 19 (-47.22%)
Mutual labels:  source-separation

Conditioned-U-Net for multitask musical instrument source separations

How to use this package:

The package contains three modules, each performing one step of the process (preprocess, train, evaluate). See each module's README to learn how to use it.

└── cunet
    ├── __init__.py
    ├── evaluation
    ├── preprocess
    └── train
  • Prepare the input data at: code/preprocess
└── preprocess
    ├── README.md
    ├── __init__.py
    ├── config.py
    ├── indexes.py
    └── spectrogram.py
  • Train a new model at: code/train
└── train
    ├── README.md
    ├── __init__.py
    ├── config.py
    ├── data_loader.py
    ├── main.py
    ├── models
    │   ├── FiLM_utils.py
    │   ├── __init__.py
    │   ├── control_models.py
    │   ├── cunet_model.py
    │   └── unet_model.py
    └── others
        ├── __init__.py
        ├── lock.py
        ├── utilities.py
        └── val_files.py
  • Evaluate a model at: code/evaluation
└── evaluation
    ├── README.md
    ├── __init__.py
    ├── config.py
    └── results.py

Overview:

In this work we introduce a control mechanism to the U-Net architecture. This control mechanism makes it possible to separate multiple instrument sources with a single model, without any loss in performance.

It is motivated by the idea that the same input can be processed differently depending on some external context.

We have two main elements:

  • Generic model: not specialized in any particular task; it learns a set of generic source separation tools.
  • Control: defines how to combine that set of tools.

In our system, we only need to specify which instrument (or combination of instruments) we want to isolate. The control then tells the generic model what to do in order to isolate the desired instrument(s). The key question is how to control the generic model efficiently. We do that using FiLM layers.
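Concretely, the condition is just a binary vector over the target instruments. A minimal illustration (the instrument ordering here is hypothetical, not taken from the repo's config):

    import numpy as np

    # Hypothetical instrument ordering: [vocals, drums, bass, rest].
    vocals_only  = np.array([1., 0., 0., 0.])  # isolate a single instrument
    vocals_drums = np.array([1., 1., 0., 0.])  # isolate a combination of sources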

FiLM layers:

FiLM makes it possible to modulate any neural network architecture by inserting one or several FiLM layers at any depth of the original model [1]:

FiLM(x) = γ(z) · x + β(z)

where x is the input of the FiLM layer, and γ (gamma) and β (beta) are the learnable parameters that scale and shift x based on external information, z.

The original FiLM computes a different affine operation per feature map; we propose an additional, simpler layer that applies the same affine operation to the whole input.
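Both variants reduce to the same broadcasted affine transform. A minimal sketch in plain TensorFlow 2 (illustrative only, not the repository's FiLM_utils.py):

    import tensorflow as tf

    def film(x, gamma, beta):
        # FiLM(x) = gamma * x + beta, broadcast over the spatial axes.
        # Original FiLM: gamma/beta of shape (batch, n_channels), i.e. one
        # affine operation per feature map. 'Simple' variant: shape
        # (batch, 1), i.e. the same affine operation for the whole input.
        gamma = tf.reshape(gamma, (-1, 1, 1, gamma.shape[-1]))
        beta = tf.reshape(beta, (-1, 1, 1, beta.shape[-1]))
        return gamma * x + beta  # x: (batch, height, width, channels)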

Our architecture:

[Figure: overview of the system.]

Generic model

The generic model is a U-Net [2] and it has two main parts:

  • The encoder encodes and highlights the information relevant to separating a particular instrument, creating a latent space.
  • The decoder transforms the latent space back into an audio signal.

It is important to decide where to condition. We think it is essential to be able to create a different latent space per instrument. Therefore, we condition only the encoder: FiLM layers are applied in each encoder block, right after batch normalization, as sketched below.
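A hedged sketch of one conditioned encoder block, using the film() helper above (Keras-style, with illustrative layer sizes; the actual blocks live in train/models/cunet_model.py):

    from tensorflow.keras import layers

    def conditioned_encoder_block(x, gamma, beta, n_filters):
        # conv -> batch norm -> FiLM -> activation
        x = layers.Conv2D(n_filters, kernel_size=5, strides=2, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = film(x, gamma, beta)  # conditioning right after batch norm
        return layers.LeakyReLU(0.2)(x)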

Condition generator

The condition generator takes as input a vector encoding the instrument(s) to separate and computes the gammas and betas that control the generic model. We test two configurations: one based on a fully connected architecture and one based on a convolutional architecture. A sketch of the fully connected variant follows.
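A minimal sketch of the fully connected condition generator (hypothetical layer sizes; for brevity every block gets the same channel count here, whereas in practice each encoder block has its own width):

    from tensorflow.keras import layers

    def dense_condition_generator(condition, n_blocks, n_channels):
        # Map the instrument vector to one (gamma, beta) pair per encoder block.
        h = layers.Dense(16, activation='relu')(condition)
        h = layers.Dense(64, activation='relu')(h)
        gammas = [layers.Dense(n_channels)(h) for _ in range(n_blocks)]
        betas = [layers.Dense(n_channels)(h) for _ in range(n_blocks)]
        return gammas, betas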

Both systems are trained jointly.

You can find a detailed explanation in: [Meseguer-Brocal_2019] G. Meseguer-Brocal and G. Peeters. Conditioned-U-Net: Introducing a Control Mechanism in the U-Net for Multiple Source Separations, or in a revised version with small corrections on arXiv.

Cite this paper:

@inproceedings{Meseguer-Brocal_2019,
  author    = {Meseguer-Brocal, Gabriel and Peeters, Geoffroy},
  booktitle = {20th International Society for Music Information Retrieval Conference},
  editor    = {ISMIR},
  month     = {November},
  title     = {Conditioned-U-Net: Introducing a Control Mechanism in the U-net For Multiple Source Separations.},
  year      = {2019}}

You can contact us at:

gabriel dot meseguerbrocal at ircam dot fr

References:

[1] E. Perez, F. Strub, H. de Vries, V. Dumoulin, and A. C. Courville. FiLM: Visual reasoning with a general conditioning layer. In Proc. of AAAI (Conference on Artificial Intelligence), New Orleans, LA, USA, 2018.

[2] A. Jansson, N. Montecchio, R. Bittner, A. Kumar, T. Weyde, and E. J. Humphrey. Singing voice separation with deep U-Net convolutional networks. In Proc. of ISMIR (International Society for Music Information Retrieval), Suzhou, China, 2017.
