mosheman5 / timbre_painting

Licence: other
Hierarchical fast and high-fidelity audio generation

Projects that are alternatives of or similar to timbre painting

char-VAE
Inspired by the neural style algorithm in the computer vision field, we propose a high-level language model with the aim of adapting the linguistic style.
Stars: ✭ 18 (-73.13%)
Mutual labels:  generative-model
generative deep learning
Generative Deep Learning Sessions led by Anugraha Sinha (Machine Learning Tokyo)
Stars: ✭ 24 (-64.18%)
Mutual labels:  generative-model
DMXOPL
YMF262-enhanced FM patch set for Doom and source ports.
Stars: ✭ 42 (-37.31%)
Mutual labels:  sound
aura
A fast and lightweight 3D audio engine for Kha.
Stars: ✭ 31 (-53.73%)
Mutual labels:  sound
debeat
Sound Library for the Defold Engine
Stars: ✭ 20 (-70.15%)
Mutual labels:  sound
useAudioPlayer
Custom React hook & context for controlling browser audio
Stars: ✭ 176 (+162.69%)
Mutual labels:  sound
roover
🐱 A lightweight audio library for React apps.
Stars: ✭ 70 (+4.48%)
Mutual labels:  sound
ArduinoProtonPack
Arduino Code for a GhostBusters Proton Pack
Stars: ✭ 57 (-14.93%)
Mutual labels:  sound
uBraids SE
8HP Eurorack module | Voltage-controlled digital oscillator
Stars: ✭ 15 (-77.61%)
Mutual labels:  sound
PdWebParty
An app that allows Pd users to run patches in a web browser and share them with a web link
Stars: ✭ 37 (-44.78%)
Mutual labels:  sound
vae-torch
Variational autoencoder for anomaly detection (in PyTorch).
Stars: ✭ 38 (-43.28%)
Mutual labels:  generative-model
graph-nvp
GraphNVP: An Invertible Flow Model for Generating Molecular Graphs
Stars: ✭ 69 (+2.99%)
Mutual labels:  generative-model
opendev
OpenDev is a non-profit project that tries to collect as many resources (assets) of free use for the development of video games and applications.
Stars: ✭ 34 (-49.25%)
Mutual labels:  sound
Eisenkraut
A multi-channel and hi-res capable audio file editor.
Stars: ✭ 50 (-25.37%)
Mutual labels:  sound
Simple-Unity-Audio-Manager
A decentralized audio playing system for Unity, designed for simplicity and built to scale!
Stars: ✭ 100 (+49.25%)
Mutual labels:  sound
Mezzanine
A game engine that supports high performance 3d graphics physics and sound
Stars: ✭ 18 (-73.13%)
Mutual labels:  sound
ac-audio-extractor
Audio Commons Audio Extractor
Stars: ✭ 33 (-50.75%)
Mutual labels:  sound
TriangleGAN
TriangleGAN, ACM MM 2019.
Stars: ✭ 28 (-58.21%)
Mutual labels:  generative-model
AudioFile
Audiofile library for Scala.
Stars: ✭ 20 (-70.15%)
Mutual labels:  sound
Generalized-PixelVAE
PixelVAE with or without regularization
Stars: ✭ 64 (-4.48%)
Mutual labels:  generative-model

Hierarchical Timbre-Painting and Articulation Generation

Open In Colab

This repository provides an official PyTorch implementation of "Hierarchical Timbre-Painting and Articulation Generation"

Our method generates high-fidelity audio for a target instrument, based on f0 and loudness signals.

During training, the loudness and f0 signals are extracted from the ground-truth signal, which enables us to convert the melody of any input instrument to the trained instrument, a task also known as Timbre Transfer.
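As a rough illustration of the loudness side of this conditioning, a per-frame loudness track can be computed from the waveform; the sketch below uses plain RMS-in-dB as a simplified stand-in (the function name, frame sizes, and the omission of perceptual weighting are all illustrative, not the repository's implementation):

```python
import numpy as np

def frame_loudness_db(audio, frame_size=1024, hop=256, eps=1e-8):
    """Per-frame RMS loudness in dB (simplified; no perceptual weighting)."""
    n_frames = 1 + (len(audio) - frame_size) // hop
    loud = np.empty(n_frames)
    for i in range(n_frames):
        frame = audio[i * hop : i * hop + frame_size]
        loud[i] = 20 * np.log10(np.sqrt(np.mean(frame ** 2)) + eps)
    return loud

# toy example: one second of a 440 Hz sine at 16 kHz
sr = 16000
t = np.arange(sr) / sr
audio = 0.5 * np.sin(2 * np.pi * 440 * t)
loudness = frame_loudness_db(audio)  # one dB value per frame
```

A constant-amplitude sine gives a nearly flat loudness track around 20*log10(0.5/sqrt(2)) ≈ -9 dB.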

Audio Samples | Paper | Pretrained Models | Timbre Transfer Colab Demo

We suggest separating the generation process into two consecutive phases:

  • Articulation - We generate the backbone of the audio and the transitions between notes. This is done at a low sample rate from the given conditions, the loudness and f0 inputs. We use a sine excitation based on the extracted f0 signal, hence using the generator as a Neural-Source-Filtering network rather than a classic GAN generator conditioned on random noise.
  • Timbre Painting - The next phase is composed of timbre-painting networks: each network gets as input the previously generated audio and serves as a learnable upsampling network. Each timbre-painting network adds sample-rate-specific details to the audio clip.
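The sine excitation driving the articulation phase can be sketched as follows: upsample a frame-rate f0 track to sample rate and accumulate phase. This is a minimal assumed sketch (function name, hop size, and nearest-neighbour upsampling are illustrative choices, not the repository's exact code):

```python
import numpy as np

def sine_excitation(f0_hz, sr=16000, hop=64):
    """Build a sine excitation from a frame-rate f0 track (Hz) by
    upsampling it to sample rate and accumulating instantaneous phase."""
    f0_up = np.repeat(f0_hz, hop)              # nearest-neighbour upsample
    phase = 2 * np.pi * np.cumsum(f0_up) / sr  # integrate frequency -> phase
    return np.sin(phase)

# constant 220 Hz track, 100 frames -> 6400 samples of a 220 Hz sine
excitation = sine_excitation(np.full(100, 220.0))
```

Unvoiced regions can be handled by setting f0 to zero, which freezes the phase and produces silence-friendly input for the filtering network.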

Dependencies

The needed packages are listed in requirements.txt.

Using a virtual environment is recommended:

virtualenv -p python3 .venv
source .venv/bin/activate
pip install -r requirements.txt

To use distributed runs, please install apex.

Usage

Hydra is used for configuration and experiment management; for more info refer to https://hydra.cc/

1. Cloning the repository

$ git clone https://github.com/mosheman5/timbre_painting.git
$ cd timbre_painting

2. Data Preparation

URMP Dataset

To download the URMP dataset used in our paper, please fill out the form.

After downloading, extract the contents of the file to a folder named urmp and run the following script to preprocess the data:

python create_data.py

Other datasets

To train the model on any other dataset of monophonic instruments, copy the audio files to the data_tmp directory, each instrument in a separate folder, and run:

python create_data.py urmp=null

Default parameters are given at conf/data_config.yaml, overrides should be given in command line.

Please note that the default parameters are defined for the URMP dataset; for other datasets some tuning might be needed (especially the data_processor.params.confidence_threshold and data_processor.params.silence_thresh_dB parameters).

3. Training

3.1 Single GPU

To train with the original paper's parameters, run:

python main.py

Default parameters are given at conf/runs/main.yaml, overrides should be given in command line.

For example, the following line runs an experiment on a dataset folder named 'flute' for 400 epochs with a batch size of 4:

python main.py paths.input_data=data.flute optim.epochs=400 optim.batch_size=4

Results are saved in the folder outputs/main/${%Y-%m-%d_%H-%M-%S}.

3.2 Multiple GPUs / Machines

DDP is supported in the code via the Apex package. To run in distributed mode, use the following template:

python -m torch.distributed.launch --use_env --nproc_per_node {# of gpus} main.py {argument overrides}

You can set CUDA_VISIBLE_DEVICES=0,1 to choose the GPUs to run on; in this example, GPUs 0 and 1 on the machine.

4. Timbre Transfer

To transfer the timbre of your files using a trained network, run:

python timbre_painting.py trained_dirpath={path/to/trained_model} input_dirpath={path/to/audio_sample_folder}

Default parameters are given at conf/transfer_config.yaml.

The generated files are saved in the experiment folder, in the generation subdirectory. Each input is generated in 5 versions, with octave shifts ranging over [-2, 2].
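The five octave variants presumably correspond to scaling the extracted f0 track before synthesis; a minimal sketch of that scaling (variable names and the exact mechanism are assumptions, not the script's code):

```python
import numpy as np

f0 = np.array([220.0, 220.0, 440.0])      # toy f0 track in Hz
octave_shifts = range(-2, 3)              # the 5 versions: -2 .. +2

# doubling/halving frequency per octave: f0 * 2**shift
variants = {n: f0 * 2.0 ** n for n in octave_shifts}
# variants[-2] is the track two octaves down: 55, 55, 110 Hz
```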

Pretrained Models

Pretrained models of instruments from the URMP dataset are summarized in the table below. The models can be downloaded from the attached Google Drive links. Download a model, extract it, and follow the Timbre Transfer instructions to generate audio.

Instrument: Violin | Saxophone | Trumpet | Cello

Citation

If you found this code useful, please cite the following paper:

@inproceedings{michelashvili2020timbre-painting,
  title={Hierarchical Timbre-Painting and Articulation Generation},
  author={Michael Michelashvili and Lior Wolf},
  booktitle={21st International Society for Music Information Retrieval Conference (ISMIR 2020)},
  year={2020}
}

Code References

Acknowledgement

Credit to Adam Polyak for the PyTorch CREPE pitch-extraction implementation and for helpful discussions.
