ilaria-manco / muscaps

License: GPL-3.0
Source code for "MusCaps: Generating Captions for Music Audio" (IJCNN 2021)

Programming Languages

Jupyter Notebook
Python
Shell

Projects that are alternatives to or similar to muscaps

MixingBear
Package for automatic beat-mixing of music files in Python 🐻🎚
Stars: ✭ 73 (+87.18%)
Mutual labels:  music-information-retrieval, mir
emusic net
Neural network to classify certain styles of Electronic music
Stars: ✭ 22 (-43.59%)
Mutual labels:  music-information-retrieval, mir
referit3d
Code accompanying our ECCV-2020 paper on 3D Neural Listeners.
Stars: ✭ 59 (+51.28%)
Mutual labels:  multimodal-deep-learning
rust-lock-bug-detector
Statically detect double-lock & conflicting-lock bugs on MIR
Stars: ✭ 39 (+0%)
Mutual labels:  mir
MISE
Multimodal Image Synthesis and Editing: A Survey
Stars: ✭ 214 (+448.72%)
Mutual labels:  multimodal-deep-learning
hateful memes-hate detectron
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge. https://arxiv.org/abs/2012.12975
Stars: ✭ 35 (-10.26%)
Mutual labels:  multimodal-deep-learning
dechorder
Automatic chord recognition application powered by machine learning
Stars: ✭ 42 (+7.69%)
Mutual labels:  music-information-retrieval
nowplaying-RS-Music-Reco-FM
#nowplaying-RS: Music Recommendation using Factorization Machines
Stars: ✭ 23 (-41.03%)
Mutual labels:  music-information-retrieval
BasicsMusicalInstrumClassifi
Basics of Musical Instruments Classification using Machine Learning
Stars: ✭ 27 (-30.77%)
Mutual labels:  music-information-retrieval
SymbTr
Turkish Makam Music Symbolic Data Collection
Stars: ✭ 55 (+41.03%)
Mutual labels:  music-information-retrieval
Robust-Deep-Learning-Pipeline
Deep Convolutional Bidirectional LSTM for Complex Activity Recognition with Missing Data. Human Activity Recognition Challenge. Springer SIST (2020)
Stars: ✭ 20 (-48.72%)
Mutual labels:  multimodal-deep-learning
mtg-jamendo-dataset
Metadata, scripts and baselines for the MTG-Jamendo dataset
Stars: ✭ 140 (+258.97%)
Mutual labels:  music-information-retrieval
Music-Genre-Classification
Genre Classification using Convolutional Neural Networks
Stars: ✭ 27 (-30.77%)
Mutual labels:  music-information-retrieval
Multimodal-Future-Prediction
The official repository for the CVPR 2019 paper "Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction"
Stars: ✭ 38 (-2.56%)
Mutual labels:  multimodal-deep-learning
MidiTok
A convenient MIDI / symbolic music tokenizer for Deep Learning networks, with multiple strategies 🎶
Stars: ✭ 180 (+361.54%)
Mutual labels:  music-information-retrieval
cunet
Control mechanisms added to the U-Net architecture for multi-instrument source separation
Stars: ✭ 36 (-7.69%)
Mutual labels:  music-information-retrieval
tomato
Turkish-Ottoman Makam (M)usic Analysis TOolbox
Stars: ✭ 30 (-23.08%)
Mutual labels:  music-information-retrieval
sampleCNN-pytorch
Pytorch implementation of "Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms"
Stars: ✭ 45 (+15.38%)
Mutual labels:  music-information-retrieval
tutorial
Tutorial on Tempo, Beat and Downbeat estimation
Stars: ✭ 44 (+12.82%)
Mutual labels:  mir
dlaudio
Master thesis: Structured Auto-Encoder with application to Music Genre Recognition (code)
Stars: ✭ 14 (-64.1%)
Mutual labels:  music-information-retrieval

MusCaps: Generating Captions for Music Audio

Ilaria Manco¹ ², Emmanouil Benetos¹, Elio Quinton², György Fazekas¹
¹ Queen Mary University of London, ² Universal Music Group

This repository is the official implementation of "MusCaps: Generating Captions for Music Audio" (IJCNN 2021). In this work, we propose an encoder-decoder model to generate natural language descriptions of music audio. We provide code to train our model on any dataset of (audio, caption) pairs, together with code to evaluate the generated descriptions on a set of automatic metrics (BLEU, METEOR, ROUGE, CIDEr, SPICE, SPIDEr).
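
As a rough illustration of what these metrics capture (this is not the repository's evaluation code, which lives in muscaps/caption_evaluation_tools), BLEU, for instance, scores n-gram overlap between a generated caption and a reference. A minimal sketch using nltk, with made-up captions:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Toy captions, invented purely for illustration.
reference = "a mellow acoustic guitar ballad with soft vocals".split()
candidate = "a soft acoustic guitar song with gentle vocals".split()

# BLEU rewards n-gram overlap; smoothing avoids zero scores on short texts.
score = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")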

Setup

The code was developed in Python 3.7 on Linux CentOS 7 and training was carried out on an RTX 2080 Ti GPU. Other GPUs and platforms have not been fully tested.

Clone the repo

git clone https://github.com/ilaria-manco/muscaps
cd muscaps

You'll need to have the libsndfile library installed. All other requirements, including the code package, can be installed with

pip install -r requirements.txt
pip install -e .
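
If the install succeeded, Python should be able to locate libsndfile. A quick optional check, assuming the soundfile package is available (a common audio I/O dependency; install it with pip install soundfile if it is not already pulled in by the requirements):

import soundfile as sf

# Prints the version of the libsndfile library that soundfile linked against.
print(sf.__libsndfile_version__)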

Project structure

root
├─ configs                      # Config files
│   ├─ datasets
│   ├─ models
│   └─ default.yaml
├─ data                         # Folder to save data (input data, pretrained model weights, etc.)
│   ├─ audio_encoders
│   ├─ datasets
│   │   └─ dataset_name
│   └─ ...
├─ muscaps
│   ├─ caption_evaluation_tools # Translation metrics eval on audio captioning
│   ├─ datasets                 # Dataset classes
│   ├─ models                   # Model code
│   ├─ modules                  # Model components
│   ├─ scripts                  # Python scripts for training, evaluation etc.
│   ├─ trainers                 # Trainer classes
│   └─ utils                    # Utils
└─ save                         # Saved model checkpoints, logs, configs, predictions
    └─ experiments
        ├─ experiment_id1
        └─ ...

Dataset

The dataset used in our experiments is private and cannot be shared, but details on how to prepare an equivalent music captioning dataset are provided in the data README.
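
Purely as a hypothetical illustration (defer to the data README for the actual expected format), such a dataset boils down to rows pairing an audio file with a caption:

import csv
import soundfile as sf

# Hypothetical layout: data/datasets/my_dataset/captions.csv with
# audio_path,caption columns -- the real format is in the data README.
with open("data/datasets/my_dataset/captions.csv") as f:
    for row in csv.DictReader(f):
        audio, sr = sf.read(row["audio_path"])  # waveform + sample rate
        print(row["caption"], "|", audio.shape, sr)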

Pre-trained audio feature extractors

For the audio feature extraction component, MusCaps uses CNN-based audio tagging models like musicnn. In our experiments, we use @minzwon's implementation and pre-trained models, which you can download from the official repo. For example, to obtain the weights for the HCNN model trained on the MagnaTagATune dataset, run the following commands

mkdir data/audio_encoders
cd data/audio_encoders/
wget https://github.com/minzwon/sota-music-tagging-models/raw/master/models/mtat/hcnn/best_model.pth
mv best_model.pth mtt_hcnn.pth
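
To check the download, you can inspect the checkpoint with PyTorch. This sketch assumes the file is a plain state_dict of tensors, as is typical for these tagging models:

import torch

# Load on CPU and list a few parameter names and shapes.
state = torch.load("data/audio_encoders/mtt_hcnn.pth", map_location="cpu")
for name in list(state)[:5]:
    print(name, tuple(state[name].shape))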

Training

Dataset, model and training configurations are set in the respective yaml files in configs. Some fields can be overridden via CLI arguments (see the training script for details).

To train the model with the default configs, simply run

cd muscaps/scripts/
python train.py <baseline/attention> --feature_extractor <musicnn/hcnn> --pretrained_model <msd/mtt>  --device_num <gpu_number>
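
For example, to train the attention model with the HCNN feature extractor pre-trained on MagnaTagATune (the weights downloaded above) on GPU 0:

python train.py attention --feature_extractor hcnn --pretrained_model mtt --device_num 0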

This will generate an experiment_id and create a new folder in save/experiments where the output will be saved.

If you wish to resume training from a saved checkpoint, run

python train.py <baseline/attention> --experiment_id <experiment_id>  --device_num <gpu_number>

Evaluation

To evaluate a model saved under <experiment_id> on the captioning task, run

cd muscaps/scripts/
python caption.py <experiment_id> --metrics True

Cite

@misc{manco2021muscaps,
      title={MusCaps: Generating Captions for Music Audio}, 
      author={Ilaria Manco and Emmanouil Benetos and Elio Quinton and Gyorgy Fazekas},
      year={2021},
      eprint={2104.11984},
      archivePrefix={arXiv}
}

Acknowledgements

This repo reuses some code from the following repos:

Contact

If you have any questions, please get in touch: [email protected].
