
ronggong / Musical Onset Efficient

Licence: agpl-3.0
Supplementary information and code for the paper: An efficient deep learning model for musical onset detection


An efficient deep learning model for musical onset detection

This repo contains code and supplementary information for the paper:

Towards an efficient deep learning model for musical onset detection

The code aims to:

  1. reproduce the experimental results in this work.
  2. help retrain the onset detection models mentioned in this work for your own dataset.

Below is a plot of the onset detection functions evaluated in the paper.

  • Red lines in the Mel bands plot are the ground truth syllable onset positions,
  • the red lines in the other plots are the onset positions detected by the peak-picking onset selection method.
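The plots rely on a peak-picking onset selection step. The result files name madmom's peak picking, but the sketch below is only a simplified stand-in for illustration and is not madmom's implementation. It assumes the onset detection function (ODF) is a list of per-frame values. The function name `pick_peaks` and its `min_distance` parameter are hypothetical.

```python
from typing import List, Sequence


def pick_peaks(odf: Sequence[float], threshold: float, min_distance: int = 1) -> List[int]:
    """Return frame indices of local maxima of `odf` that reach `threshold`.

    Simplified illustration of peak picking: a peak closer than `min_distance`
    frames to the previously accepted peak is discarded.
    """
    peaks: List[int] = []
    for i, value in enumerate(odf):
        left = odf[i - 1] if i > 0 else float("-inf")
        right = odf[i + 1] if i + 1 < len(odf) else float("-inf")
        # Local maximum: not below the left neighbour and strictly above the right one,
        # so a flat plateau yields only its last frame.
        if value >= threshold and value >= left and value > right:
            if not peaks or i - peaks[-1] >= min_distance:
                peaks.append(i)
    return peaks
```

To obtain onset times in seconds, multiply each returned frame index by the hop size of the feature frames.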

For an interactive code demo that generates this plot and lets you explore our work, please check our Jupyter notebook. You should be able to "open with" Google Colaboratory in your Google Drive, then "open in playground" to execute it block by block. The code of the demo is in the colab_demo branch.


Contents

A. Code usage

B. Supplementary information

License

A. Code usage

A.1 Install dependencies

We suggest installing the dependencies in a virtualenv:

pip install -r requirements.txt

A.2 Reproduce the experiment results with pretrained models

  1. Download the jingju dataset; the Böck dataset is available on request (please send an email).
  2. Change nacta_dataset_root_path, nacta2017_dataset_root_path in ./src/file_path_jingju_shared.py to your local jingju dataset path.
  3. Change bock_dataset_root_path in ./src/file_path_bock.py to your local Böck dataset path.
  4. Download pretrained models and put them into ./pretrained_models folder.
  5. Execute the command below to reproduce the jingju or Böck dataset results:
python reproduce_experiment_results.py -d <string> -a <string>
  • -d dataset: either jingju or bock.
  • -a architecture: one of baseline, relu_dense, no_dense, temporal, bidi_lstms_100, bidi_lstms_200, bidi_lstms_400, 9_layers_cnn, 5_layers_cnn, pretrained, retrained, feature_extractor_a, feature_extractor_b. Please read the paper to decide which experiment result you want to reproduce.
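To reproduce several architectures in one go, a small driver can call the script once for each `-a` value. This is a hypothetical helper: `build_command` and `reproduce` are not part of the repository. It uses only the `-d` and `-a` options documented above, and it assumes you run it from the repository root.

```python
import subprocess
import sys
from typing import List, Sequence

# Values accepted by -d and -a, as listed in this README.
DATASETS = ("jingju", "bock")
ARCHITECTURES = (
    "baseline", "relu_dense", "no_dense", "temporal",
    "bidi_lstms_100", "bidi_lstms_200", "bidi_lstms_400",
    "9_layers_cnn", "5_layers_cnn", "pretrained", "retrained",
    "feature_extractor_a", "feature_extractor_b",
)


def build_command(dataset: str, architecture: str) -> List[str]:
    """Return the argv list for one reproduce_experiment_results.py run."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if architecture not in ARCHITECTURES:
        raise ValueError(f"unknown architecture: {architecture!r}")
    return [sys.executable, "reproduce_experiment_results.py",
            "-d", dataset, "-a", architecture]


def reproduce(dataset: str, architectures: Sequence[str] = ARCHITECTURES,
              dry_run: bool = True) -> None:
    """Run one command per architecture; with dry_run=True, only print them."""
    for architecture in architectures:
        command = build_command(dataset, architecture)
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the loop at the first failing run.
            subprocess.run(command, check=True)
```

For example, `reproduce("bock", ["baseline", "5_layers_cnn"], dry_run=False)` would run the two Böck experiments one after the other.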

A.3 General code for training data extraction

If you want to extract the features, labels and sample weights for your own dataset:

  1. We assume that your training set audio and annotation are stored in folders path_audio and path_annotation.
  2. Your annotation should conform to either the jingju or the Böck annotation format. Jingju annotations are stored in Praat TextGrid files. In our jingju TextGrid annotations, two tiers are parsed: line and dianSilence. The former contains musical line (phrase) level onsets, and the latter contains syllable level onsets. We assume that you have also annotated your audio files in this hierarchical format, with tier_parent and tier_child corresponding to line and dianSilence. The Böck dataset is annotated at each onset time; you can check the Böck dataset's annotation in this link.
  3. Run the command below to extract training data for your dataset:
python ./trainingSetFeatureExtraction/training_data_collection_general.py --audio <path_audio> --annotation <path_annotation> --output <path_output> --annotation_type <string, jingju or bock> --phrase <bool> --tier_parent <string e.g. line> --tier_child <string e.g. dianSilence>
  • --audio the audio files path. Audio needs to be in 44.1kHz .wav format.
  • --annotation the annotation path.
  • --output_path the path where the output training data will be stored.
  • --annotation_type jingju or bock: the annotation format of your dataset.
  • --phrase whether to extract features at file level. If false, you will get a single feature file for the entire input folder.
  • --tier_parent the parent tier, e.g. line; only needed for the jingju annotation type.
  • --tier_child the child tier, e.g. dianSilence, only needed for jingju annotation type.
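The extraction turns onset annotations into frame-level labels and sample weights. The README does not spell out the exact scheme, so the sketch below is an assumption for illustration only. It makes four assumptions:

- Böck-style annotations contain one onset time in seconds per line.
- Frames use a hypothetical 10 ms hop.
- The frame nearest each onset is positive.
- The two neighbouring frames are also positive, with a reduced weight set by the hypothetical `neighbour_weight`.

The names `parse_onset_times` and `frame_labels` are hypothetical.

```python
from typing import List, Sequence, Tuple


def parse_onset_times(text: str) -> List[float]:
    """Onset times in seconds, taken from the first column of each non-empty line."""
    times = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            times.append(float(fields[0]))
    return sorted(times)


def frame_labels(onsets: Sequence[float], n_frames: int, hop_seconds: float = 0.01,
                 neighbour_weight: float = 0.25) -> Tuple[List[int], List[float]]:
    """Binary onset labels and sample weights per frame (assumed labelling scheme)."""
    labels = [0] * n_frames
    weights = [1.0] * n_frames
    for t in onsets:
        centre = int(round(t / hop_seconds))
        if not 0 <= centre < n_frames:
            continue  # onset falls outside the analysed frames
        labels[centre] = 1
        weights[centre] = 1.0  # an onset frame always gets full weight
        for j in (centre - 1, centre + 1):
            # Neighbours become soft positives unless they are already positive.
            if 0 <= j < n_frames and labels[j] == 0:
                labels[j] = 1
                weights[j] = neighbour_weight
    return labels, weights
```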

A.4 Specific code for jingju and Böck datasets training data extraction

If you want to extract the features, labels and sample weights for the jingju and Böck datasets, we provide ready-to-run scripts for this purpose. Note that this script is memory-inefficient: it heavily slowed down my computer after the extraction finished, and I haven't found a solution to this problem. If you do, please kindly send me an email to tell me how. Thank you.

  1. Download the jingju dataset; the Böck dataset is available on request (please send an email).
  2. Change nacta_dataset_root_path, nacta2017_dataset_root_path in ./src/file_path_jingju_shared.py to your local jingju dataset path.
  3. Change bock_dataset_root_path in ./src/file_path_bock.py to your local Böck dataset path.
  4. Change feature_data_path in ./src/file_path_shared.py to your local output path.
  5. Execute the commands below to extract training data for the jingju or Böck dataset:
python ./training_set_feature_extraction/training_data_collection_jingju.py --phrase <bool>
python ./training_set_feature_extraction/training_data_collection_bock.py
  • --phrase whether to extract features at file level. If false, you will get a single feature file for the entire input folder. The Böck dataset can only be processed at phrase level.
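The hierarchical jingju annotation places line phrases in the parent tier and syllable onsets in the child tier. One way to illustrate it is to group child onsets under their parent phrase. This sketch assumes both tiers have already been read from the TextGrid into `(start, end, label)` tuples, and that empty labels mark silent gaps. `group_by_phrase` is a hypothetical helper, not the repository's parser.

```python
from typing import List, Tuple

Interval = Tuple[float, float, str]  # (start seconds, end seconds, label)


def group_by_phrase(parent: List[Interval],
                    child: List[Interval]) -> List[Tuple[Interval, List[float]]]:
    """For each labelled parent interval, list the start times of the labelled
    child intervals that begin inside it."""
    groups = []
    for p_start, p_end, p_label in parent:
        if not p_label.strip():
            continue  # unlabelled parent intervals are treated as gaps
        onsets = [c_start for c_start, _c_end, c_label in child
                  if c_label.strip() and p_start <= c_start < p_end]
        groups.append(((p_start, p_end, p_label), onsets))
    return groups
```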

A.5 Train the models using the training data

The scripts below train the models from the training data that you should already have extracted in step A.4.

  1. Extract jingju or Böck training data by following step A.4.
  2. Execute the commands below to train the models:
python ./training_scripts/jingju_train.py -a <string, architecture> --path_input <string> --path_output <string> --path_pretrained <string, optional>
python ./training_scripts/bock_train.py -a <string, architecture> --path_input <string> --path_output <string> --path_cv <string> --path_annotation <string> --path_pretrained <string, optional>
  • -a the architecture: one of baseline, relu_dense, no_dense, temporal, bidi_lstms_100, bidi_lstms_200, bidi_lstms_400, 9_layers_cnn, 5_layers_cnn, retrained, feature_extractor_a, feature_extractor_b. The pretrained model does not need to be trained explicitly, because it is the 5_layers_cnn model trained on the other dataset.
  • --path_input the training data path.
  • --path_output the model output path.
  • --path_pretrained the pretrained model path for the transfer learning experiments.
  • --path_cv the path to the 8-fold cross-validation files, only used for the Böck dataset.
  • --path_annotation the annotation path, only used for the Böck dataset.
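`--path_cv` points to the 8-fold cross-validation files of the Böck dataset. Assume, hypothetically, that each fold file lists one audio file name per line. The leave-one-fold-out splits then look like the sketch below. `read_fold` and `cv_splits` are illustrative names, not the repository's API.

```python
from typing import Dict, Iterator, List, Tuple


def read_fold(text: str) -> List[str]:
    """File names of one fold, one per non-empty line (assumed file layout)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def cv_splits(folds: Dict[str, List[str]]) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_fold, train_files, test_files), holding out each fold once."""
    names = sorted(folds)
    for held_out in names:
        test = list(folds[held_out])
        train = [f for name in names if name != held_out for f in folds[name]]
        yield held_out, train, test
```

With the 8 Böck fold files, this yields 8 train/test splits, and every file appears in exactly one test set.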

B. Supplementary information

B.1 Pretrained models

Pretrained models link

These models have been pretrained on jingju and Böck datasets. You can put them into ./pretrained_models folder to reproduce the experiment results.

B.2 Full results (precision, recall, F1)

Full results link

In the jingju folder, you will find two result files for each model. Files with the suffix _peakPickingMadmom contain the results of the peak-picking onset selection method, and those with _viterbi_nolabel contain the score-informed HMM results. In each file, only the first 5 rows are related to the paper; the remaining rows were computed with other evaluation metrics.
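Onset precision, recall and F1 are computed by matching detected onsets to annotated ones within a tolerance window. The sketch below assumes a ±25 ms window and uses a simple greedy matching of sorted onset lists. It is an illustration, not the evaluation code that produced these files. `evaluate_onsets` is a hypothetical name.

```python
from typing import Sequence, Tuple


def evaluate_onsets(detections: Sequence[float], annotations: Sequence[float],
                    window: float = 0.025) -> Tuple[float, float, float]:
    """Precision, recall and F1 with each annotation matched at most once."""
    det, ann = sorted(detections), sorted(annotations)
    i = j = tp = 0
    while i < len(det) and j < len(ann):
        diff = det[i] - ann[j]
        if abs(diff) <= window:
            tp += 1  # true positive: consume both onsets
            i += 1
            j += 1
        elif diff < 0:
            i += 1   # detection too early for any remaining annotation: false positive
        else:
            j += 1   # annotation passed without a close detection: false negative
    precision = tp / len(det) if det else 0.0
    recall = tp / len(ann) if ann else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```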

_peakPickingMadmom first 5 rows format:

onset selection method
best threshold searched on the holdout test set
Precision
Recall
F1-measure

_viterbi_nolabel first 5 rows format:

onset selection method
whether we evaluate label of each onset (no in our case)
Precision
Recall
F1-measure

In the Böck folder, there is only one file for each model, and its format is:

best threshold searched on the holdout test set
Recall Precision F1-measure
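The rows described above can be loaded with a small parser. This is a sketch under two assumptions: each value sits on its own non-empty line, and the Böck second row separates its three numbers with whitespace or commas. Note that the Böck row lists recall before precision. Both function names are hypothetical.

```python
import re
from typing import Dict, List, Union


def _rows(text: str) -> List[str]:
    """Non-empty, stripped lines of a result file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_jingju_result(text: str) -> Dict[str, Union[str, float]]:
    """First 5 rows: method, threshold (or label flag), precision, recall, F1."""
    method, setting, precision, recall, f1 = _rows(text)[:5]
    return {"method": method, "setting": setting,
            "precision": float(precision), "recall": float(recall), "f1": float(f1)}


def parse_bock_result(text: str) -> Dict[str, float]:
    """Row 1: threshold; row 2: recall, precision, F1 (in that order)."""
    rows = _rows(text)
    recall, precision, f1 = (float(v) for v in re.split(r"[\s,]+", rows[1])[:3])
    return {"threshold": float(rows[0]), "precision": precision,
            "recall": recall, "f1": f1}
```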

B.3 Statistical significance calculation data

link

The files at this link contain:

  • the jingju dataset evaluation results of 5 training runs.
  • the Böck dataset evaluation results of 8 folds.

You can download these files and put them into the ./statistical_significance/data folder. We also provide code for parsing the data and calculating the p-values; please check ttest_experiment.py and ttest_experiment_transfer.py for details.
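Consider per-fold or per-run scores of two models evaluated on the same folds or runs. A paired t-test is one natural choice for comparing them. This README does not say whether `ttest_experiment.py` uses exactly this test, so treat the sketch as an assumption. It computes only the paired t statistic and its degrees of freedom, using the standard library. `scipy.stats.ttest_rel(a, b)` returns the same statistic together with a two-sided p-value. `paired_t_statistic` is a hypothetical name.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic for matched scores a[i], b[i] (e.g. F1 of two models on fold i)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are equal; the t statistic is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```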

B.4 Loss curves (section 5.1 in the paper)

These loss curves show the overfitting of the Bidi LSTMs 100 and 200 models on the Böck dataset, and of the 9-layers CNN on both datasets.

Böck dataset Bidi LSTMs 100 losses (3rd fold)

Böck dataset Bidi LSTMs 200 losses (4th fold)

Böck dataset Bidi LSTMs 400 losses (1st fold)

Böck dataset baseline and 9-layers CNN losses (2nd model)

Jingju dataset baseline and 9-layers CNN losses (2nd model)
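In curves like these, overfitting appears as a validation loss that reaches its minimum and then rises while the training loss keeps falling. The heuristic below flags that pattern from per-epoch loss lists. It is a hypothetical helper with an arbitrary `patience` parameter, not the criterion used in the paper.

```python
from typing import Dict, Sequence, Union


def overfitting_report(train_loss: Sequence[float], val_loss: Sequence[float],
                       patience: int = 3) -> Dict[str, Union[int, float, bool]]:
    """Locate the best validation epoch and check whether validation loss rises afterwards."""
    if len(train_loss) != len(val_loss) or not val_loss:
        raise ValueError("expected two non-empty loss lists of equal length")
    # First epoch with the lowest validation loss.
    best = min(range(len(val_loss)), key=lambda i: val_loss[i])
    epochs_after_best = len(val_loss) - 1 - best
    val_rising = epochs_after_best >= patience and val_loss[-1] > val_loss[best]
    train_falling = train_loss[-1] < train_loss[best]
    return {
        "best_epoch": best,
        "epochs_after_best": epochs_after_best,
        "overfitting": val_rising and train_falling,
        "final_gap": val_loss[-1] - train_loss[-1],
    }
```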

License

Code

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Dataset, pretrained models and any other data used in this work (except the Böck dataset)

Creative Commons Attribution-NonCommercial 4.0
