jukedeck / Nottingham Dataset

Licence: gpl-3.0
Cleaned version of the Nottingham dataset

The original ABC files were sourced from the Nottingham Music Database, and are tracked here untouched.

Technical cleaning specifications

Here we list the manual and programmatic modifications we made to the ABC files to be able to convert them to MIDI.

ABC files

Running a diff command between a file in the original folder and its corresponding cleaned version will highlight all the changes we made to the plain text ABC.
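For readers who prefer to stay in Python, the same comparison can be sketched with the standard library's difflib. This is only an illustration: the `cleaned/` folder name, the file name and the two sample tunes are assumptions made for the example, not taken from the repository.

```python
import difflib


def abc_diff(original_text, cleaned_text, name="tune.abc"):
    """Return a unified diff between an original and a cleaned ABC text."""
    return "".join(
        difflib.unified_diff(
            original_text.splitlines(keepends=True),
            cleaned_text.splitlines(keepends=True),
            fromfile=f"original/{name}",  # label only, no file is opened
            tofile=f"cleaned/{name}",     # hypothetical folder name
        )
    )


# Invented sample data, standing in for a real pair of files.
original = 'X:1\nT:Example\n"G"G2 |1 "D7"A2 :|2 "G"G2 |\n'
cleaned = 'X:1\nT:Example\n"G"G2 |[1 "D7"A2 :|[2 "G"G2 ||\n'

print(abc_diff(original, cleaned))
```

Lines starting with `-` come from the original text and lines starting with `+` from the cleaned one.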

Chord notation

We made the chord notation more consistent and easily parsable by:

  1. Removing uninterpretable symbols (e.g. "/@<.5A7")
  2. Using one single format for diminished/augmented/over-note notation ("Cd" for diminished, "Ca" for augmented and "C/e" for a C chord over E)
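As a rough illustration of why a single format is easier to parse, the sketch below splits a chord symbol into root, quality suffix and optional bass note. The grammar is an assumption inferred from the three examples above ("Cd", "Ca", "C/e"); the cleaned files may contain suffixes this sketch does not anticipate, and `parse_chord` is a hypothetical helper, not part of the repository.

```python
import re
from typing import NamedTuple, Optional


class Chord(NamedTuple):
    root: str            # e.g. "C" or "Bb"
    quality: str         # e.g. "", "m", "7", "d" (diminished), "a" (augmented)
    bass: Optional[str]  # e.g. "e" for a chord over E, else None


# Assumed grammar: root letter, optional accidental, free-form quality
# suffix, optional "/bass".
CHORD_RE = re.compile(r"([A-G][#b]?)([^/]*)(?:/([A-Ga-g][#b]?))?")

QUALITY_NAMES = {"d": "diminished", "a": "augmented"}


def parse_chord(symbol):
    """Parse a chord symbol, raising ValueError on anything unrecognised."""
    match = CHORD_RE.fullmatch(symbol)
    if match is None:
        raise ValueError(f"uninterpretable chord symbol: {symbol!r}")
    root, quality, bass = match.groups()
    return Chord(root, quality, bass)


for symbol in ("Cd", "Ca", "C/e"):
    chord = parse_chord(symbol)
    print(symbol, chord, QUALITY_NAMES.get(chord.quality, chord.quality))
```

A symbol such as "/@<.5A7" does not start with a root letter, so this sketch rejects it with a ValueError.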

Repeats

We tried to make the repeat notation more machine-readable by:

  1. Adding beginning-of-repeat symbols (|:) wherever their position might be programmatically ambiguous
  2. Standardizing the first and second time bar notation (some pieces used only the numbers 1 and 2, others the symbols [1 and [2; we opted for the latter)
  3. Adding double bars (||) at the end of all second time bars, to make it clearer which notes are part of a repeat structure and which are not.
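The time bar change could in principle be automated along the following lines. This is a hedged sketch, not the procedure actually used for the dataset: it assumes that a bare 1 or 2 directly after a bar line always marks a first or second time bar, and that header lines look like a letter followed by a colon.

```python
import re

# A bar line immediately followed by a bare 1 or 2 (and no further digit).
BARE_ENDING = re.compile(r"\|([12])(?!\d)")
# ABC information fields such as "M:4/4" or "K:G" are left untouched.
HEADER_LINE = re.compile(r"^[A-Za-z]:")


def normalize_endings(abc_text):
    """Rewrite '|1' and ':|2' as '|[1' and ':|[2' in the tune body."""
    out = []
    for line in abc_text.splitlines(keepends=True):
        if not HEADER_LINE.match(line):
            line = BARE_ENDING.sub(r"|[\1", line)
        out.append(line)
    return "".join(out)


# Invented example tune fragment.
print(normalize_endings("M:4/4\nabc |1 def :|2 gab ||\n"))
```

Lines that already use [1 and [2 pass through unchanged, because the bar line is then followed by a bracket rather than a digit.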

Part notation

We expanded the usage of the part name notation, and used it to encode other score notations that are not easily machine-interpretable:

  1. We slightly modified the usage of the P metadata so that a single part name (P) can easily be distinguished from the part playing sequence of the piece (new metadata label Y); in the originals both appear under the same tag P
  2. We replaced all notations of "Da Capo al Segno" or "Dal Segno" with a corresponding part subdivision and playing repetition.
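To show how separate P and Y labels can simplify playback logic, here is a minimal sketch that unrolls a tune into its playing order. The exact syntax of the Y field is not described here, so the sketch assumes the simplest case: each part starts with a line like `P:A`, and `Y:` holds a plain string of single-letter part names such as `AAB`. `expand_parts` and the sample tune are hypothetical.

```python
def expand_parts(lines):
    """Return the body lines of a tune in playing order.

    Assumes 'P:<name>' starts a part and 'Y:<names>' lists single-letter
    part names in the order they are played (an assumption of this sketch).
    """
    parts = {}       # part name -> list of body lines, in order of appearance
    sequence = None
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Y:"):
            sequence = line[2:].strip()
        elif line.startswith("P:"):
            current = line[2:].strip()
            parts[current] = []
        elif current is not None and line:
            parts[current].append(line)
    if sequence is None:
        # No playing sequence given: play every part once, in order.
        sequence = "".join(parts)
    return [body for name in sequence for body in parts[name]]


# Invented example: part A is played twice, then part B once.
tune = ["X:1", "Y:AAB", "P:A", "abc|", "P:B", "def|"]
print(expand_parts(tune))
```

Under the same assumption, a "Dal Segno" style jump becomes a longer Y sequence that names the repeated part again, with no special symbol to interpret.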

Simplification choices

The chordal parts of a few pieces contain what can be considered a walking bass sequence of notes intermixed with the chords themselves. We decided to remove these extra notes from the cleaned ABC files for two reasons:

  1. The chords are significantly easier to interpret without them;
  2. Very few pieces have this feature, so it is of no interest for training learning algorithms.

We also removed the lyrics from the few pieces that contained them, for the same reasons.

Generic cleaning

Some pieces contained bars of the wrong duration, caused by an inconsistent number of notes or note lengths. In most cases we tried to fix these inconsistencies by filling the bar or shortening the notes in the way that we thought made the most musical sense, but in a couple of cases we simply removed the offending pieces (e.g. the renditions of Pachelbel's Canon).
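A check for bars of the wrong duration could look roughly like the sketch below. It is a simplified illustration, not the tool used for the dataset: it counts plain notes and rests in units of the default note length (the L field) and ignores tuplets, broken rhythms, grace notes and chords in square brackets, and it does not account for pickup bars, which are legitimately shorter.

```python
import re
from fractions import Fraction

# Accidentals, pitch letter or rest, octave marks, then an optional length
# such as "2", "/2", "3/2" or "//".
NOTE = re.compile(r"[=^_]*[A-Ga-gz][,']*(\d*)(/*)(\d*)")


def note_length(num, slashes, den):
    """Length of one note in units of the default note length."""
    length = Fraction(int(num) if num else 1)
    if den:
        length /= int(den)             # e.g. "3/2"
    elif slashes:
        length /= 2 ** len(slashes)    # "/" halves, "//" quarters
    return length


def bar_length(bar):
    """Total length of a bar, after dropping quoted chord symbols."""
    notes_only = re.sub(r'"[^"]*"', "", bar)
    return sum(
        (note_length(*m.groups()) for m in NOTE.finditer(notes_only)),
        Fraction(0),
    )


def expected_units(meter, unit_length):
    """Units per full bar, e.g. meter '4/4' with unit length '1/8'."""
    return Fraction(meter) / Fraction(unit_length)


bar = '"G"G2 B2 d2 g2'  # invented example bar
print(bar_length(bar) == expected_units("4/4", "1/8"))
```

Bars whose total differs from the expected number of units are candidates for manual review rather than automatic correction, since the fix depends on musical judgment.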

MIDI conversion

Some ABC pieces have two alternative chord sequences in a repeat, so that the chords differ when the melody is played the second time. This is valuable musical information, so we did not remove it from the ABC files. To simplify parsing, however, we decided to ignore it in the MIDI conversion: all repeats use the first chord of each two-chord alternate pair.


This repository is released under the GNU GPLv3 license.
