rupakvignesh / Lyrics-to-Audio-Alignment

Licence: other
Aligns text (lyrics) with monophonic singing voice (audio). The algorithm uses structural segmentation to divide the audio into structural segments and then uses hidden Markov models to obtain the alignment within each segment. The final alignment for each song is the concatenation of the lyric timestamps from its segments.

Programming Languages

python
perl
shell

Lyrics-to-Audio-Alignment

This project aims to automatically align textual lyrics with monophonic singing vocals (audio). Such a system is useful in a setting where a karaoke performer wants to stay in sync with the background track. Traditional hidden Markov models are used for phoneme modelling, and a structural segmentation approach is explored to break the audio (usually 4-5 minutes long) into smaller chunks that are structurally meaningful (intro, verse, chorus, etc.) without any implicit assumptions.
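
The last stage of this pipeline, merging per-segment alignments into one song-level alignment, can be sketched as follows. This is a minimal illustration and not the project's code: the function name, the tuple layout, and the example timings are all assumptions made for the sketch. It supposes each segment's word timings are relative to the segment's own start, so they are shifted by the segment's start time in the song.

```python
from typing import List, Tuple

# (start_sec, end_sec, word), with times relative to the start of a segment.
Alignment = List[Tuple[float, float, str]]


def concatenate_alignments(segments: List[Tuple[float, Alignment]]) -> Alignment:
    """Shift each segment's word timings by the segment's start time in the song."""
    song_alignment: Alignment = []
    # Process segments in song order, whatever order they were given in.
    for segment_start, words in sorted(segments, key=lambda seg: seg[0]):
        for start, end, word in words:
            song_alignment.append((segment_start + start, segment_start + end, word))
    return song_alignment


# Hypothetical example: two segments starting at 0.0 s and 12.5 s in the song.
segments = [
    (12.5, [(0.0, 0.5, "hello"), (0.5, 1.0, "again")]),
    (0.0, [(1.0, 1.5, "la"), (1.5, 2.25, "la")]),
]
result = concatenate_alignments(segments)
```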

Watch the Demo

Video link

Pre-requisites

Training Steps

Training Acoustic models

TIMIT

  • Create initial HMM models (isolated phoneme training)
tcsh scripts/model_gen.sh <phonelist> <proto_file>
  • Create connected HMM models (embedded re-estimation)
tcsh scripts/embedded_reestimation.sh <iterations>

DAMP

  • Align the DAMP dataset with the generated HMM models using forced Viterbi alignment
  • Perform embedded re-estimation on the DAMP dataset to refine the phoneme models
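
For readers unfamiliar with forced Viterbi alignment, the idea can be sketched in a few lines. This is a simplified illustration, not what the project's scripts run: it assumes the transcript has already been expanded into an ordered list of states (one per phoneme), that per-frame log-likelihoods are available, and that transition probabilities can be ignored. The function names and example numbers are hypothetical.

```python
import math
from typing import List, Tuple


def forced_align(log_likes: List[List[float]]) -> List[int]:
    """Return the best state index per frame for a fixed left-to-right sequence.

    log_likes[t][s] is the log-likelihood of frame t under state s. Each frame
    either stays in the current state or advances to the next one.
    """
    n_frames, n_states = len(log_likes), len(log_likes[0])
    if n_frames < n_states:
        raise ValueError("need at least one frame per state")
    score = [[-math.inf] * n_states for _ in range(n_frames)]
    advanced = [[False] * n_states for _ in range(n_frames)]
    score[0][0] = log_likes[0][0]  # the path must start in the first state
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s]
            advance = score[t - 1][s - 1] if s > 0 else -math.inf
            advanced[t][s] = advance > stay
            score[t][s] = max(stay, advance) + log_likes[t][s]
    # Backtrace from the last state, which the path must end in.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        s = path[-1]
        path.append(s - 1 if advanced[t][s] else s)
    path.reverse()
    return path


def path_to_intervals(path: List[int], phones: List[str],
                      hop_sec: float) -> List[Tuple[float, float, str]]:
    """Collapse a per-frame state path into (start_sec, end_sec, phone) intervals."""
    intervals = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            intervals.append((start * hop_sec, t * hop_sec, phones[path[start]]))
            start = t
    return intervals


# Hypothetical scores: 5 frames, 2 states; early frames favour state 0.
log_likes = [[-0.1, -3.0], [-0.2, -2.5], [-2.0, -0.3], [-3.0, -0.1], [-2.5, -0.2]]
path = forced_align(log_likes)  # path is [0, 0, 1, 1, 1]
intervals = path_to_intervals(path, ["l", "aa"], hop_sec=0.01)
```

Because the state order is fixed by the transcript, the only freedom the search has is where each boundary falls, which is what makes the alignment "forced".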

Structural Segmentation

  • Use the MSAF library to segment the DAMP training data into structural segments
python scripts/msaf_segmentation.py <wav_in_dir> <wav_out_dir>
  • Create MLF files corresponding to the segmented audio
python scripts/msaf_to_mlf.py <labfile_list>
  • Perform embedded re-estimation within these segments to get the final phoneme models
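
As a rough illustration of the MLF step, the sketch below builds a word-level label file for a set of segments. It assumes "MLF" means an HTK-style master label file (a `#!MLF!#` header, one quoted `*/name.lab` pattern per entry, one label per line, and a closing `.`). The function name and the segment naming are hypothetical, and the actual `msaf_to_mlf.py` script may lay its output out differently.

```python
from typing import Dict, List


def build_mlf(transcripts: Dict[str, List[str]]) -> str:
    """Build HTK-style master label file text from per-segment word lists."""
    lines = ["#!MLF!#"]
    for segment_name, words in transcripts.items():
        lines.append(f'"*/{segment_name}.lab"')  # pattern matching the segment's label file
        lines.extend(words)                      # one label per line
        lines.append(".")                        # a lone period ends the entry
    return "\n".join(lines) + "\n"


# Hypothetical segment name and lyrics.
mlf_text = build_mlf({"song1_verse1": ["hello", "world"]})
```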

Testing

  • To test a model, first run forced Viterbi alignment
sh scripts/force_align.sh

Set parameters such as the model, features, MLF, and dictionary inside the script.

  • To evaluate the performance of the model, compute its overlap with the manually annotated ground truth.
python scripts/lab_to_lrc.py <lyrics_list>

Set the ground-truth and output folders inside the script.
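
The README does not define the overlap measure, so the sketch below shows one plausible choice: the fraction of the ground-truth duration that is covered by the corresponding predicted interval. The function name and the interval format are assumptions, and the project's own evaluation may compute overlap differently.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec) for one word or line


def overlap_ratio(reference: List[Interval], predicted: List[Interval]) -> float:
    """Fraction of reference duration covered by the matching predicted interval."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same number of intervals")
    total = sum(end - start for start, end in reference)
    overlap = 0.0
    for (r_start, r_end), (p_start, p_end) in zip(reference, predicted):
        # Intersection length, clipped at zero when the intervals are disjoint.
        overlap += max(0.0, min(r_end, p_end) - max(r_start, p_start))
    return overlap / total


# Hypothetical timings for two words.
reference = [(0.0, 1.0), (1.0, 3.0)]
predicted = [(0.0, 0.5), (1.5, 3.5)]
ratio = overlap_ratio(reference, predicted)  # 2.0 s of overlap over 3.0 s of reference
```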

Authors

Acknowledgments

  • Thanks to Alex Lerch for his guidance
  • S Aswin Shanmugham's hybrid segmentation framework
  • Stanford's DAMP dataset.