A Probabilistic Framework for Robust Chorus Melody Recognition Using High-Order Cepstral Features and Key-Independent Quaternary Language Models

By: Huilin Huang 1
1School of Music, University of Sanya, Sanya, Hainan, 572000, China

Abstract

Chorus melody recognition, the automatic identification of note sequences from choral audio, is a critical front-end component of melody-based retrieval and educational tools. Traditional non-statistical approaches rely heavily on noisy fundamental-frequency estimation and ad-hoc segmentation, resulting in poor robustness across singers and acoustic conditions. In this work, we present a novel probabilistic framework adapted from continuous speech recognition. First, instead of fundamental frequency, we extract high-order cepstral coefficients within the human voice pitch range (C2–E4 for male, C3–E5 for female voices) and normalize them to fixed-length feature vectors, thereby reducing errors caused by voiced/unvoiced decisions. Second, each note (and silence) is treated as an HMM "word" whose state likelihoods are modeled by GMMs and trained jointly via the forward–backward algorithm. Third, we construct a key-independent quaternary (4-gram) language model to capture prior probabilities of note transitions, obviating the need for explicit key detection. Finally, recognition is performed by a global Viterbi search over the combined acoustic and language models. Evaluated on a corpus of multi-singer choral recordings sung both on neutral syllables ("da/ta") and with lyric content, our system achieves over 90% correct note-sequence accuracy in clean conditions and maintains 80% accuracy at 10 dB SNR, outperforming baseline fundamental-frequency-based methods by 15–20%. Moreover, integration into a chorus query prototype demonstrates a 30% improvement in top-3 retrieval precision.
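The global Viterbi search described above combines per-frame acoustic log-likelihoods (from the GMM-HMM note models) with note-transition priors (from the language model). The following is a minimal sketch of that decoding step, not the authors' implementation: for simplicity it assumes one state per note, precomputed acoustic log-likelihoods, and a bigram transition matrix standing in for the quaternary language model.

```python
import numpy as np

def viterbi(log_obs, log_trans, log_init):
    """Most-probable note sequence under combined acoustic + language scores.

    log_obs:   (T, N) acoustic log-likelihood of each of N notes per frame
    log_trans: (N, N) log prior of moving from note i to note j (language model)
    log_init:  (N,)   log prior over the first note
    """
    T, N = log_obs.shape
    delta = np.full((T, N), -np.inf)   # best path score ending in note j at frame t
    back = np.zeros((T, N), dtype=int) # backpointers for path recovery
    delta[0] = log_init + log_obs[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # (N, N): predecessor i -> note j
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(N)] + log_obs[t]
    # Trace back from the best final note
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy decode: 3 hypothetical notes, 4 frames whose acoustics clearly
# favor note 0 then note 1; transitions mildly prefer staying on a note.
log_init = np.log(np.array([0.6, 0.2, 0.2]))
log_trans = np.log(np.array([[0.8, 0.1, 0.1],
                             [0.1, 0.8, 0.1],
                             [0.1, 0.1, 0.8]]))
log_obs = np.array([[0., -10., -10.],
                    [0., -10., -10.],
                    [-10., 0., -10.],
                    [-10., 0., -10.]])
print(viterbi(log_obs, log_trans, log_init))  # → [0 0 1 1]
```

Extending this to the paper's setting would mean replacing the bigram `log_trans` lookup with a 4-gram history and expanding each note into its HMM states, which enlarges the search lattice but leaves the dynamic-programming recursion unchanged.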