|
Music detection based on Dynamics of a Pitch
- Psychoacoustic evidence shows that pitch is crucial to the perception and organization of sound, even in the presence of highly variable and energetic noise. Motivated by this, we developed a noise-robust musical pitch detection algorithm to locate music-like regions. To avoid false alarms caused by stationary noises, both aperiodic and periodic (such as machinery sounds), we use the higher-lag coefficients of the autocorrelation (AC), compensated by the long-time averaged AC, to estimate the dynamics of pitch.
- Because music typically has a flat pitch contour in the autocorrelation function (ACF), its dynamics value is higher than that of other sounds, as shown in Figure 1, whereas speech has a lower value because its pitch contour changes gradually. The dynamics therefore measure how flat a pitch contour is.
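
The idea in the two points above can be illustrated with a minimal sketch. It is not the released Matlab code: the function names (`autocorr`, `pitch_flatness`, `tone`), the frame sizes, the lag range, the `min_peak` threshold, and the flatness score itself are all assumptions made for illustration, and the long-time average is simply taken over the whole input signal. Each frame's higher-lag AC is compensated by subtracting the long-time averaged AC, the strongest remaining peak is tracked from frame to frame, and the score is the fraction of consecutive frames whose peak lag stays put.

```python
import math

def autocorr(frame, min_lag, max_lag):
    """Energy-normalized autocorrelation of one frame at lags min_lag..max_lag."""
    energy = sum(x * x for x in frame) or 1.0  # guard against silent frames
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
            for lag in range(min_lag, max_lag + 1)]

def pitch_flatness(signal, frame_len=256, hop=128, min_lag=20, max_lag=80,
                   min_peak=0.1, max_jump=1):
    """Hypothetical flatness score in [0, 1]; higher means a flatter pitch contour."""
    frames = [signal[s:s + frame_len]
              for s in range(0, len(signal) - frame_len + 1, hop)]
    if len(frames) < 2:
        return 0.0
    acs = [autocorr(f, min_lag, max_lag) for f in frames]
    n_lags = max_lag - min_lag + 1
    # Long-time averaged AC (assumption: averaged over every frame of the input).
    long_avg = [sum(ac[k] for ac in acs) / len(acs) for k in range(n_lags)]
    peaks = []
    for ac in acs:
        # Compensation: stationary components appear in every frame and cancel.
        comp = [a - m for a, m in zip(ac, long_avg)]
        k = max(range(n_lags), key=comp.__getitem__)
        # Keep the peak lag only if the compensated peak is strong enough.
        peaks.append(min_lag + k if comp[k] > min_peak else None)
    # Count consecutive frame pairs whose peak lag moves by at most max_jump.
    steady = sum(1 for a, b in zip(peaks, peaks[1:])
                 if a is not None and b is not None and abs(a - b) <= max_jump)
    return steady / (len(peaks) - 1)

def tone(period, n):
    """Sine wave with an integer period in samples (test signal only)."""
    table = [math.sin(2 * math.pi * k / period) for k in range(period)]
    return [table[i % period] for i in range(n)]

hum = tone(32, 4096)                        # stationary periodic noise
two_notes = tone(32, 2048) + tone(64, 2048)  # two held notes
print(pitch_flatness(hum), pitch_flatness(two_notes))
```

With these synthetic inputs the stationary hum scores 0.0, because its AC equals the long-time average and nothing survives compensation, while the two held notes score at least 28/30, because the peak lag only moves around the single note change.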

Figure 1. Features of clean speech and music in broadcast recordings.
- The most successful features for speech/music discrimination are the variance of spectral flux (vFlux) and the 4 Hz modulation energy (4HzE). As shown in Figure 1, clean speech has higher vFlux and 4HzE values than clean music. However, as shown in Figure 2, these features are less accurate at discriminating noisy speech from music in our real-world recordings, because they capture an entangled mixture of the music and other highly energetic background interference.
- In contrast, our proposed features, the mean and variance of the dynamics of pitch, help detect music that is heavily corrupted by noise among many noisy non-musical sounds, including speech.
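
For reference, vFlux can be sketched as follows. This is a minimal illustration using a common definition of spectral flux (the squared difference between successive normalized magnitude spectra), not necessarily the exact variant used for the figures. The function names, the frame length, and the plain O(N^2) DFT are assumptions chosen to keep the example self-contained.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, computed with a plain O(N^2) DFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectral_flux_variance(signal, frame_len=64, hop=64):
    """Variance of the frame-to-frame spectral flux (vFlux)."""
    spectra = []
    for s in range(0, len(signal) - frame_len + 1, hop):
        mag = magnitude_spectrum(signal[s:s + frame_len])
        norm = math.sqrt(sum(m * m for m in mag)) or 1.0  # unit-norm spectrum
        spectra.append([m / norm for m in mag])
    # Flux: squared distance between each spectrum and the previous one.
    flux = [sum((a - b) ** 2 for a, b in zip(cur, prev))
            for prev, cur in zip(spectra, spectra[1:])]
    if not flux:
        return 0.0
    mean = sum(flux) / len(flux)
    return sum((f - mean) ** 2 for f in flux) / len(flux)

def tone(period, n):
    """Sine wave with an integer period in samples (test signal only)."""
    table = [math.sin(2 * math.pi * k / period) for k in range(period)]
    return [table[i % period] for i in range(n)]

steady = tone(16, 1024)
changing = (tone(16, 256) + tone(8, 256)) * 2
print(spectral_flux_variance(steady), spectral_flux_variance(changing))
```

A steady tone gives a vFlux of zero because every frame has the same spectrum, while the signal that switches between two tones gives a clearly positive value. This is the behaviour that separates clean speech from clean music, and also the reason the feature degrades once strong background interference dominates the spectrum.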

Figure 2. Features of noisy speech and music in environmental recordings.
- Matlab code for calculating the dynamics of pitch is now available.
|