Music Detection in Environmental Recordings

 
 

Music is often among the most interesting content in environmental recordings, so identifying which segments of a recording contain music is an important step for Music Information Retrieval.

Our task is to discriminate music mixed with heavy noise from heavy noise that contains no music. This is considerably harder than the more commonly reported task, in which such high noise levels are absent.

 

 

Music Detection Based on the Dynamics of Pitch

  • Motivated by psychoacoustic evidence that pitch is crucial in the perception and organization of sound, even in the presence of highly variable and energetic noise, we developed a noise-robust musical pitch detection algorithm to locate music-like regions. To avoid false alarms from both aperiodic noise and stationary periodic noise (such as machinery sounds), we use the higher-lag coefficients of the autocorrelation (AC), compensated by the long-time average AC, to estimate the dynamics of pitch.
  • Music typically shows a flat pitch contour in the autocorrelation function (ACF), so its pitch-dynamics value is higher than that of other sounds, as shown in Figure 1. Speech yields a lower value because its pitch contour changes gradually. The dynamics of pitch therefore measures how flat the pitch contour is.
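
The idea in the two bullets above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the released Matlab code: the page does not give the exact formula, so the cosine similarity between consecutive compensated AC frames is assumed here as one plausible way to score how flat the pitch contour is. The function names, frame length, hop, lag range, and the two synthetic test signals are all hypothetical.

```python
import math

def autocorr_high_lags(frame, min_lag, max_lag):
    """Energy-normalized autocorrelation of one frame for lags min_lag..max_lag."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return [0.0] * (max_lag - min_lag + 1)
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
            for lag in range(min_lag, max_lag + 1)]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0

def pitch_dynamics(signal, frame_len=256, hop=128, min_lag=20, max_lag=120):
    """Similarity of the compensated high-lag AC between consecutive frames.

    Assumed measure: values near 1 indicate a flat (music-like) pitch contour.
    """
    starts = range(0, len(signal) - frame_len + 1, hop)
    acs = [autocorr_high_lags(signal[s:s + frame_len], min_lag, max_lag)
           for s in starts]
    # Long-time average AC: subtracting it suppresses stationary periodic noise.
    mean_ac = [sum(col) / len(acs) for col in zip(*acs)]
    comp = [[r - m for r, m in zip(ac, mean_ac)] for ac in acs]
    return [cosine_similarity(a, b) for a, b in zip(comp, comp[1:])]

def mean_and_variance(values):
    mean = sum(values) / len(values)
    return mean, sum((v - mean) ** 2 for v in values) / len(values)

fs = 8000  # hypothetical sampling rate in Hz

# Music-like toy signal: four sustained notes, each with a flat pitch.
notes = []
for f in (220.0, 262.0, 330.0, 392.0):
    notes += [math.sin(2 * math.pi * f * t / fs) for t in range(2048)]

# Speech-like toy signal: pitch gliding repeatedly from 150 Hz to 450 Hz.
glide, phase = [], 0.0
for t in range(8192):
    f = 150.0 + 300.0 * (t % 512) / 512
    phase += 2 * math.pi * f / fs
    glide.append(math.sin(phase))

notes_dyn = pitch_dynamics(notes)
glide_dyn = pitch_dynamics(glide)
notes_mean, notes_var = mean_and_variance(notes_dyn)
glide_mean, glide_var = mean_and_variance(glide_dyn)
print("sustained notes: mean=%.3f var=%.3f" % (notes_mean, notes_var))
print("gliding pitch:   mean=%.3f var=%.3f" % (glide_mean, glide_var))
```

On these toy signals the sustained notes should score a higher mean than the gliding pitch, mirroring the music-versus-speech contrast described above. Real recordings would need a tuned lag range and a longer averaging window.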

Figure 1. Features of clean speech and music in broadcasting recordings.

 

  • The most successful features for speech/music discrimination are the variance of spectral flux (vFlux) and the 4 Hz modulation energy (4HzE). As shown in Figure 1, clean speech has higher vFlux and 4HzE values than clean music. However, as Figure 2 shows, these features are less accurate at discriminating noisy speech from music in our real-world recordings, because they capture a hopelessly entangled mixture of the music and highly energetic background interference.
  • In contrast, our proposed features, the mean and variance of the dynamics of pitch, help distinguish music corrupted by heavy noise from noisy non-musical sounds, including speech.
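
For readers unfamiliar with the two baseline features, the sketch below shows one common way to compute them. The definitions are assumptions rather than the exact ones behind the figures: spectral flux is taken as the Euclidean distance between successive sum-normalized magnitude spectra, and 4 Hz modulation energy as the squared 4 Hz component of the frame-energy envelope relative to its DC level. All function names, frame sizes, and test signals are hypothetical.

```python
import cmath
import math

def frame_energies(signal, frame_len):
    """Energy of consecutive non-overlapping frames."""
    return [sum(x * x for x in signal[s:s + frame_len])
            for s in range(0, len(signal) - frame_len + 1, frame_len)]

def modulation_energy_4hz(signal, fs, frame_len):
    """Squared 4 Hz component of the energy envelope, relative to its DC level."""
    env = frame_energies(signal, frame_len)
    env_rate = fs / frame_len  # envelope samples per second
    # Single DFT bin of the envelope, evaluated at 4 Hz.
    bin_4hz = sum(e * cmath.exp(-2j * math.pi * 4.0 * n / env_rate)
                  for n, e in enumerate(env))
    dc = sum(env)
    return (abs(bin_4hz) / dc) ** 2 if dc > 0 else 0.0

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2, scaled to sum to 1."""
    n = len(frame)
    mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]
    total = sum(mags)
    return [m / total for m in mags] if total > 0 else mags

def spectral_flux_variance(signal, frame_len):
    """Variance of the spectral flux between consecutive frames."""
    spectra = [magnitude_spectrum(signal[s:s + frame_len])
               for s in range(0, len(signal) - frame_len + 1, frame_len)]
    flux = [math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
            for p, q in zip(spectra, spectra[1:])]
    if not flux:
        return 0.0
    mean = sum(flux) / len(flux)
    return sum((f - mean) ** 2 for f in flux) / len(flux)

fs = 8000  # hypothetical sampling rate in Hz

# A tone whose loudness pulses at 4 Hz (roughly the syllable rate of speech)
# versus the same tone held steady.
am_tone = [(1 + 0.8 * math.sin(2 * math.pi * 4 * t / fs))
           * math.sin(2 * math.pi * 440 * t / fs) for t in range(2 * fs)]
steady_tone = [math.sin(2 * math.pi * 440 * t / fs) for t in range(2 * fs)]
m_am = modulation_energy_4hz(am_tone, fs, frame_len=80)
m_steady = modulation_energy_4hz(steady_tone, fs, frame_len=80)

# A signal whose spectrum keeps switching versus one whose spectrum is fixed.
alternating = [math.sin(2 * math.pi * (500 if (t // 640) % 2 == 0 else 1000) * t / fs)
               for t in range(2560)]
fixed = [math.sin(2 * math.pi * 500 * t / fs) for t in range(2560)]
flux_var_alt = spectral_flux_variance(alternating, frame_len=64)
flux_var_fixed = spectral_flux_variance(fixed, frame_len=64)

print("4HzE  pulsing vs steady:", m_am, m_steady)
print("vFlux switching vs fixed:", flux_var_alt, flux_var_fixed)
```

Both features respond to changes in the overall energy and spectrum of the mixture, which is why strong background interference can mask the contrast they show on clean recordings.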

 

Figure 2. Features of noisy speech and music in environmental recordings.

 

  • Matlab code for calculating the dynamics of pitch is now available.
 
 

   Last updated on Sep. 2007