Understanding sound direction estimation in monaural hearing


The ability to locate sounds in the surrounding environment is a remarkable feature of human hearing. Typically, people with good hearing use both ears to detect and interpret auditory cues. Differences in the loudness or arrival time of sounds at each ear provide vital information about the location and direction of the sound source. Interestingly, however, studies have suggested that while these binaural cues are sufficient for sound localization, they are not necessary: people with monaural hearing (hearing loss in one ear) can also localize sounds.

Fortunately for engineers, this ability can help relax constraints on the design and positioning of audio recording devices and microphone arrays. Microphone arrays, used for source localization and noise reduction, must be placed at specific intervals and positions to capture and analyze sound from different directions effectively. Since inadequate array design or positioning degrades sound quality, the ability to estimate sound direction from monaural cues is highly desirable: it could considerably simplify microphone array designs.

In a study made available online on 13 January 2023 and published in the journal Applied Acoustics on 28 February 2023, Prof. Masashi Unoki and his colleagues from the Japan Advanced Institute of Science and Technology (JAIST) and Toyama Prefectural University, Japan, have proposed a method that uses monaural cues to estimate the direction-of-arrival (DOA) of sound signals in three dimensions. “In our work, we propose an estimation method based on monaural modulation spectrum (MMS), which relies on modulation in the frequency spectrum of the received signal to detect the signal DOA. This can help us develop monaural cues for single-channel signal processing,” explains Prof. Unoki.
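The paper's exact MMS formulation is not spelled out here, but the general idea can be sketched: split the received signal into acoustic-frequency bands, take each band's temporal envelope, and Fourier-transform those envelopes to reveal modulation frequencies. The minimal Python sketch below follows that common formulation; the STFT parameters, envelope definition, and example signal are assumptions for illustration, not the study's settings.

```python
import numpy as np
from scipy.signal import stft

def monaural_modulation_spectrum(x, fs, nperseg=512, noverlap=384):
    """Compute a simple monaural modulation spectrum (MMS).

    Band-wise temporal envelopes are taken from the STFT magnitude,
    then Fourier-transformed along time to expose modulation frequencies.
    """
    f, t, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    env = np.abs(Z)                              # temporal envelope per frequency band
    env = env - env.mean(axis=1, keepdims=True)  # remove DC before the FFT
    mms = np.abs(np.fft.rfft(env, axis=1))       # modulation spectrum per band
    frame_rate = fs / (nperseg - noverlap)       # envelope sampling rate in Hz
    mod_freqs = np.fft.rfftfreq(env.shape[1], d=1.0 / frame_rate)
    return f, mod_freqs, mms

# Example: 8 Hz amplitude-modulated noise, loosely mimicking the artificial
# AM noise used in the study (the modulation rate here is arbitrary).
fs = 16000
rng = np.random.default_rng(0)
t = np.arange(fs) / fs                                       # 1 second of samples
am_noise = (1 + 0.9 * np.sin(2 * np.pi * 8 * t)) * rng.normal(size=fs)
freqs, mod_freqs, mms = monaural_modulation_spectrum(am_noise, fs)
# mms.mean(axis=0) should peak near the 8 Hz modulation frequency.
```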

To determine the monaural DOA, the team simulated sound signals arriving from different directions, using artificial amplitude-modulated noise and human speech while accounting for how the ears, torso, and head filter sound. Next, they computed the MMS of each signal, which describes its frequency modulations, and identified key features that could be tied to its DOA. To resolve monaural front-back confusion, which arises because sources at mirrored angles in front of and behind the listener can produce the same DOA estimate, the researchers also modeled the effect of head movement on the MMS, enabling more accurate estimation.
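As a rough illustration of this pipeline, the sketch below filters a source through a single-ear head-related impulse response (standing in for the ear, torso, and head filtering), computes the MMS with the monaural_modulation_spectrum function from the earlier sketch, and concatenates features taken before and after a small head rotation to mimic the head-movement cue. The function name, HRIR inputs, and the band-averaged feature are hypothetical choices for illustration, not the paper's.

```python
import numpy as np
from scipy.signal import fftconvolve

def directional_features(source, hrir_front, hrir_rotated, fs):
    """Simulate a monaural received signal and summarize its MMS.

    hrir_front / hrir_rotated are hypothetical single-ear head-related
    impulse responses for the same source direction before and after a
    small head rotation; concatenating both feature sets provides the
    head-movement cue that disambiguates front from back.
    """
    feats = []
    for hrir in (hrir_front, hrir_rotated):
        received = fftconvolve(source, hrir)[: len(source)]      # ear/torso/head filtering
        _, _, mms = monaural_modulation_spectrum(received, fs)   # from the sketch above
        feats.append(mms.mean(axis=0))  # one simple feature: band-averaged MMS
    return np.concatenate(feats)
```

The design point here is that a static monaural spectrum is front-back symmetric, but rotating the head changes the filtering differently for frontal and rear sources, so pairing features from two head orientations breaks the symmetry.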

Using the known DOAs and the corresponding MMS features as training data, they then constructed a polynomial regression model that estimates a signal's DOA, in both the horizontal and vertical directions relative to the listener, from its MMS features. The model could accurately estimate the DOA of 829,440 speech signals, outperforming even human monaural hearing.
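In the same spirit, a polynomial regression from MMS features to the two DOA angles can be set up in a few lines. The degree-2 model, synthetic stand-in data, and scikit-learn pipeline below are illustrative assumptions rather than the study's actual configuration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))            # stand-in MMS feature vectors, one row per signal
y = rng.uniform(-90, 90, size=(500, 2))   # known DOAs: (azimuth, elevation) in degrees

# Second-order polynomial regression mapping MMS features to both angles at once
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

azimuth_est, elevation_est = model.predict(X[:1])[0]
```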

While the team notes that more work is needed to account for background noise and individual differences in ear shape, the study demonstrates an impressive advance in monaural sound localization. Speculating about its implications, the researchers envision applications in sound surveillance techniques and hearing aid enhancements. “Our study will help reveal our ability to localize sounds based on monaural hearing, which, in turn, could stimulate various innovations in hearing aid techniques in the long-term,” concludes Prof. Unoki.