
Research Progress

How Does the Division between Early and Late Reflections Affect the Intelligibility of Ideal Binary-Masked Speech?

Jul 03, 2015

Speech is the most natural means of human-to-human communication. However, in everyday listening conditions it is often distorted by ambient noise, competing voices, and reverberation.

Over the past several decades, many studies of speech perception have shown that human speech understanding remains remarkably robust in adverse listening conditions, where various kinds of interference are present.

The human ability to segregate a target signal from an acoustic mixture in adverse conditions is generally thought to involve a process known as auditory scene analysis. Inspired by its principles, researchers have paid increasing attention to computational auditory scene analysis.

Motivated by the auditory masking phenomenon, research in computational auditory scene analysis has proposed the ideal binary mask (IBM) as the computational goal for segregating speech from noise. Applying the IBM has been found to substantially improve speech intelligibility in noise.
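In the usual formulation, the IBM keeps a time-frequency (T-F) unit when the local signal-to-noise ratio in that unit exceeds a local criterion (LC) and discards it otherwise. The following is a minimal sketch of that rule, assuming the T-F energies of the target and the interference have already been computed by some filterbank or short-time Fourier transform (not shown). The function names and the default LC of 0 dB are illustrative assumptions, not the settings used in the study.

```python
import math


def ideal_binary_mask(target_energy, interference_energy, lc_db=0.0, eps=1e-12):
    """Return a 0/1 mask, one value per time-frequency unit.

    target_energy, interference_energy: 2-D lists [frames][channels] of
    non-negative energies of the desired signal and of the interference.
    A unit is kept (1) only if its local SNR in dB exceeds lc_db.
    (Hypothetical helper for illustration.)
    """
    mask = []
    for t_row, i_row in zip(target_energy, interference_energy):
        row = []
        for t, i in zip(t_row, i_row):
            # eps guards against log of zero or division by zero in silent units
            local_snr_db = 10.0 * math.log10((t + eps) / (i + eps))
            row.append(1 if local_snr_db > lc_db else 0)
        mask.append(row)
    return mask


def apply_mask(mixture_tf, mask):
    """Zero out the mixture's T-F units where the mask is 0 (hypothetical helper)."""
    return [[m * g for m, g in zip(m_row, g_row)]
            for m_row, g_row in zip(mixture_tf, mask)]


if __name__ == "__main__":
    target = [[4.0, 0.5], [1.0, 2.0]]
    interference = [[1.0, 1.0], [1.0, 0.1]]
    mask = ideal_binary_mask(target, interference)
    print(mask)  # [[1, 0], [0, 1]]
    print(apply_mask([[5.0, 1.5], [2.0, 2.1]], mask))
```

Because the mask is computed from the target and interference signals before mixing, it is "ideal": it cannot be obtained from the mixture alone, which is why it serves as a computational goal rather than a practical algorithm.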

The IBM has recently been extended to reverberant conditions, in which the direct sound and early reflections of the target speech are treated as the desired signal. It is therefore of great interest to know how the choice of division between early and late reflections affects the intelligibility of IBM-processed noisy reverberant speech.
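One way to write this decomposition down is sketched below. It follows the common formulation in the literature rather than the exact definition used in the study: the symbols are assumptions introduced here, $T_b$ is the early/late boundary measured from the arrival of the direct sound, and late reflections are grouped with the additive noise as interference.

```latex
\begin{aligned}
h_e(t) &= \begin{cases} h(t), & 0 \le t < T_b \\ 0, & \text{otherwise} \end{cases},
\qquad h_l(t) = h(t) - h_e(t), \\
y(t) &= (s * h)(t) + n(t) = \underbrace{(s * h_e)(t)}_{d(t)\ \text{(desired)}}
      + \underbrace{(s * h_l)(t) + n(t)}_{u(t)\ \text{(interference)}}, \\
\mathrm{IBM}(\tau, f) &= \begin{cases}
  1, & 10 \log_{10} \dfrac{|D(\tau, f)|^2}{|U(\tau, f)|^2} > \mathrm{LC} \\
  0, & \text{otherwise}
\end{cases}
\end{aligned}
```

Here $s$ is the target speech, $h$ the room impulse response, $n$ the noise, and $D$, $U$ the T-F representations of $d$ and $u$. Moving $T_b$ changes which reflections count as desired, and hence which T-F units the mask retains.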

Recently, researchers from the Institute of Acoustics of the Chinese Academy of Sciences and the Institute of Linguistics of the Chinese Academy of Social Sciences experimentally investigated this effect.

In their study, the researchers first determined the division between early and late reflections in three reverberant rooms under stationary (speech-shaped noise) and non-stationary (competing-talker) noise conditions.

Specifically, four typical approaches to estimating the division between early and late reflections were considered: a fixed value (50 ms), a model-based estimator, and two signal-based estimators.
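The simplest of these, a fixed boundary, can be sketched as splitting a room impulse response (RIR) at a set time after the direct sound. This is a minimal illustration only: it assumes the direct sound is the sample with the largest absolute amplitude and that the 50 ms is counted from that point, and the helper name `split_rir` is hypothetical. It does not reproduce the model-based or signal-based estimators, whose details are not given here.

```python
def split_rir(rir, fs, boundary_ms=50.0):
    """Split an RIR into early (direct + early reflections) and late parts.

    Assumptions (for illustration only):
      * the direct-path arrival is the sample with the largest |amplitude|;
      * the early/late boundary lies boundary_ms after that arrival.
    Returns (early, late, boundary_index); early + late equals the input.
    """
    direct_idx = max(range(len(rir)), key=lambda k: abs(rir[k]))
    boundary_idx = direct_idx + int(round(boundary_ms * fs / 1000.0))
    boundary_idx = min(boundary_idx, len(rir))  # clamp to the RIR length
    early = [x if k < boundary_idx else 0.0 for k, x in enumerate(rir)]
    late = [x if k >= boundary_idx else 0.0 for k, x in enumerate(rir)]
    return early, late, boundary_idx


if __name__ == "__main__":
    fs = 1000  # toy sampling rate: 1 sample per millisecond
    rir = [0.0] * 100
    rir[5], rir[30], rir[70] = 1.0, 0.5, 0.2  # direct sound and two reflections
    early, late, b = split_rir(rir, fs)
    print(b)  # 55: the reflection at sample 30 is early, the one at 70 is late
```

At a 16 kHz sampling rate, for example, a 50 ms boundary corresponds to 800 samples after the direct sound. Convolving the target speech with the early part gives the desired signal used to define the IBM, while the late part contributes to the interference.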

The resulting IBMs were then applied to the noisy reverberant mixtures to segregate the desired signal, and the segregated signals were presented to normal-hearing listeners in a word-recognition test.

Experimental results showed that the IBMs with different divisions between early and late reflections substantially improved speech intelligibility over the unprocessed mixtures in all conditions tested. In addition, small but statistically significant differences in intelligibility were found between the different IBMs in some of the conditions tested.

These results have important implications for implementing the IBM processing strategy in practical noisy and reverberant conditions.

This research was partially supported by the National 973 Program (2013CB329302), the National Natural Science Foundation of China (No. 11461141004), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant Nos. XDA06030100 and XDA06030500), and the National 863 Program (No. 2012AA012503).


Copyright © 2002 - Chinese Academy of Sciences