
Research Progress

Recognize Acoustic Scene More Accurately with Scalogram and Deep Convolutional Neural Network

Nov 28, 2018

Environmental sound carries a large amount of information about its surroundings and, compared with speech and music, has richer content. Acoustic scene modeling aims to recognize the place where a sound was recorded, which enables devices and robots to be context-aware.

Traditional acoustic features, such as Mel-frequency cepstral coefficients, are based on the short-time Fourier transform, which analyzes the signal with a single fixed window length. However, environmental information is usually distributed across different time scales. Accordingly, sensing the signal in a multi-scale way is crucial to the task of acoustic scene modeling.

Recently, researchers from the Institute of Acoustics (IOA) of the Chinese Academy of Sciences proposed a novel framework based on the wavelet transform and a deep convolutional neural network. The study was published in the Proceedings of the Annual Conference of the International Speech Communication Association (September 2018).

The proposed framework consists mainly of two modules: a front-end module based on the wavelet transform and a back-end module based on a deep convolutional neural network.
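As a rough illustration of the front-end idea, the sketch below computes a scalogram, i.e. the magnitudes of wavelet filter outputs over time, with NumPy. The Morlet-like Gaussian filters, the quality factor `q`, the 16 log-spaced center frequencies, the frame length `hop`, and the toy signal are all assumptions chosen for the example; the article does not specify the wavelet filters or parameters used in the study.

```python
import numpy as np

def morlet_filter_bank(n_samples, sample_rate, center_freqs, q=8.0):
    """Frequency-domain Gaussian band-pass filters (Morlet-like wavelets).

    Bandwidth is proportional to center frequency (constant Q), so
    high-frequency filters are short in time and low-frequency ones are long.
    """
    freqs = np.fft.fftfreq(n_samples, d=1.0 / sample_rate)  # bin frequencies in Hz
    bank = []
    for fc in center_freqs:
        sigma = fc / q  # bandwidth grows with center frequency
        # Keep positive frequencies only, which gives a complex-valued output
        response = np.exp(-0.5 * ((freqs - fc) / sigma) ** 2) * (freqs > 0)
        bank.append(response)
    return np.array(bank)  # shape (n_filters, n_samples)

def scalogram(signal, sample_rate, center_freqs, q=8.0, hop=256):
    """Wavelet coefficient magnitudes, averaged over frames of `hop` samples."""
    n = len(signal)
    bank = morlet_filter_bank(n, sample_rate, center_freqs, q)
    spectrum = np.fft.fft(signal)
    coeffs = np.fft.ifft(bank * spectrum, axis=1)  # circular convolution per filter
    magnitude = np.abs(coeffs)
    n_frames = n // hop
    # Averaging within each frame lowers the time resolution of the output
    framed = magnitude[:, : n_frames * hop].reshape(len(center_freqs), n_frames, hop)
    return framed.mean(axis=2)  # shape (n_filters, n_frames)

# Toy input (assumed): a steady 200 Hz tone plus a 20 ms burst at 2 kHz
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # one second of audio
signal = np.sin(2 * np.pi * 200 * t)
burst = (t > 0.5) & (t < 0.52)
signal[burst] += np.sin(2 * np.pi * 2000 * t[burst])

center_freqs = np.geomspace(100, 3200, num=16)  # log-spaced center frequencies
S = scalogram(signal, sample_rate, center_freqs)
print(S.shape)  # (16, 31)
```

In this toy case the steady tone shows up as a row with high energy in every frame, while the burst appears only in the frames around 0.5 s, which is the kind of long-term and transient structure a multi-scale representation is meant to expose.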

The scalogram is a visual representation of the coefficients extracted by wavelet filters, and it can capture both transient and rhythmic information. The back-end network applies small convolution kernels and pooling operations to extract high-level semantic information.
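The back-end idea of stacking small kernels and pooling can be sketched in plain NumPy as follows. This is only a minimal forward pass under stated assumptions: the 3x3 kernel size, the two stages, the channel counts (8 and 16), the random weights, and the random 16 x 32 input standing in for a scalogram are hypothetical and are not taken from the published network.

```python
import numpy as np

def conv2d_same(x, kernels, bias):
    """Small-kernel convolution with zero padding (output keeps height and width).

    x:       (in_channels, height, width)
    kernels: (out_channels, in_channels, k, k), with odd k
    bias:    (out_channels,)
    """
    out_ch, in_ch, k, _ = kernels.shape
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    _, h, w = x.shape
    out = np.zeros((out_ch, h, w))
    for i in range(k):
        for j in range(k):
            # Shifted view of the input, weighted by kernel tap (i, j)
            patch = padded[:, i:i + h, j:j + w]  # (in_channels, h, w)
            out += np.einsum("oc,chw->ohw", kernels[:, :, i, j], patch)
    return out + bias[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    c, h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:, : h2 * size, : w2 * size]
    return x.reshape(c, h2, size, w2, size).max(axis=(2, 4))

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 16, 32))  # stand-in input: 16 filters x 32 frames

# Two conv + pool stages with hypothetical channel counts
w1, b1 = rng.standard_normal((8, 1, 3, 3)) * 0.1, np.zeros(8)
w2, b2 = rng.standard_normal((16, 8, 3, 3)) * 0.1, np.zeros(16)

h1 = max_pool2d(relu(conv2d_same(x, w1, b1)))   # (8, 8, 16)
h2 = max_pool2d(relu(conv2d_same(h1, w2, b2)))  # (16, 4, 8)
embedding = h2.mean(axis=(1, 2))                # global average pooling, (16,)
print(h1.shape, h2.shape, embedding.shape)
```

Each pooling stage halves both axes, so deeper layers see progressively larger regions of the time-frequency plane while each kernel stays small. A classifier layer over `embedding` would then map it to scene labels.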

Experiments on an acoustic scene dataset showed that, within the proposed framework, the multi-scale features yielded a clear accuracy improvement over short-term features. In addition, the scalogram had a lower time resolution, which saved storage space and reduced computational cost to some extent.

 

Figure 1. The acoustic scene framework based on the wavelet transform and a deep convolutional neural network. (Image by CHEN Hangting)
