Unmanned aerial vehicles (UAVs) have become an important tool for power equipment inspection. However, due to the lack of high-quality multi-modal datasets and the difficulty of accurately identifying small targets such as power lines, semantic segmentation of aerial power equipment remains challenging.
In a study published in Pattern Recognition, a research team led by Prof. CHAO Jianshu from the Fujian Institute of Research on the Structure of Matter of the Chinese Academy of Sciences developed a novel RGB-D semantic segmentation framework, M3WaveGNet, for UAV-based power equipment inspection.
To address the lack of publicly available multi-modal datasets for aerial power equipment inspection, the researchers built the AirSim Power System Dataset (APSD) using the AirSim simulation platform. APSD contains more than 4,000 RGB-D image pairs collected from multiple urban and industrial environments, and covers objects such as power lines, power poles, street lights, and traffic lights.
The researchers also introduced M3WaveGNet, a lightweight semantic segmentation network. It employs a multi-modal, multi-level wavelet feature fusion encoder that uses multi-resolution wavelet decomposition to preserve fine-grained details while enhancing semantic representation. Through a Stage-Level Feature Exchange strategy, RGB and depth features interact throughout the encoding process, enabling effective local cross-modal feature fusion.
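To illustrate why a wavelet decomposition can help keep thin structures such as power lines visible at lower resolution, here is a minimal sketch of one level of a 2D Haar transform in plain Python. It is a generic textbook transform, not the encoder used in M3WaveGNet; the function name `haar_dwt2` and the sub-band names are assumptions made for this example.

```python
def haar_dwt2(image):
    """One level of an orthonormal 2D Haar transform (illustrative only).

    `image` is a list of equal-length rows with even height and width.
    Returns four half-resolution sub-bands:
      approx    - local averages (coarse content)
      col_diff  - differences between left and right pixels
      row_diff  - differences between top and bottom pixels
      diag_diff - diagonal differences
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")

    approx, col_diff, row_diff, diag_diff = [], [], [], []
    for r in range(0, rows, 2):
        a_row, c_row, r_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            # The 2x2 block being summarised.
            tl, tr = image[r][c], image[r][c + 1]
            bl, br = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((tl + tr + bl + br) / 2)
            c_row.append((tl - tr + bl - br) / 2)
            r_row.append((tl + tr - bl - br) / 2)
            d_row.append((tl - tr - bl + br) / 2)
        approx.append(a_row)
        col_diff.append(c_row)
        row_diff.append(r_row)
        diag_diff.append(d_row)
    return approx, col_diff, row_diff, diag_diff


# A one-pixel-wide vertical "wire" on an empty background.
wire = [[0, 9, 0, 0] for _ in range(4)]
approx, col_diff, row_diff, diag_diff = haar_dwt2(wire)
```

In this toy case the wire still shows up in the half-resolution `col_diff` band rather than being averaged away, which is the general intuition behind using detail sub-bands alongside the coarse approximation. Applying `haar_dwt2` again to `approx` gives the next, coarser level.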
Inspired by recent advances in state space models, researchers designed a Multi-modal Fusion Module based on a multi-input single-output state space architecture. Unlike conventional fusion methods that rely on simple concatenation or attention mechanisms, the proposed module explicitly models interactions between RGB and depth modalities in both spatial and channel dimensions, facilitating efficient global feature fusion with linear computational complexity.
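The linear-complexity claim can be made concrete with a toy multi-input single-output state space recurrence. The sketch below is a generic scalar example, not the Multi-modal Fusion Module itself: the function name `miso_ssm_scan` and the coefficients `a`, `b_rgb`, `b_depth`, and `c` are hypothetical, and a real model would use learned, higher-dimensional parameters.

```python
def miso_ssm_scan(rgb_seq, depth_seq, a=0.5, b_rgb=1.0, b_depth=1.0, c=1.0):
    """Toy two-input, one-output linear state space scan (illustrative only).

    state_t  = a * state_{t-1} + b_rgb * rgb_t + b_depth * depth_t
    output_t = c * state_t

    Each step does a constant amount of work, so the cost grows linearly
    with sequence length, unlike all-pairs attention.
    """
    if len(rgb_seq) != len(depth_seq):
        raise ValueError("both modalities must have the same length")

    state = 0.0
    outputs = []
    for x_rgb, x_depth in zip(rgb_seq, depth_seq):
        # Both modalities drive one shared hidden state.
        state = a * state + b_rgb * x_rgb + b_depth * x_depth
        outputs.append(c * state)
    return outputs


fused = miso_ssm_scan([1.0, 0.0, 0.0], [0.0, 2.0, 0.0])
```

Because the two input streams update a single shared state, every output position depends on earlier values from both modalities, which is one simple way to picture global cross-modal mixing in a single pass.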
Extensive experiments demonstrated the effectiveness of M3WaveGNet. Compared with RGB-only segmentation methods, it improved the mean Intersection-over-Union (mIoU) by more than 25%. On APSD, it achieved an mIoU of 83.57% while maintaining real-time inference capability at over 60 FPS. Notably, it achieved an IoU of 91.29% for power-line segmentation, outperforming state-of-the-art RGB-D segmentation approaches.
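For readers unfamiliar with the metric, the following sketch shows how per-class IoU and mIoU are conventionally computed from flattened label maps. It uses the standard definition (intersection over union per class, averaged over the classes that occur) and is not the authors' evaluation code; the function names and the toy labels are assumptions for this example.

```python
def per_class_iou(pred, target, num_classes):
    """Return IoU per class, or None for classes absent from both inputs.

    `pred` and `target` are equal-length sequences of integer class ids,
    e.g. flattened segmentation maps.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatched pixel enlarges the union of both classes.
            union[p] += 1
            union[t] += 1
    return [i / u if u else None for i, u in zip(intersection, union)]


def mean_iou(pred, target, num_classes):
    """Average IoU over the classes that appear in pred or target."""
    ious = [v for v in per_class_iou(pred, target, num_classes) if v is not None]
    if not ious:
        raise ValueError("no classes present")
    return sum(ious) / len(ious)


# Toy example with three classes and six pixels.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

Because mIoU averages over classes rather than pixels, a class that covers few pixels, such as power lines, counts as much as a large background class, which is why per-class IoU for thin objects is reported separately.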
To evaluate the trade-off between segmentation accuracy and inference speed, the researchers proposed a new metric, IoU-Fscore, which quantitatively assesses the balance between mIoU and FPS and offers a benchmark for real-time UAV perception systems. M3WaveGNet was also validated on several public datasets, including TTPLA, Mid-Air, and Cityscapes, where it achieved competitive or superior performance compared with existing RGB-D segmentation approaches.
This study demonstrates the effectiveness of combining multi-modal perception, wavelet-based multi-resolution analysis, and state space modeling for UAV-based infrastructure inspection. The proposed APSD dataset, M3WaveGNet, and IoU-Fscore metric provide resources and foundations for future research on intelligent power system inspection and autonomous UAV perception.