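Research on a Semantic Segmentation Algorithm for Remote Sensing Images Based on Image-Decomposition Disentanglement and Edge Guidance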

连远锋; 李科科

DOI: 10.11834/jrs.20254525

Received: 2024-11-18

Revised: 2025-03-24

College of Artificial Intelligence, China University of Petroleum (Beijing)

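Abstract:

Remote sensing images exhibit large variations in object size and complex, changeable backgrounds, and they suffer from spectral confusion between ground objects and unclear feature boundaries, all of which make semantic segmentation harder. To address the segmentation difficulty caused by interdependent features of remote sensing targets under different illumination conditions, this paper proposes a disentanglement-based semantic segmentation model for remote sensing images, consisting of a Light-Reflectance Disentanglement Network (LRD-Net) and a Multi-modal Semantic Segmentation Network (MSS-Net). First, LRD-Net is designed on the basis of Retinex theory to decompose the illumination and reflectance features of optical images, and a Weight Sharing Transformer (WS-Transformer) extracts the global and local features of the targets. Second, a multi-scale noise module adaptively enhances the illumination component to improve the model's disentanglement ability, and a Significant feature Enhancement module (SE) highlights the difference information between the features of the different components. Finally, an Edge feature Extraction module (EE) improves edge recognition for remote sensing targets, and MSS-Net fuses the illumination and reflectance features to improve segmentation performance. On the public ISPRS Vaihingen and ISPRS Potsdam datasets, the mIoU reaches 84.60% and 87.42%, respectively. The experimental results show that the proposed model outperforms other models on remote sensing image semantic segmentation.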
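For orientation, the classical Retinex image model that LRD-Net is said to build on is commonly written as below. The symbols $I$, $R$, $L$ and the pixel index $x$ are notation assumed here for illustration; the abstract itself does not give the formulation or the form of the loss used in the paper.

```latex
% Classical Retinex assumption: an observed image I is the element-wise
% product of a reflectance map R (intrinsic surface property) and an
% illumination map L (lighting, shadows).
\[
  I(x) = R(x) \odot L(x)
  \quad\Longleftrightarrow\quad
  \log I(x) = \log R(x) + \log L(x)
\]
```

Under this assumption, separating illumination from reflectance amounts to estimating two maps whose product reproduces the input image.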

Key Words:

Semantic segmentation, Image decomposition, Retinex theory, Transformer

Research on Semantic Segmentation Algorithm for Remote Sensing Images Based on Image Decomposition to Remove Entanglement and Edge Guidance

Abstract:

Remote sensing semantic segmentation is the task of classifying each pixel of a remote sensing image into a predefined object category, such as vehicles, buildings, or vegetation. As a prominent research topic in remote sensing, it provides critical support for land-use classification, urban planning, and disaster monitoring. Remote sensing images exhibit large differences in target size and complex, variable backgrounds, which lead to spectral confusion between targets and blurred feature boundaries and make semantic segmentation more difficult. To address the segmentation difficulties caused by feature interdependencies under varying illumination conditions in urban environments, this paper proposes a disentanglement-based semantic segmentation model comprising a Light-Reflectance Disentanglement Network (LRD-Net) and a Multi-modal Semantic Segmentation Network (MSS-Net). The method proceeds in several steps. First, based on Retinex theory, we design LRD-Net to decompose the illumination and reflectance features of optical images; it uses a Weight Sharing Transformer (WS-Transformer) to extract global contextual information and local spatial contextual information. Second, we design a simple and effective multi-scale noise module that adaptively enhances the illumination component, improving the robustness of the disentanglement. Next, we construct a Significant feature Enhancement module (SE), composed of channel attention and convolutional layers, to increase the differentiation of the features extracted by LRD-Net, resulting in a more accurate representation of surface illumination features. Finally, MSS-Net fuses the disentangled features to generate the semantic segmentation result, effectively leveraging multi-modal features that contain illumination and reflectance information.
Additionally, we design an Edge feature Extraction module (EE) to strengthen the representation of edge features, thereby improving the accuracy of object prediction and the integrity of overall contours. During training, a loss function based on Retinex theory is constructed to better perceive changes in illumination and the reflective characteristics of surfaces. Experiments are conducted on the public ISPRS Vaihingen and ISPRS Potsdam datasets. The feature disentanglement results indicate that the extracted illumination feature heatmaps capture shadow variations and the distribution of illumination intensity, achieving an effective disentanglement of the illumination component from the invariant reflectance. In comparison experiments against other models, the mIoU reaches 84.60% on ISPRS Vaihingen and 87.42% on ISPRS Potsdam. The model also performs better in terms of average F1 score and overall accuracy. These findings suggest that the proposed model outperforms other models on the semantic segmentation of remote sensing images. Moreover, the ablation results indicate that the proposed WS-Transformer, SE, and EE modules each improve segmentation performance. Through enhanced edge feature extraction and illumination-reflectance disentanglement, the proposed model significantly improves edge processing and overall segmentation accuracy. The results indicate that the model is well suited to segmenting buildings, low vegetation, and trees, given its superior performance on the challenges of target scale imbalance and spectral variation. It effectively addresses the reduced recognition accuracy of remote sensing images under varying lighting conditions. Future work will explore integrating large remote sensing models with disentanglement theory to further improve segmentation precision.
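The three metrics reported above (mIoU, average F1 score, and overall accuracy) can all be derived from a class confusion matrix. The following is a minimal pure-Python sketch of the standard definitions, not the authors' evaluation code. The function names `confusion_matrix` and `segmentation_scores`, the toy labels, and the choice to skip classes that appear in neither the reference nor the prediction are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows index the reference class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_scores(cm: List[List[int]]) -> Dict[str, float]:
    """Return mean IoU, mean F1 and overall accuracy from a confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    ious, f1s = [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # reference c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, reference other
        if tp + fp + fn == 0:
            continue  # assumption: ignore classes absent from both maps
        ious.append(tp / (tp + fp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))

    return {
        "mIoU": sum(ious) / len(ious),
        "mF1": sum(f1s) / len(f1s),
        "OA": sum(cm[c][c] for c in range(n)) / total,
    }


if __name__ == "__main__":
    # Toy example: six flattened pixels, three hypothetical classes.
    reference = [0, 0, 1, 1, 2, 2]
    predicted = [0, 1, 1, 1, 2, 0]
    scores = segmentation_scores(confusion_matrix(reference, predicted, 3))
    for name, value in scores.items():
        print(f"{name}: {value:.4f}")
```

In practice the label sequences would be the flattened ground-truth and predicted label maps of the test tiles. Benchmark protocols may additionally exclude certain classes or boundary pixels, which this sketch does not model.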

Key Words:

Semantic segmentation, Image decomposition, Retinex theory, Transformer
