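Remote sensing images have rich texture information and a complex overall structure, so multi-scale feature extraction is essential in scene classification tasks. On this basis, a local feature extraction module, the Aggregation Depthwise Convolution (ADC) block, and a global-local feature extraction module, the Convolution Parallel Attention (CPA) block, are designed. Within the ADC module, an asymmetric depthwise convolution group is proposed to enhance the model's robustness to image flipping and rotation; within the CPA module, a multi-group convolution head decomposition attention is proposed that enlarges the receptive field and strengthens feature extraction. A new remote sensing image scene classification model, ADC-CPANet, is built from the ADC and CPA modules, which are stacked in each stage so that the model extracts both global and local features more effectively. To verify the effectiveness of ADC-CPANet, the open-source RSSCN7 and SIRI-WHU datasets are used to compare the complexity and recognition ability of ADC-CPANet with those of other deep learning networks. The experimental results show that ADC-CPANet reaches classification accuracies of 96.43% and 96.04%, respectively, outperforming other advanced models.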
With the rapid development of remote sensing observation technologies such as satellites and unmanned aerial vehicles, both the volume and the variety of high-resolution remote sensing image data are growing quickly, and remote sensing information processing is entering the "era of remote sensing big data". High-resolution remote sensing images contain richer texture, more detailed information, and a more complex overall structure, and they are of great significance for urban planning and other application scenarios. At the same time, images of the same category can differ greatly, while some images of different categories look similar. Multi-scale feature extraction therefore plays a significant role in remote sensing image scene classification. According to how features are represented, existing remote sensing image scene classification methods fall into two categories: methods based on manually designed features and methods based on deep learning. Methods based on manual features include the scale-invariant feature transform, the gradient scale histogram, and so on. Although these methods achieve good results on some simple scene classification tasks, the features they extract may be incomplete or redundant, so classification accuracy in complex scenes remains low. Methods based on deep learning have made substantial progress in scene classification owing to their powerful feature extraction ability. Compared with traditional methods, the convolutional neural networks (CNNs) used in visual tasks have more complex connections and more diverse forms of convolution, which lets them extract local features more effectively. Nevertheless, CNNs perform poorly at capturing long-distance dependencies among features.
The Transformer architecture has been successfully applied to computer vision in recent years. Unlike traditional CNNs, the Transformer's self-attention layer enables global feature extraction from images. Some recent studies have shown that combining a CNN and a Transformer in a hybrid architecture helps integrate the advantages of both. This paper proposes an aggregation depthwise convolution (ADC) module and a convolution parallel attention (CPA) module. The ADC module effectively extracts local feature information and enhances the robustness of the model to image flipping and rotation. The CPA module effectively extracts global and local features and fuses the two. Within the CPA module, a multi-group convolution head decomposition module is designed, which expands the receptive field and strengthens feature extraction. On the basis of these two modules, we design a remote sensing image scene classification model, ADC-CPANet. ADC and CPA modules are stacked in each stage of the model, which gives it stronger global and local feature extraction capabilities. The RSSCN7 and Google Image datasets were selected to verify the effectiveness of ADC-CPANet. The experimental results show that ADC-CPANet achieves classification accuracies of 96.43% on the RSSCN7 dataset and 96.04% on the Google Image dataset, outperforming other advanced models. ADC-CPANet can thus extract both global and local features and obtain competitive scene classification accuracy.
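To make the idea of an asymmetric depthwise convolution concrete, the following is a minimal pure-Python sketch, not the ADC module itself. The abstract does not specify the layout of the asymmetric depthwise convolution group, so the two-branch design here (a 1×k horizontal kernel and a k×1 vertical kernel per channel, summed element-wise) is an assumption, and the names `depthwise_conv2d` and `asymmetric_depthwise_group` are hypothetical.

```python
from typing import List

Channel = List[List[float]]   # one H x W feature map
FeatureMap = List[Channel]    # C channels


def depthwise_conv2d(x: FeatureMap, kernels: FeatureMap) -> FeatureMap:
    """Depthwise convolution with zero 'same' padding.

    Kernel c is applied only to channel c; channels are never mixed.
    Kernel sizes are assumed to be odd so the output keeps the input size.
    """
    out = []
    for chan, k in zip(x, kernels):
        h, w = len(chan), len(chan[0])
        kh, kw = len(k), len(k[0])
        ph, pw = kh // 2, kw // 2
        res = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for a in range(kh):
                    for b in range(kw):
                        r, c = i + a - ph, j + b - pw
                        if 0 <= r < h and 0 <= c < w:  # outside = zero padding
                            acc += chan[r][c] * k[a][b]
                res[i][j] = acc
        out.append(res)
    return out


def asymmetric_depthwise_group(
    x: FeatureMap, k_horizontal: FeatureMap, k_vertical: FeatureMap
) -> FeatureMap:
    """Assumed design: sum a 1 x k branch and a k x 1 branch per channel."""
    horiz = depthwise_conv2d(x, k_horizontal)
    vert = depthwise_conv2d(x, k_vertical)
    return [
        [[hv + vv for hv, vv in zip(h_row, v_row)]
         for h_row, v_row in zip(h_chan, v_chan)]
        for h_chan, v_chan in zip(horiz, vert)
    ]
```

For a single 3×3 channel holding the values 1 to 9 and all-ones kernels of shape 1×3 and 3×1, the centre output is (4 + 5 + 6) + (2 + 5 + 8) = 30. Splitting a square kernel into horizontal and vertical strips is one common way to obtain asymmetric branches; whether ADC-CPANet uses exactly this layout is not stated in the abstract.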