Abstract

How to effectively extract and fuse features from different modalities is the key to the joint classification of hyperspectral image and LiDAR data. In recent years, thanks to the powerful feature learning ability of deep learning, it has attracted increasing attention in this field. However, most existing deep learning models are based on supervised learning, and their classification performance depends on the quantity and quality of labeled samples. To this end, this paper proposes a joint classification method based on inter-modality matching learning, which makes full use of the information in unlabeled samples and reduces the dependence on labeled information. Specifically, modality-matching labels are first constructed from the matching relationship between the hyperspectral image and the LiDAR data, together with the KMeans clustering algorithm. These labels are then used to train a matching learning network composed of multiple convolutional layers; the network consists of two parallel branches, each of which extracts the features of a single modality. Finally, a joint classification model for hyperspectral image and LiDAR data is built on top of this network. Its parameters are initialized by the matching learning network, so only a small number of labeled samples are needed for finetuning to achieve satisfactory classification. To verify the effectiveness of the proposed method, extensive experiments were conducted on Houston and MUUFL, two widely used datasets for the joint classification of hyperspectral image and LiDAR data. The results show that the proposed method achieves higher classification performance than existing models.
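For concreteness, the following is a minimal sketch of how modality-matching labels of this kind can be constructed without manual annotation. The pairing rule, patch size, and cluster count are illustrative assumptions rather than the paper's exact procedure: here a pair is labeled a match when the two patches share a center pixel, and a non-match when the LiDAR patch is drawn from a position that falls in a different KMeans cluster.

```python
# Hypothetical sketch of modality-matching label construction.
# Assumes: hsi is an (H, W, B) hyperspectral cube, lidar is an (H, W) elevation map.
import numpy as np
from sklearn.cluster import KMeans

def build_matching_pairs(hsi, lidar, n_clusters=8, n_pairs=1000, patch=11, seed=0):
    rng = np.random.default_rng(seed)
    H, W, _ = hsi.shape
    r = patch // 2
    # Cluster every pixel on its spectral signature -- no manual labels needed.
    clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed) \
        .fit_predict(hsi.reshape(-1, hsi.shape[-1])).reshape(H, W)
    pairs, labels = [], []
    for _ in range(n_pairs):
        y, x = rng.integers(r, H - r), rng.integers(r, W - r)
        hsi_patch = hsi[y - r:y + r + 1, x - r:x + r + 1]
        if rng.random() < 0.5:
            # Positive pair: LiDAR patch centered at the same spatial position.
            ly, lx, lab = y, x, 1
        else:
            # Negative pair: a center pixel that lies in a different cluster.
            while True:
                ly, lx = rng.integers(r, H - r), rng.integers(r, W - r)
                if clusters[ly, lx] != clusters[y, x]:
                    break
            lab = 0
        lidar_patch = lidar[ly - r:ly + r + 1, lx - r:lx + r + 1]
        pairs.append((hsi_patch, lidar_patch))
        labels.append(lab)
    return pairs, np.array(labels)
```

Under these assumptions, the KMeans step keeps negative pairs from accidentally pairing spatially distinct but semantically similar pixels, which is what makes the binary matching labels informative.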
In recent years, many excellent models for the joint classification of hyperspectral and LiDAR data have been designed on the basis of supervised learning methods such as convolutional neural networks. Their classification performance depends heavily on the quantity and quality of training samples. However, as the distribution of ground objects grows more complex and the resolution of remote sensing images increases, high-quality labels are difficult to obtain with limited cost and manpower. To this end, many researchers have sought to learn features directly from unlabeled samples; for instance, autoencoders have been applied to multimodal joint classification with satisfactory performance. Although such reconstruction-based methods reduce the dependence on labeled information to a certain extent, several problems remain. In particular, they focus on reconstructing the data and cannot guarantee that the extracted features are sufficiently discriminative, which limits joint classification performance.

To address this issue, this paper proposes an effective model named Joint Classification of Hyperspectral and LiDAR Data Based on Inter-Modality Match Learning. Unlike reconstruction-based feature extraction models, the proposed model compares the matching relationship between samples from different modalities, thereby enhancing the discriminative ability of the features. Specifically, it is composed of an inter-modality matching learning network and a multimodal joint classification network. The former identifies whether an input pair of hyperspectral and LiDAR patches matches, so the reasonable construction of matching labels is essential. To this end, the spatial positions of the center pixels of the cropped patches and the KMeans clustering algorithm are employed. The constructed labels and the patch pairs are then combined to train the network. Notably, this process uses no manually labeled information and can extract features directly from abundant unlabeled samples. In the joint classification stage, the structure and trained parameters of the matching learning network are transferred, and a small number of manually labeled training samples are used to finetune the model parameters.

Finally, extensive experiments were conducted on two widely used datasets, Houston and MUUFL, to verify the effectiveness of the proposed model, including comparisons with several state-of-the-art models, hyperparameter analyses, and ablation studies. In the comparison experiments, the proposed model outperforms CNN, EMFNet, AE_H, AE_HL, CAE_H, CAE_HL, IP-CNN, and PToP CNN on both datasets, achieving OAs of 88.39% and 81.46%, respectively. In summary, the proposed model reduces the dependence on manually labeled data and improves joint classification accuracy when training samples are limited. In future work, better model structures and more test datasets will be explored for further improvement.
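As a companion to the description above, here is a minimal PyTorch sketch of the two parallel single-modality branches, the binary matching head, and the transfer step into the joint classifier. The layer widths, the concatenation-based fusion, and the class names are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of the two-branch matching network and the transfer step.
import torch
import torch.nn as nn

def conv_branch(in_ch):
    # One branch extracts features from a single modality (HSI or LiDAR).
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())

class MatchNet(nn.Module):
    """Predicts whether an HSI/LiDAR patch pair matches (binary output)."""
    def __init__(self, hsi_bands, lidar_ch=1):
        super().__init__()
        self.hsi_branch = conv_branch(hsi_bands)
        self.lidar_branch = conv_branch(lidar_ch)
        self.head = nn.Linear(128, 2)  # 64 + 64 fused features -> match / no-match

    def forward(self, hsi_patch, lidar_patch):
        f = torch.cat([self.hsi_branch(hsi_patch),
                       self.lidar_branch(lidar_patch)], dim=1)
        return self.head(f)

class JointClassifier(nn.Module):
    """Joint classification model initialized from a trained MatchNet."""
    def __init__(self, pretrained: MatchNet, n_classes):
        super().__init__()
        # Transfer both trained branches; only the new classifier head
        # and a light finetuning pass need the small labeled set.
        self.hsi_branch = pretrained.hsi_branch
        self.lidar_branch = pretrained.lidar_branch
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, hsi_patch, lidar_patch):
        f = torch.cat([self.hsi_branch(hsi_patch),
                       self.lidar_branch(lidar_patch)], dim=1)
        return self.classifier(f)
```

In this setup, MatchNet is first trained on the unlabeled patch pairs with the binary matching labels; JointClassifier then inherits its branches, which is what allows the joint classification stage to reach good accuracy with only a few manually labeled samples.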