pp. 1993-2002
Objective Scene classification and recognition of remote sensing images is an important task in image interpretation. With the development of remote sensing techniques, higher-resolution images carry richer spatial texture features and semantic information, and the categories of remote sensing images are more diverse than ever before. As a result, images of different categories have become more similar while images of the same category have become more varied, which makes images harder to classify and recognize correctly. Choosing more effective features and classification algorithms is therefore the key to improving classification performance. Traditional scene classification algorithms adopt low-level or mid-level handcrafted features. Such features have a poor ability to capture the high-level semantic information of images, and it is difficult to achieve satisfactory results on massive collections of complex scene images. Deep learning, especially the convolutional neural network, has made great progress in computer vision and, compared with methods using handcrafted features, is currently the most effective approach to image classification. Some scholars have applied convolutional neural networks to remote sensing image classification and obtained higher accuracy than methods based on traditional features. However, training a deep convolutional neural network with a large number of parameters requires many labeled images, and the training process is complicated and time-consuming. In general, a deep convolutional neural network does not perform well with few training images.

Method A method for image classification using an ensemble of convolutional neural networks is proposed in this study to improve the performance of convolutional neural networks. The method contains three main phases: preprocessing, feature extraction, and ensemble learning.
In the preprocessing stage, the work includes geometry normalization, image intensity normalization, and image augmentation. In the feature extraction phase, several deep convolutional neural networks pre-trained on ImageNet are chosen; the last classification layer of each network is removed, and different deep features of the same image are extracted. A stacking model is constructed in the ensemble learning phase. The stacking model consists of base classifiers and a meta classifier. The base classifiers are logistic regression models, each trained on a different feature set extracted by the deep convolutional neural networks. The meta classifier is a support vector machine. Finally, the probability distributions predicted by the base classifiers are used to construct a new dataset, on which the meta classifier is trained.

Result Experiments are conducted on two datasets, UCMerced_LandUse and NWPU-RESISC45, to verify the effectiveness of the proposed method. Compared with state-of-the-art methods, the proposed method performs better in quantitative measurement. It greatly improves accuracy and achieves overall accuracies of 90.74% and 87.21% on the two datasets even with only 10% of the data used for training.

Conclusion With the idea of transfer learning, the image features extracted by deep convolutional neural networks are highly abstract and semantic and have better discriminative ability than handcrafted features. Through feature fusion and model transfer, the advantages of different features and classification methods can be exploited jointly. In this way, higher classification accuracy can be achieved even with very little training data.
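The preprocessing phase named in the Method section could be sketched as follows. This is a minimal illustration, not the paper's implementation: the 224×224 target size, nearest-neighbor resizing, per-channel standardization, and flip/rotation augmentations are all assumptions, since the abstract does not specify these choices.

```python
# Hedged sketch of geometry normalization, intensity normalization, and
# augmentation; all concrete parameter choices here are illustrative.
import numpy as np

def geometry_normalize(img, size=224):
    """Resize to a fixed square size with nearest-neighbor sampling."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def intensity_normalize(img):
    """Shift and scale each channel to zero mean and unit variance."""
    img = img.astype(np.float32)
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True) + 1e-8
    return (img - mean) / std

def augment(img):
    """Produce flipped and 90-degree-rotated variants of one image."""
    return [img, np.fliplr(img), np.flipud(img),
            np.rot90(img, 1), np.rot90(img, 2), np.rot90(img, 3)]

rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(300, 400, 3)).astype(np.uint8)  # stand-in scene
norm = intensity_normalize(geometry_normalize(raw))
variants = augment(norm)
```

Augmentation of this kind expands a small labeled set, which matters here because the method is evaluated with as little as 10% of the data used for training.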
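The stacking scheme of the ensemble learning phase might look like the following scikit-learn sketch: one logistic regression per feature set at the base level, and a support vector machine trained on the concatenated class-probability outputs at the meta level. The random feature matrices stand in for deep CNN features, and the feature dimensions, class count, and classifier settings are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of the stacking model: base logistic regressions on
# different (stand-in) deep feature sets, an SVM meta-classifier on the
# stacked probability distributions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_train, n_test, n_classes = 200, 50, 5
y_train = rng.integers(0, n_classes, n_train)
y_test = rng.integers(0, n_classes, n_test)

# Two stand-in "deep feature" sets, e.g. from two different pretrained CNNs;
# the class label is added to make the synthetic features separable.
feat_dims = [512, 2048]
train_feats = [rng.normal(size=(n_train, d)) + y_train[:, None] for d in feat_dims]
test_feats = [rng.normal(size=(n_test, d)) + y_test[:, None] for d in feat_dims]

# Base level: one logistic regression per feature set.
bases = [LogisticRegression(max_iter=1000).fit(X, y_train)
         for X in train_feats]

# Meta level: build a new dataset from the predicted probability
# distributions and train the SVM meta-classifier on it.
meta_train = np.hstack([clf.predict_proba(X) for clf, X in zip(bases, train_feats)])
meta_test = np.hstack([clf.predict_proba(X) for clf, X in zip(bases, test_feats)])
meta = SVC().fit(meta_train, y_train)
preds = meta.predict(meta_test)
```

In practice the base-level probabilities used to train the meta classifier are usually produced out-of-fold (e.g. via cross-validation) rather than on the base classifiers' own training data, to avoid leaking training labels into the meta level; the abstract does not state which variant the paper uses.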