With the rapid development of artificial intelligence (AI) technologies such as machine learning and deep learning in the remote sensing field, data-driven models have become a new research paradigm for automatic information retrieval from remote sensing imagery, which places higher demands on the quantity, quality and diversity of sample datasets. Before the era of deep learning, classical machine learning methods (e.g., support vector machines and random forests) did not require huge numbers of samples for model training, so the sample datasets published at that time were usually relatively small (i.e., fewer than one hundred). In recent years, with the rapid development of technologies such as big data, parallel computing and deep learning, many scholars and research institutions have released a series of sample datasets, which has laid a solid foundation for a wide range of research and applications such as scene understanding, semantic segmentation and object detection from remote sensing images. However, there is still a lack of a comprehensive review of the recently published sample datasets for remote sensing image analysis in the context of big data and deep learning. Therefore, the objective of this study is to summarize and analyze these datasets to provide a valuable data reference for relevant researchers. On the basis of literature retrieval and analysis, this paper summarized a total of 124 widely used, open-access and influential remote sensing image sample datasets published between 2001 and 2020. We reviewed the development of these sample datasets through metadata analysis of aspects such as data sources, application fields, keywords and data size. We then analyzed the datasets from the perspectives of spatial, spectral and temporal resolution.
Meanwhile, we listed the deep learning models commonly used in the remote sensing field (e.g., convolutional neural networks, recurrent neural networks and generative adversarial networks) to show how these sample datasets could be used. Furthermore, we divided the remote sensing image sample datasets into eight categories based on the following application fields: scene recognition, land cover/land use classification, thematic information extraction, change detection, ground-object detection, semantic segmentation, quantitative remote sensing and other applications. Both the typical datasets and the related research progress were carefully reviewed for each application field. In addition, since deep learning models are data-hungry by nature, how to train a model with good generalization capability from limited labeled data has become a significant issue, especially for remote sensing applications, where obtaining sufficient labeled samples is time-consuming. To tackle this issue, we discussed several methods that could increase a model’s generalization capability, including sample transfer between spatio-temporal domains, few-shot and zero-shot learning, active learning and semi-supervised learning for sample discovery, and sample generation through generative adversarial networks. Through this multi-dimensional analysis, the paper provides a comprehensive overview of remote sensing image sample datasets. To the best of our knowledge, this paper is the first review of remote sensing image sample datasets for deep learning, and it can serve as a data reference for researchers in related fields.