
DOI: 10.11834/jrs.20210504

Received: 2020-11-09

Revised: 2021-06-03

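A PM2.5 Concentration Prediction Model Based on Deep Learning and Random Forest
Peng Haojie1, Zhou Yang1, Hu Xiaofei1, Zhang Long2, Peng Yangzhao1, Cai Xinyue1
1. Institute of Geospatial Information, Information Engineering University; 2. Beijing Institute of Remote Sensing Information
Abstract: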

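In PM2.5 concentration prediction, traditional machine learning algorithms cannot deeply mine the hidden features within the data, whereas deep learning algorithms perform poorly when data are scarce. To address both problems, and taking into account the respective strengths of deep learning and random forest, this paper proposes a combined PM2.5 concentration prediction model based on deep learning and random forest (PMCOM). The model builds a training dataset from aerosol optical depth (AOD) remote sensing data, meteorological reanalysis data, and ground-based PM2.5 observations. A deep learning model extracts deep hidden features from the training data; these features are then used to train a random forest model, and random forest regression produces the predicted PM2.5 concentration. To validate the method, PM2.5 concentrations over Henan Province for 2018–2019 were estimated: the original features were combined with features extracted by CNN, LSTM, and CNN_LSTM to form new feature sets, and each set was used to train and test three traditional machine learning regressors: random forest regression (RF), support vector regression (SVR), and K-nearest neighbor regression (KNN). The results show that, with limited data, PMCOM achieves good prediction accuracy in both the overall and the seasonal prediction scenarios. The combination with LSTM as the feature extractor and RF as the regressor is the best model in this experiment: even when only 35% of the data are used as training samples, R² reaches 0.89 in the overall prediction experiment and is above 0.75 in every seasonal prediction experiment.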
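The abstract describes a two-stage design: a deep network (CNN, LSTM or CNN_LSTM) produces hidden features, these are concatenated with the original predictors, and a conventional regressor (RF, SVR or KNN) is trained on the combined set using only 35% of the samples. The Python sketch below, assuming NumPy and scikit-learn, mirrors that comparison on synthetic data. The function `extract_hidden_features` is hypothetical: it is a fixed random nonlinear projection standing in for a trained extractor, not the authors' network, and the predictor columns, hyperparameters and data are illustrative assumptions rather than values from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 5 raw predictors playing the role of
# AOD and meteorological variables, and a nonlinear PM2.5-like target.
n_samples, n_raw = 600, 5
X_raw = rng.normal(size=(n_samples, n_raw))
y = (40 + 25 * X_raw[:, 0] - 8 * X_raw[:, 3]
     + 5 * np.tanh(X_raw[:, 1] * X_raw[:, 2])
     + rng.normal(scale=3, size=n_samples))


def extract_hidden_features(X, n_hidden=8, seed=1):
    """Hypothetical stand-in for the deep feature extractor (CNN/LSTM/CNN_LSTM).

    A fixed random projection followed by tanh; NOT a trained network.
    """
    w_rng = np.random.default_rng(seed)
    W = w_rng.normal(size=(X.shape[1], n_hidden))
    b = w_rng.normal(size=n_hidden)
    return np.tanh(X @ W + b)


# New feature set = original features + extracted hidden features.
X_new = np.hstack([X_raw, extract_hidden_features(X_raw)])

# Train on 35% of the samples and test on the remaining 65%.
X_tr, X_te, y_tr, y_te = train_test_split(X_new, y, train_size=0.35, random_state=0)

regressors = {
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "SVR": SVR(C=100.0),
    "KNN": KNeighborsRegressor(n_neighbors=5),
}
scores = {}
for name, model in regressors.items():
    model.fit(X_tr, y_tr)
    scores[name] = r2_score(y_te, model.predict(X_te))
    print(f"{name}: R2 = {scores[name]:.3f}")
```

Replacing the stand-in with the hidden-layer output of a trained LSTM would give the LSTM + RF variant the abstract reports as best; the R² values printed by this sketch come from synthetic data and say nothing about the paper's results.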

A PM2.5 Prediction Model Based on Deep Learning and Random Forest
Abstract:

At present, environmental pollution in China is severe, and regional compound air pollution dominated by PM2.5 is the most prominent problem. Aerosol optical depth (AOD) is a key physical quantity for characterizing atmospheric turbidity; it represents the extent to which aerosols attenuate light. Numerous studies have shown a strong correlation between AOD and PM2.5. Using satellite-derived AOD together with other influencing factors to analyze how PM2.5 varies is therefore of great significance for preventing air pollution and protecting human health. The diffusion of PM2.5 is an extremely complicated process, and PM2.5 prediction models based on statistical regression can describe only relatively simple nonlinear relationships, whereas PM2.5 estimation is a complex multivariable nonlinear problem. Compared with statistical regression models, PM2.5 prediction models based on traditional machine learning algorithms can handle more complex nonlinear problems. Still, their ability to exploit historical data is limited, which makes it difficult to mine the variation patterns of pollutant concentrations from a big-data perspective. Compared with traditional machine learning methods, deep learning models can mine deep features hidden in historical data. However, the temporal resolution of satellite imagery and cloud contamination of pixels greatly reduce the amount of valid AOD data, and because deep learning models depend on large amounts of training data, this scarcity seriously degrades their accuracy.
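Because cloud contamination removes AOD retrievals, every station-day without a valid AOD value drops out when the training set is assembled. A minimal Python sketch of that join, assuming station PM2.5, AOD and meteorological reanalysis values have already been sampled to the same (station, date) keys; the function name `build_training_set`, the three meteorological columns and all numbers are hypothetical toy values, not data from the study:

```python
import math
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (station_id, date), hypothetical key layout


def build_training_set(
    pm25: Dict[Key, float],
    aod: Dict[Key, Optional[float]],
    meteo: Dict[Key, List[float]],
) -> Tuple[List[List[float]], List[float]]:
    """Join station PM2.5, AOD and meteorology on (station, date).

    Records whose AOD is missing (None or NaN, e.g. a cloud-masked pixel)
    or that have no matching meteorological record are dropped.
    """
    X, y = [], []
    for key, target in sorted(pm25.items()):
        aod_value = aod.get(key)
        if aod_value is None or math.isnan(aod_value):
            continue  # no valid AOD retrieval for this station-day
        met = meteo.get(key)
        if met is None:
            continue  # no matching reanalysis record
        X.append([aod_value, *met])  # feature row: AOD followed by meteorology
        y.append(target)
    return X, y


# Toy inputs: one valid day, one cloud-masked (None), one NaN retrieval.
pm25 = {("S1", "2018-01-01"): 85.0, ("S1", "2018-01-02"): 120.0, ("S2", "2018-01-01"): 60.0}
aod = {("S1", "2018-01-01"): 0.62, ("S1", "2018-01-02"): None, ("S2", "2018-01-01"): float("nan")}
meteo = {k: [2.5, 0.71, 3.1] for k in pm25}  # e.g. temperature, humidity, wind (hypothetical)

X, y = build_training_set(pm25, aod, meteo)
print(len(X), "of", len(pm25), "station-days kept")  # prints: 1 of 3 station-days kept
```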
To address both the inability of traditional machine learning algorithms to deeply mine hidden associations in the data and the poor performance of deep learning algorithms when data are scarce, a combined PM2.5 prediction model based on deep learning and random forest (PMCOM) is proposed. The model builds a training dataset from AOD remote sensing data, meteorological reanalysis data, and ground-based PM2.5 observations. First, the strong feature extraction capability of a deep learning model is used to extract deep hidden features from the training data; these features are then used to train a random forest model, and random forest regression yields the predicted PM2.5 concentration. To verify the effectiveness of the method, a series of experiments estimating PM2.5 concentrations over Henan Province for 2018–2019 was carried out. The results demonstrate that PMCOM achieves good prediction accuracy in both the overall and the seasonal prediction scenarios. The combination of random forest with a long short-term memory (LSTM) neural network performs best in these experiments: even when only 35% of the data are used for training, R² reaches 0.89 in the overall prediction experiment and is above 0.75 in every seasonal prediction experiment. Combining deep learning with random forest uses the random forest to reduce the deep learning model's dependence on large amounts of data, while making full use of the high-level hidden features in existing historical data. This compensates for the random forest model's limited ability to mine the internal associations among features and improves the accuracy of PM2.5 concentration prediction.
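The abstract does not specify the LSTM architecture, input window length or training procedure. As an illustration only, the NumPy sketch below runs a single-layer LSTM over hypothetical 7-day windows of 5 predictors, takes the final hidden state as the extracted feature vector, and appends it to the target-day predictors, which is the kind of combined input a random forest would then be trained on. The weights are random and untrained, so the features carry no learned information; in a real PMCOM-style pipeline the network would first be trained and its hidden-layer output reused. The gate ordering and all sizes are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_last_hidden(x_seq, W, U, b):
    """Run a single-layer LSTM over x_seq (T, n_in) and return the final hidden state.

    W: (4*n_hidden, n_in), U: (4*n_hidden, n_hidden), b: (4*n_hidden,).
    Assumed gate order in the stacked weights: input, forget, candidate, output.
    """
    n_hidden = U.shape[1]
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x_t in x_seq:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:n_hidden])                 # input gate
        f = sigmoid(z[n_hidden:2 * n_hidden])     # forget gate
        g = np.tanh(z[2 * n_hidden:3 * n_hidden])  # candidate cell state
        o = sigmoid(z[3 * n_hidden:])             # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


rng = np.random.default_rng(42)
n_samples, T, n_in, n_hidden = 50, 7, 5, 16
X_seq = rng.normal(size=(n_samples, T, n_in))  # hypothetical 7-day windows of 5 predictors
W = rng.normal(scale=0.3, size=(4 * n_hidden, n_in))      # untrained random weights
U = rng.normal(scale=0.3, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

# One hidden-feature vector per sample, then concatenate with target-day predictors.
hidden = np.stack([lstm_last_hidden(s, W, U, b) for s in X_seq])
X_combined = np.hstack([X_seq[:, -1, :], hidden])
print(X_combined.shape)  # (50, 21)
```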
