At present, environmental pollution in China remains severe, and regional compound air pollution dominated by PM2.5 is the most prominent problem. Aerosol optical depth (AOD) is a key physical quantity characterizing atmospheric turbidity; it represents the degree to which aerosols attenuate light. Numerous studies have shown a strong correlation between AOD and PM2.5. Combining satellite-retrieved AOD with other influencing factors to analyze the mechanisms driving PM2.5 variation is therefore of great significance for air pollution prevention and the protection of human health. The diffusion of PM2.5 is an extremely complex process, and PM2.5 prediction models based on statistical regression can describe only relatively simple nonlinear relationships, whereas PM2.5 estimation is generally regarded as a more complex multivariable nonlinear problem. PM2.5 prediction models based on traditional machine learning algorithms can handle more complex nonlinear problems than statistical regression models, but their capacity to exploit historical data is still limited, which makes it difficult to mine the variation patterns of pollutant concentrations from a big-data perspective. Deep learning models, by contrast, can extract deep features hidden in historical data. However, AOD remote sensing data are constrained by the temporal resolution of the imagery and by cloud contamination of pixels, which greatly reduces the amount of valid data. Because deep learning methods depend on large amounts of training data, this scarcity seriously degrades model accuracy.
To address two problems, namely that traditional machine learning algorithms cannot deeply mine the hidden associations in data and that deep learning algorithms perform poorly when data are scarce, this paper proposes a combined PM2.5 prediction model (PMCOM) based on deep learning and random forest. The model's training dataset is built from AOD remote sensing data, meteorological reanalysis data, and ground-based PM2.5 observations. First, the strong feature extraction capability of a deep learning model is used to extract the deep hidden features in the training data; these hidden features are then used to train a random forest model, whose regression output gives the predicted PM2.5 concentration. A series of experiments was carried out to verify the effectiveness of the method. The results show that PMCOM achieves better prediction accuracy in both the overall and the seasonal prediction scenarios, and that the combination of random forest with a long short-term memory (LSTM) neural network performs best in these experiments. Even when only 35% of the data is used for training, R² reaches 0.89 in the overall prediction experiment and exceeds 0.75 in every seasonal prediction experiment. Combining deep learning with random forest uses the random forest to reduce the deep learning model's dependence on data volume while making full use of the high-level hidden features in existing historical data. This compensates for the random forest model's limited ability to mine the internal associations in the data and improves the accuracy of PM2.5 concentration prediction.
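The two-stage idea described in this abstract (a neural network learns hidden features, then a random forest regresses PM2.5 on those features) can be sketched as follows. This is a minimal illustration under explicit assumptions, not the authors' implementation: scikit-learn's `MLPRegressor` stands in for the LSTM feature extractor (a sequence model would additionally need time-windowed inputs), the predictor set (AOD, relative humidity, boundary layer height, wind speed, temperature) and the synthetic data-generating formula are hypothetical, and all hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


def make_synthetic_data(n=2000, seed=0):
    """Hypothetical stand-in for AOD + meteorological predictors and a PM2.5 target."""
    rng = np.random.default_rng(seed)
    aod = rng.gamma(2.0, 0.3, n)        # aerosol optical depth (dimensionless)
    rh = rng.uniform(20, 95, n)         # relative humidity (%)
    blh = rng.uniform(200, 2000, n)     # boundary layer height (m)
    wind = rng.uniform(0, 8, n)         # wind speed (m/s)
    temp = rng.uniform(-10, 35, n)      # temperature (deg C), deliberately uninformative here
    X = np.column_stack([aod, rh, blh, wind, temp])
    # Assumed nonlinear relation: more AOD/humidity and a shallower mixing layer -> more PM2.5.
    pm25 = 120 * aod * (1 + rh / 100) * np.sqrt(800 / blh) / (1 + 0.2 * wind)
    pm25 += rng.normal(0, 5, n)         # observation noise
    return X, pm25


def hidden_features(mlp, X):
    """Forward pass through the trained MLP's hidden layers only (ReLU), skipping the output layer."""
    h = X
    for W, b in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    return h


def fit_combined(X_train, y_train, seed=0):
    """Stage 1: train a neural network; stage 2: train a random forest on its hidden features."""
    scaler = StandardScaler().fit(X_train)
    Xs = scaler.transform(X_train)
    y_mean, y_std = y_train.mean(), y_train.std()
    mlp = MLPRegressor(hidden_layer_sizes=(64, 32), activation="relu",
                       max_iter=500, random_state=seed)
    mlp.fit(Xs, (y_train - y_mean) / y_std)   # standardized target helps MLP training
    H = hidden_features(mlp, Xs)              # learned "deep" features (last hidden layer)
    rf = RandomForestRegressor(n_estimators=200, random_state=seed)
    rf.fit(H, y_train)                        # the forest predicts PM2.5 in original units
    return scaler, mlp, rf


def predict_combined(model, X):
    scaler, mlp, rf = model
    return rf.predict(hidden_features(mlp, scaler.transform(X)))


X, y = make_synthetic_data()
# Train on 35% of the samples to mimic the data-scarce setting described in the abstract.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.35, random_state=0)
model = fit_combined(X_tr, y_tr)
r2 = r2_score(y_te, predict_combined(model, X_te))
print(f"R2 on held-out synthetic data: {r2:.3f}")
```

The design choice shown here is that the random forest never sees the raw inputs, only the network's last hidden layer; swapping in an LSTM would change only the feature-extraction stage, while the forest stage stays the same.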