Machine learning-based approach for bias correction of satellite-based precipitation products using environmental parameters and ground truth data in Turkiye

2024-09-06
Sevinç, Gökhan
Satellite precipitation products are essential for hydrological studies, but they contain bias. In this study, the XGBoost and Random Forest machine learning algorithms are used to correct this bias using ground observations and environmental parameters such as distance to the coast and elevation. The machine learning models were trained daily from 2015 to 2022 with optimized hyperparameters to obtain accurate and robust results, and the results were filtered to be more representative of rainy days. Although machine learning models are generally regarded as black boxes, SHAP values were used in this study to explain and interpret their behavior by showing the contribution of each feature to the model prediction and how these contributions vary in space and time. Model performance was examined with several metrics to show clearly the strengths and weaknesses of each approach. The average RMSE of filtered IMERG (7.08) drops to 4.00 with Random Forest and 4.33 with XGBoost, so the machine learning models reduce the average RMSE of filtered IMERG by about 3 mm/day and provide much more accurate predictions. The average KGE improves from -0.28 for filtered IMERG to 0.46 for Random Forest and 0.47 for XGBoost, indicating that the machine learning models better capture precipitation variability and improve prediction accuracy. The average Mean Bias Error of filtered IMERG (0.197) indicates overestimation of the observations, while the Random Forest (-0.068) and XGBoost (-0.071) models slightly underestimate the observed values. These results show that the accuracy and reliability of the predictions are improved. XGBoost was found to be better at capturing variability in the data and at predicting extreme precipitation events (10 mm/day or higher), whereas Random Forest is better at predicting events at lower thresholds such as 1 mm/day and 2 mm/day.
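The evaluation above relies on RMSE, Mean Bias Error, KGE, and event detection at fixed thresholds. The following is a minimal sketch of how these scores can be computed for one set of paired station values, not the thesis's own code. It assumes the Gupta et al. (2009) form of KGE, defines Mean Bias Error as mean(predicted - observed) so that a positive value means overestimation (matching the sign convention used for IMERG above), and uses POD, FAR and CSI as example threshold scores. The abstract does not name the categorical scores that were actually used, so these three are an assumption.

```python
import math
from typing import Sequence, Tuple


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean square error, in the units of the inputs (e.g. mm/day)."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mean_bias_error(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Mean of (sim - obs): positive = overestimation, negative = underestimation."""
    return sum(s - o for o, s in zip(obs, sim)) / len(obs)


def kge(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Kling-Gupta efficiency, 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2).

    r: Pearson correlation; alpha: std(sim)/std(obs); beta: mean(sim)/mean(obs).
    Assumes obs has non-zero mean and variance and sim has non-zero variance.
    """
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / n
    r = cov / (so * ss)
    alpha = ss / so
    beta = ms / mo
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def detection_scores(
    obs: Sequence[float], sim: Sequence[float], threshold: float
) -> Tuple[float, float, float]:
    """POD, FAR and CSI for events with precipitation >= threshold (mm/day)."""
    hits = misses = false_alarms = 0
    for o, s in zip(obs, sim):
        observed, predicted = o >= threshold, s >= threshold
        if observed and predicted:
            hits += 1
        elif observed:
            misses += 1
        elif predicted:
            false_alarms += 1
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    total = hits + misses + false_alarms
    csi = hits / total if total else nan
    return pod, far, csi


if __name__ == "__main__":
    # Hypothetical daily values (mm/day) for illustration only.
    observed = [0.0, 2.0, 5.0, 12.0]
    predicted = [1.0, 2.0, 4.0, 13.0]
    print(rmse(observed, predicted), mean_bias_error(observed, predicted))
    print(kge(observed, predicted))
    for t in (1.0, 2.0, 10.0):
        print(t, detection_scores(observed, predicted, t))
```

The thresholds 1, 2 and 10 mm/day mirror the event classes mentioned in the abstract. In practice, these scores would be computed per day (or per station) and then averaged, which is one plausible reading of the "average" scores reported above.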
The overall behavior of the models is visualized by merging their daily SHAP values. The models are consistent with their feature importance (FI) scores each year and adapt their behavior seasonally. The SHAP analysis further shows that the models capture the aridity over the Mediterranean and Central Anatolian regions by producing low summer precipitation at these latitudes, while positive SHAP values at higher latitudes in summer translate into increased precipitation in the Black Sea region. A clear positive relationship between precipitation and elevation is evident in the models, whereas the effect of distance to the coast is minimal in summer because of generally dry climatic conditions. The SHAP analysis also shows that the models capture the high winter precipitation in the Mediterranean and Black Sea regions, as well as the dry conditions of the Central Anatolian Plateau. In addition, the models show a strong seasonal influence of distance to the coast on precipitation and capture coastal precipitation particularly well in winter, demonstrating their ability to adapt to seasonality.
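The abstract does not describe how the daily SHAP values were merged. The following is one possible sketch of pooling them into seasonal summaries. It assumes that each day's model has already produced one SHAP value per feature per station (for example with the `shap` package's `TreeExplainer`, which supports tree ensembles such as XGBoost and Random Forest), that every row contains the same features, and that seasons follow the meteorological convention (DJF, MAM, JJA, SON). The function names, the row layout and the feature names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Meteorological seasons keyed by calendar month (assumed convention).
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}

Row = Dict[str, float]  # feature name -> SHAP value for one station on one day


def merge_daily_shap(
    daily: List[Tuple[date, List[Row]]]
) -> Dict[str, Dict[str, Tuple[float, float]]]:
    """Pool per-day SHAP rows by season.

    Returns {season: {feature: (mean SHAP, mean |SHAP|)}}. The signed mean shows
    whether a feature pushes predictions up or down in that season; the mean
    absolute value is a common global importance measure.
    """
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    abs_sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[str, int] = defaultdict(int)
    for day, rows in daily:
        season = SEASON_OF_MONTH[day.month]
        for row in rows:
            counts[season] += 1  # assumes every row has the same features
            for feature, value in row.items():
                sums[season][feature] += value
                abs_sums[season][feature] += abs(value)
    return {
        season: {
            f: (sums[season][f] / counts[season], abs_sums[season][f] / counts[season])
            for f in sums[season]
        }
        for season in sums
    }


def rank_features(season_summary: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order features by mean |SHAP|, most important first."""
    return sorted(season_summary, key=lambda f: season_summary[f][1], reverse=True)


if __name__ == "__main__":
    # Hypothetical SHAP rows for two stations in winter and one in summer.
    daily = [
        (date(2020, 1, 15), [{"elevation": 0.8, "distance_to_coast": -1.2},
                             {"elevation": 0.4, "distance_to_coast": 0.2}]),
        (date(2020, 7, 15), [{"elevation": 0.3, "distance_to_coast": -0.1}]),
    ]
    summary = merge_daily_shap(daily)
    for season, stats in summary.items():
        print(season, rank_features(stats), stats)
```

Separating the signed mean from the mean absolute value matters for the kind of seasonal interpretation given above. A feature can rank as important in winter because of large positive and negative contributions that partly cancel in the signed mean.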
G. Sevinç, “Machine learning-based approach for bias correction of satellite-based precipitation products using environmental parameters and ground truth data in Turkiye,” M.S. - Master of Science, Middle East Technical University, 2024.