VARIOUS WEIGHTING SCHEMES FOR IMBALANCED REGRESSION

2025-9-1
Tan, Hüseyin
Machine learning algorithms often perform poorly in regression tasks where the target distribution is imbalanced, particularly in regions of the target range with few data points. Such rare cases can be critical in fields such as e-commerce, health, and banking. Because many algorithms are structured to favor regions with higher data density, rare but important observations are underrepresented during model training. The goal of this research is to increase predictive accuracy in these regions by examining relevance-based weighting techniques and objective functions. In the first stage, several objective functions (Mean Squared Error (MSE), Mean Squared Logarithmic Error (MSLE), and Quantile) were examined using three popular regression algorithms: Random Forest, XGBoost, and LightGBM. The goal of this phase was to find the best combination of model architecture and loss function for imbalanced regression settings. The second phase focused on the XGBoost and LightGBM models, applying different relevance-based weighting methods to increase sensitivity toward rare target values. The weighting methods examined include quantile-based weighting, order-statistics-derived weights, interpolation-based relevance functions, and a density-based scheme called DenseLoss Weight. Each approach was assessed on how well it steered the model toward improved performance in regions of rare target values. In addition to the standard evaluation metrics of MSE, MAE, and MAPE, a Neighbourhood-Total metric was proposed to capture localized performance in critical regions that global error measures could otherwise miss. The experimental results show that relevance-based weighting functions, when combined with suitable loss functions and algorithms, significantly improve the model’s capacity to predict rare occurrences without compromising overall performance.
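To make the idea of density-based relevance weighting concrete, the following is a minimal sketch of one common formulation: estimate the density of the target values, rescale it to [0, 1], and give each sample a weight that decreases with density. The abstract does not give the thesis's exact definition of its density-based scheme, so the formula, the function names (`gaussian_kde`, `density_based_weights`), the parameters `alpha` and `eps`, and the bandwidth rule are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def gaussian_kde(points, bandwidth):
    """Return a function that estimates the density of `points` at a value y."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(y):
        return norm * sum(
            math.exp(-0.5 * ((y - p) / bandwidth) ** 2) for p in points
        )

    return density


def density_based_weights(targets, alpha=1.0, eps=1e-6):
    """Assign larger weights to samples whose target value is rare.

    Hypothetical helper: `alpha` controls how strongly dense regions are
    down-weighted (alpha=0 gives uniform weights) and `eps` keeps every
    weight strictly positive.
    """
    n = len(targets)
    # Rule-of-thumb bandwidth; an assumption, not a choice taken from the thesis.
    bandwidth = 1.06 * pstdev(targets) * n ** (-1 / 5)
    if bandwidth == 0:
        # All targets are identical, so there is nothing to rebalance.
        return [1.0] * n

    kde = gaussian_kde(targets, bandwidth)
    dens = [kde(y) for y in targets]

    # Min-max scale the densities to [0, 1].
    lo, hi = min(dens), max(dens)
    span = hi - lo
    scaled = [(d - lo) / span if span > 0 else 0.0 for d in dens]

    # Weight falls as density rises, clipped below at eps.
    raw = [max(1.0 - alpha * s, eps) for s in scaled]

    # Normalise so the mean weight is 1, keeping the overall loss scale stable.
    m = mean(raw)
    return [w / m for w in raw]


# Example: 40 common target values near 1 and a single rare value at 10.
targets = [1.0] * 20 + [1.1] * 20 + [10.0]
weights = density_based_weights(targets, alpha=1.0)
```

Weights of this kind can be supplied through the `sample_weight` argument of the `fit` method in the scikit-learn-style wrappers of XGBoost and LightGBM, which is one way a weighting scheme can be combined with a chosen objective function.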
Citation Formats
H. Tan, “VARIOUS WEIGHTING SCHEMES FOR IMBALANCED REGRESSION,” M.S. - Master of Science, Middle East Technical University, 2025.