OpenMETU
VARIOUS WEIGHTING SCHEMES FOR IMBALANCED REGRESSION
Date
2025-9-1
Author
Tan, Hüseyin
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
In regression modeling, machine learning algorithms frequently perform poorly in regions of the target range where data points are scarce, especially when the data are imbalanced. Depending on the field, such as e-commerce, health, or banking, these rare cases can be highly significant. Rare but important occurrences are underrepresented during model training because the underlying structure of many algorithms favors regions with higher data density. The goal of this research is to increase predictive accuracy in these regions by examining relevance-based weighting techniques and objective functions. In the first stage, several objective functions, namely Mean Squared Error (MSE), Mean Squared Logarithmic Error (MSLE), and Quantile, were examined with three popular regression algorithms: Random Forest, XGBoost, and LightGBM. The goal of this phase was to find the best combination of model architecture and loss function for imbalanced regression settings. The second phase focused on the XGBoost and LightGBM models, to which different relevance-based weighting methods were applied to boost sensitivity toward rare target values. The weighting methods examined include quantile-based weighting, order-statistics-derived weights, interpolation-based relevance functions, and a density-based scheme called DenseLoss Weight. Each approach was assessed on how well it steers the model toward improved performance in regions of rare target values. In addition to the standard evaluation metrics of MSE, MAE, and MAPE, a Neighbourhood-Total metric was proposed to capture localized performance in critical regions that global error measures could otherwise miss. The experimental results show that, when combined with suitable loss functions and algorithms, relevance-based weighting functions significantly improve the model's capacity to predict rare occurrences without compromising overall performance.
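To make the idea of density-based relevance weighting concrete, the following is a minimal sketch, not the thesis's implementation. It estimates the density of the target values with a hand-written Gaussian kernel, turns low density into high weight, and uses those weights in a weighted MSE. The function names (`kde_density`, `density_weights`, `weighted_mse`), the parameters `bandwidth`, `alpha`, and `eps`, and the toy data are all assumptions introduced for illustration.

```python
import math


def kde_density(targets, bandwidth):
    """Gaussian kernel density estimate evaluated at each target value."""
    n = len(targets)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((y - t) / bandwidth) ** 2) for t in targets)
        for y in targets
    ]


def density_weights(targets, bandwidth=1.0, alpha=1.0, eps=1e-6):
    """Inverse-density weights (illustrative): rare targets get larger weights.

    alpha controls how strongly dense regions are down-weighted; eps keeps
    every weight strictly positive. The result is rescaled to have mean 1.
    """
    dens = kde_density(targets, bandwidth)
    lo, hi = min(dens), max(dens)
    span = (hi - lo) or 1.0  # avoid division by zero when all densities match
    scaled = [(d - lo) / span for d in dens]  # min-max normalize to [0, 1]
    raw = [max(1.0 - alpha * s, eps) for s in scaled]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]


def weighted_mse(y_true, y_pred, weights):
    """Mean squared error in which each sample's error is scaled by its weight."""
    total = sum(w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred))
    return total / sum(weights)


# Toy targets (made up): six values near 1.0 and one rare value at 9.5.
y = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 9.5]
w = density_weights(y, bandwidth=0.5)
```

In this toy example the rare target at 9.5 receives the largest weight, so a model that ignores it is penalized more under `weighted_mse` than under the unweighted MSE. In practice, weights of this kind would typically be passed to a learner through its sample-weight argument, for example the `sample_weight` parameter of `fit` in the scikit-learn-style wrappers of XGBoost and LightGBM.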
Subject Keywords
Regression imbalance data, Order statistics, Machine Learning, Prediction, Banking
URI
https://hdl.handle.net/11511/116145
Collections
Graduate School of Natural and Applied Sciences, Thesis
Citation Formats
IEEE
H. Tan, “VARIOUS WEIGHTING SCHEMES FOR IMBALANCED REGRESSION,” M.S. - Master of Science, Middle East Technical University, 2025.