Classification of Imbalanced Credit Data Sets with Borrower-Specific Cost-Sensitive Algorithms

2023-06-02
Yaman Kanmaz, Yasemin
In imbalanced credit data sets, the two types of prediction error incur different costs: monetary losses for misclassified defaults and the opportunity cost of forgone interest income for misclassified non-defaults. To address both asymmetries, unequal class distributions and unequal error costs, this study proposes a novel approach to cost-sensitive learning and imbalanced data classification in credit data sets, using new borrower (instance)-specific cost/risk parameters. The main objective is to create, for each instance, a weight that signals its risk level by revealing instance-embedded information, and to use this weight to strengthen ordinary algorithms and break the dominance of the majority class in their loss functions. The default probabilities of credit applicants provide valuable information about their risk level, so new instance-specific cost/risk parameters based on default risk levels are proposed instead of class-specific ratios. Default probabilities are estimated on sampled sub-datasets; before this step, the optimal class ratio of the sub-datasets is analyzed with the Simulated Annealing stochastic process. To estimate the default probabilities, logistic regression and deep learning-based Graph Neural Networks and Graph Attention Networks are employed. Three cost/risk parameters are generated with the target of equalizing the class losses, based on class-level aggregations of the default risk levels. The AdaBoost, XGBoost, and ANN algorithms are then modified to incorporate these new parameters, and empirical analyses are conducted on eight credit data sets. The advantage of the proposed algorithms is most evident in data sets with higher class imbalance ratios. The comparison analyses indicate that, at given Specificity values, the decrease in monetary loss achieved by the new cost-sensitive algorithms can reach 33.7% in the data set with the highest class imbalance.
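The idea of instance-specific weights that equalize class losses can be illustrated with a minimal Python sketch. It is not the thesis's actual method: the abstract does not give the formulas for the three cost/risk parameters, so the rescaling rule below (each borrower's estimated default probability divided by its class total) and the names `instance_weights` and `weighted_log_loss` are assumptions made purely for illustration.

```python
import math


def instance_weights(p_default, labels):
    """Hypothetical instance-specific weights (illustrative assumption only).

    Each borrower's estimated default probability is rescaled within its
    class so that every class carries the same total weight. Within a
    class, riskier borrowers receive larger weights.
    Assumes every class has a strictly positive sum of probabilities.
    """
    n = len(labels)
    n_classes = len(set(labels))
    class_totals = {}
    for p, y in zip(p_default, labels):
        class_totals[y] = class_totals.get(y, 0.0) + p
    share = n / n_classes  # total weight assigned to each class
    return [share * p / class_totals[y] for p, y in zip(p_default, labels)]


def weighted_log_loss(labels, p_pred, weights, eps=1e-12):
    """Weighted binary cross-entropy, normalized by the total weight."""
    total = 0.0
    for y, p, w in zip(labels, p_pred, weights):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)


# Toy data: six non-defaults (0) and two defaults (1) with made-up
# default-probability estimates.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
p_hat = [0.05, 0.10, 0.10, 0.20, 0.30, 0.25, 0.60, 0.90]
weights = instance_weights(p_hat, labels)
loss = weighted_log_loss(labels, p_hat, weights)
```

With weights built this way, the minority class contributes as much total weight to the loss as the majority class. In practice, such a weight vector could be passed as `sample_weight` to the `fit` method of scikit-learn's `AdaBoostClassifier` or XGBoost's `XGBClassifier`, although the thesis modifies the algorithms themselves rather than only reweighting samples.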
Citation Formats
Y. Yaman Kanmaz, “Classification of Imbalanced Credit Data Sets with Borrower-Specific Cost-Sensitive Algorithms,” Ph.D. - Doctoral Program, Middle East Technical University, 2023.