Classification of Imbalanced Credit Data Sets with Borrower-Specific Cost-Sensitive Algorithms

Date

2023-06-02

Author

Yaman Kanmaz, Yasemin.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

In imbalanced credit data sets, the two types of prediction error incur different costs: monetary losses for misclassified defaults and the opportunity cost of forgone interest income for misclassified non-defaults. This study proposes a novel approach to cost-sensitive learning and imbalanced data classification in credit data sets, using new borrower (instance)-specific cost/risk parameters to address both asymmetries, in class distribution and in misclassification cost. The main objective is to create, for each instance, a weight that signals its risk level by revealing instance-embedded information, and to use this weight to strengthen ordinary algorithms and break the dominance of the majority class in their loss functions. The default probabilities of credit applicants provide valuable information about their risk level, and thus new instance-specific cost/risk parameters based on default risk levels are proposed instead of class-specific ratios. Default probabilities are estimated on sampled sub-datasets; beforehand, the optimal class ratio of these sub-datasets is analyzed with the Simulated Annealing stochastic process. The default probabilities are estimated with logistic regression and with more complex non-linear models, namely deep learning-based Graph Neural Networks and Graph Attention Networks. Three cost/risk parameters are generated with the aim of equalizing the class losses, based on class-level aggregations of the default risk levels. AdaBoost, XGBoost, and ANN algorithms are then modified to incorporate these new parameters, and the empirical analyses are conducted using eight credit data sets. The success of the proposed algorithms is particularly evident in data sets with higher class imbalance ratios. The comparative analyses indicate that, at given Specificity values, the new cost-sensitive algorithms reduce the monetary loss by up to 33.7% in the data set with the highest class imbalance.
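The idea of instance-specific weights that equalize the class losses can be illustrated with a minimal sketch. It does not reproduce the three cost/risk parameters of the thesis, which the abstract does not define. It assumes a hypothetical scheme in which a default is weighted by its estimated default probability, a non-default by one minus that probability, and each class is then rescaled so that both classes carry the same total weight. The function name `instance_weights` and the sample data are illustrative only.

```python
def instance_weights(labels, default_probs):
    """Return one weight per borrower (hypothetical scheme, not the thesis's).

    labels        : 1 for default, 0 for non-default
    default_probs : estimated default probability of each borrower, in [0, 1]
    """
    if len(labels) != len(default_probs):
        raise ValueError("labels and default_probs must have the same length")
    if any(y not in (0, 1) for y in labels):
        raise ValueError("labels must be 0 or 1")

    # Assumed raw risk signal: riskier defaults and safer non-defaults
    # are the instances whose misclassification is penalized most.
    raw = [p if y == 1 else 1.0 - p for y, p in zip(labels, default_probs)]

    # Aggregate the raw signal per class.
    totals = {0: 0.0, 1: 0.0}
    for y, r in zip(labels, raw):
        totals[y] += r
    if totals[0] == 0.0 or totals[1] == 0.0:
        raise ValueError("each class needs a positive total raw weight")

    # Rescale so that each class sums to half of the sample size,
    # which removes the majority class's dominance in a weighted loss.
    target = len(labels) / 2.0
    return [r * target / totals[y] for y, r in zip(labels, raw)]


# Toy data: six non-defaults and two defaults (made-up probabilities).
labels = [0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.05, 0.10, 0.20, 0.10, 0.30, 0.15, 0.70, 0.40]
weights = instance_weights(labels, probs)
```

Weights of this kind could then be passed to any learner that accepts per-instance weights in its loss function, which is the general mechanism by which a boosting or neural network algorithm can be made cost-sensitive.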

Subject Keywords

Instance-specific, Default probability, Logistic regression, Graph neural networks, Graph attention networks, Artificial neural networks, XGBoost, AdaBoost

URI

https://hdl.handle.net/11511/104424

Collections

Graduate School of Applied Mathematics, Thesis

Citation Formats

Y. Yaman Kanmaz, “Classification of Imbalanced Credit Data Sets with Borrower-Specific Cost-Sensitive Algorithms,” Ph.D. - Doctoral Program, Middle East Technical University, 2023.