The Impact of Feature Selection and Transformation on Machine Learning Methods in Determining the Credit Scoring

Banks use credit scoring as an important indicator of financial strength and eligibility for credit. Scoring models aim to assign statistical odds or probabilities that predict the risk of nonpayment in relation to the many other factors that may be involved. This paper aims to illustrate the beneficial use of eight machine learning (ML) methods (Support Vector Machine, Gaussian Naive Bayes, Decision Trees, Random Forest, XGBoost, K-Nearest Neighbors, Multi-layer Perceptron Neural Networks, and Logistic Regression) in finding the default risk as well as the features contributing to it. An extensive comparison is made in three aspects: (i) which ML models, with and without their own wrapper feature selection, perform best; (ii) how feature selection combined with an appropriate data scaling method influences performance; and (iii) which of the most successful combinations (algorithm, feature selection, and scaling) delivers the best validation indicators, such as accuracy rate, Type I and II errors, and AUC. Open-access credit scoring default risk data sets for the German and Australian cases are used, for which we determine the best method, scaling, and the features contributing most to default risk, and compare our findings with those in the related literature. We illustrate the positive contribution of the selection method and scaling to the performance indicators compared to the existing literature.
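As a rough illustration of the validation indicators named above (accuracy rate, Type I and II errors, and AUC), the following minimal sketch computes them for a binary default classifier using only the Python standard library. It is not the authors' code. The function name `validation_indicators`, the 0.5 threshold, and the toy data are hypothetical. The sketch also assumes that label 1 means default, that a Type I error is a non-defaulter flagged as a defaulter, and that a Type II error is a missed defaulter; some credit scoring papers use the opposite convention, so the definitions should be checked against the paper itself.

```python
from typing import Dict, Sequence


def validation_indicators(y_true: Sequence[int],
                          scores: Sequence[float],
                          threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, Type I/II error rates, and AUC for a binary classifier.

    Assumed convention (not taken from the paper):
    label 1 = default (positive), label 0 = non-default (negative).
    """
    # Turn predicted default probabilities into hard labels.
    y_pred = [1 if s >= threshold else 0 for s in scores]

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    n_pos, n_neg = tp + fn, tn + fp
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present in y_true")

    # AUC as the probability that a randomly chosen defaulter receives a
    # higher score than a randomly chosen non-defaulter (ties count half).
    pos_scores = [s for t, s in zip(y_true, scores) if t == 1]
    neg_scores = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "type_I_error": fp / n_neg,    # false positive rate (assumed meaning)
        "type_II_error": fn / n_pos,   # false negative rate (assumed meaning)
        "auc": wins / (n_pos * n_neg),
    }


# Toy data, invented for illustration only.
toy_y = [0, 0, 0, 1, 1, 0, 1, 0]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.45, 0.60, 0.90, 0.20]
result = validation_indicators(toy_y, toy_scores)
print(result)
```

The pairwise AUC computation is quadratic in the sample size, which is fine for data sets of the size discussed here (on the order of a thousand records) but would normally be replaced by a rank-based formula or a library routine for larger data.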


Credit Risk Evaluation Using Clustering Based Fuzzy Classification Method
Koç, Oğuz; Başer, Furkan; Kestel, Sevtap Ayşe (2023-03-01)
Credit scoring is a crucial indicator for banks to determine the financial position and the eligibility of a client for credit. In order to assign statistical odds or probabilities to predict the risk of nonpayment in relation to many other factors, the scoring criterion becomes an important issue. The focus of this study is to propose a clustering based fuzzy classification (CBFC) method for credit risk assessment. We aim to illustrate the beneficial use of machine learning (ML) methods whose prediction power ...
The Impact of credit rating changes on the government cost of borrowing in Turkey
Gürer, Murat; Derin Güre, Pınar; Department of Economics (2014)
Standard and Poor’s (S&P), Moody’s and Fitch have been producing credit ratings for government bonds and corporate bonds. Changes in credit ratings affect investors’ decisions as well as the government cost of borrowing. The 2008 global financial crisis is an important milestone for the credit rating agencies, since during the crisis period highly rated countries faced deep economic fluctuations, which decreased the creditworthiness of these agencies. This thesis investigates the relationship between sovereig...
The Effects of different types of credit growth in developing countries in comparison to developed countries
Uçar, Ezgi; Cömert, Hasan; Department of Economics (2019)
This thesis aims to identify the relationship between credit booms and banking crises. Credit is disaggregated into credit to non-financial corporations and credit to households and non-profit institutions serving households. The analysis covers 10 developing and 10 developed countries between 1994 and 2017. The method of Mendoza and Terrones (2008) is followed in the identification of booms. Signal extraction analysis is employed to identify the most appropriate smoothing parameter and threshold coefficient. 1600 ...
The Valuation of government guarantees provided for municipalities
Mert, Mehmet Esat; Duran, Serhan; İyigün, Cem; Department of Industrial Engineering (2015)
Credit risk is defined as the risk of portfolio value variations due to unforeseeable fluctuations in the credit quality of a party in a financial contract. Operations that create receivables and contingent liabilities are the basic sources of credit risk. Credit risk models are needed in order to better quantify the risk related to these sources and to minimize it through regular monitoring. Although credit risk models are widely used in the private sector, there are also usage areas for various operations of the...
A Comparative study for nonlinear structure of the interest rate pass through
Değer, Osman; Yıldırım Kasap, Dilem; Department of Economics (2012)
This study investigates the interest rate pass through from the money market rate to the lending rate by utilizing monthly data of fifteen countries, grouped as high income, upper middle income and lower middle income, over the period 1999:01-2011:09. Taking the linear cointegration test of Engle-Granger as a benchmark, we employ the threshold cointegration tests of Enders and Siklos (2001) in order to account for possible nonlinearities in the pass-through process. Empirical results reveal that the pass thro...
Citation Formats
O. Koç, Ö. Uğur, and S. A. Kestel, “The Impact of Feature Selection and Transformation on Machine Learning Methods in Determining the Credit Scoring,” arXiv, vol. 2303, no. 5427, pp. 2–14, 2023.