Insurance Fraud Detection via Clustering-Based Fuzzy Classification On Noisy Unbalanced Datasets

2024-09-09
One of the most important challenges in overcoming unsystematic risk in the insurance industry is fraud detection, as the associated expenses can be disastrous and can increase the loading on reserves and premiums. Because fraud is diverse in character, detecting it may require considering many elements and variables. Scoring systems are a valuable tool for discovering logical relationships among several parameters, highlighting their differences, estimating risks or probabilities, and predicting the likelihood of fraud. To determine the true nature of fraud, we propose a clustering-based fuzzy classification with a noise cluster (CBFCN). This research presents a strategy based on fuzzy k-means clustering with a noise cluster (FKMN) as a novel method for robust clustering and outlier identification. To determine the contributing characteristics accurately, we combine fuzzy theory with machine learning (ML) techniques to improve their predictive capacity. The implementation of CBFCN has two key components. The first is the set of membership values derived from the FKMN clustering algorithm, which aims to better capture the behavior of the existing structure and to identify noise (extremes) within the dataset. To illustrate how CBFCN performs in identifying fraud compared with traditional methods, two datasets whose variables exhibit different characteristics are studied. Moreover, the use of a noise cluster extends the fuzzy technique and enhances ML performance. The results show that the proposed CBFCN models yield promising classification outcomes for identifying fraud in insurance claims. Additionally, modifying clustering-based fuzzy classification by adding a noise cluster, and applying it to fraud in automobile and health insurance, aims to increase the predictive ability of ML methods on imbalanced, noisy datasets.
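The FKMN step described above can be illustrated with a minimal NumPy sketch. It assumes the standard noise-clustering formulation of Davé, in which the noise cluster lies at the same distance δ from every point, so a point's membership in cluster k is u_ik = d_ik^(-2/(m-1)) / (sum_j d_ij^(-2/(m-1)) + δ^(-2/(m-1))) and its noise membership is 1 - sum_k u_ik. Setting δ² to λ times the mean squared point-to-center distance is also an assumption, as are the hypothetical names `fkmn`, `lam`, and `init`; the abstract does not state how FKMN chooses δ or initializes its centers.

```python
import numpy as np

def _sq_dists(X, centers):
    # Squared Euclidean distance from every point to every center: shape (n, c).
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

def _memberships(d2, delta2, m):
    # Noise-clustering memberships; the noise cluster sits at squared distance
    # delta2 from every point, so each row of the result sums to at most 1.
    p = 1.0 / (m - 1.0)
    inv = np.maximum(d2, 1e-12) ** (-p)  # guard against a point lying on a center
    return inv / (inv.sum(axis=1, keepdims=True) + delta2 ** (-p))

def fkmn(X, n_clusters, m=2.0, lam=1.0, init=None, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy k-means with a noise cluster (hypothetical sketch after Davé).

    Returns (centers, U, u_noise): U[i, k] is the membership of point i in
    cluster k, and u_noise[i] = 1 - sum_k U[i, k] is its noise membership.
    """
    X = np.asarray(X, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), n_clusters, replace=False)]
    else:
        centers = np.asarray(init, dtype=float).copy()
    for _ in range(n_iter):
        d2 = _sq_dists(X, centers)
        # Assumed rule: noise distance = lam times the mean squared distance.
        U = _memberships(d2, lam * d2.mean(), m)
        W = U ** m  # the noise cluster has no center, so it is not updated
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    # Recompute memberships against the final centers before returning.
    d2 = _sq_dists(X, centers)
    U = _memberships(d2, lam * d2.mean(), m)
    return centers, U, 1.0 - U.sum(axis=1)
```

On two tight groups plus one distant point, the distant point receives most of its membership in the noise cluster rather than being forced into a regular cluster, which is the kind of extreme the abstract describes FKMN as identifying.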
The proposed study is based on the work of Çelikyılmaz and Türkşen, which uses Logistic Regression and Support Vector Machines as the guiding implementation. Başer et al. extend the clustering-based fuzzy classification (CBFC) framework by employing Artificial Neural Network, Random Forest, Decision Tree, k-Nearest Neighbor, Gaussian Naive Bayes, Light Gradient Boosting Machine, CatBoost, and Extreme Gradient Boosting models within it to measure credit default risk. In this study, we aim to improve the CBFC approach of Başer et al. by increasing the robustness of its clustering scheme against noise with the FKMN framework, and we propose CBFC with a noise cluster (CBFCN).
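The CBFC idea referred to above, in which membership values from the clustering step are passed to a downstream classifier, can be sketched as follows. This is a minimal sketch under stated assumptions: memberships in each cluster and in the noise cluster are simply appended to the original features, the squared noise distance is supplied as a fixed `delta2`, and a class-weighted logistic regression fit by batch gradient descent stands in for the classifiers named above. The names `augment`, `fit_logistic`, and `predict_proba` are hypothetical, and the construction used by Çelikyılmaz and Türkşen or by Başer et al. may differ (for instance, by transforming the memberships or fitting one model per cluster).

```python
import numpy as np

def memberships_with_noise(X, centers, delta2, m=2.0):
    """Cluster memberships plus a final noise-membership column; rows sum to 1."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    p = 1.0 / (m - 1.0)
    inv = np.maximum(d2, 1e-12) ** (-p)
    U = inv / (inv.sum(axis=1, keepdims=True) + delta2 ** (-p))
    return np.column_stack([U, 1.0 - U.sum(axis=1)])

def augment(X, centers, delta2, m=2.0):
    """Append the membership columns to the original features (assumed scheme)."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    return np.hstack([X, memberships_with_noise(X, centers, delta2, m)])

def _sigmoid(z):
    # Clip logits to avoid overflow in np.exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def fit_logistic(Z, y, lr=0.1, n_iter=2000, class_weight=None):
    """Weighted logistic regression fit by batch gradient descent; returns (w, b)."""
    y = np.asarray(y, dtype=float)
    if class_weight == "balanced":
        # Weight each class inversely to its frequency so rare fraud cases count more.
        n_pos = y.sum()
        n_neg = len(y) - n_pos
        s = np.where(y == 1, len(y) / (2 * n_pos), len(y) / (2 * n_neg))
    else:
        s = np.ones_like(y)
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(n_iter):
        r = s * (_sigmoid(Z @ w + b) - y)  # weighted residuals
        w -= lr * (Z.T @ r) / s.sum()
        b -= lr * r.sum() / s.sum()
    return w, b

def predict_proba(Z, w, b):
    """Predicted probability of the positive (fraud) class."""
    return _sigmoid(Z @ w + b)
```

Class weighting is included because fraud labels are typically rare; the "balanced" rule weights each class inversely to its frequency so the minority class is not swamped during fitting. Any of the classifiers listed above could replace the logistic regression on the augmented matrix.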
European Actuarial Journal Conference (EAJC) 2024
O. Koç, F. Başer, and S. A. Kestel, “Insurance Fraud Detection via Clustering-Based Fuzzy Classification On Noisy Unbalanced Datasets,” presented at the European Actuarial Journal Conference (EAJC) 2024, Lisbon, Portugal, 2024, Accessed: 00, 2025. [Online]. Available: https://cemapre.iseg.ulisboa.pt/eajc2024/BoA.pdf.