INSURANCE FRAUD DETECTION VIA MACHINE LEARNING AND EXPLAINABLE ARTIFICIAL INTELLIGENCE (XAI)
Date
2025-09-01
Author
Tezel, Cihan
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
This thesis studies insurance fraud detection with machine learning (ML) methods and explainable artificial intelligence (XAI) tools. The dataset contains insurance policies and their related claims, with fraud marked as a binary variable. The preparation stage covers missing data handling, encoding of categorical fields, and several feature engineering steps. Since fraud cases are rare, the Synthetic Minority Oversampling Technique (SMOTE) is employed, and models are trained on both the original and the balanced data. The algorithms tested include Logistic Regression, Random Forest, Support Vector Machines (SVM), k-Nearest Neighbors (KNN), LightGBM, and XGBoost. For XGBoost, two variants are considered: one with manually designed features such as seasonality, polynomial terms, and interactions, and another with automated feature engineering tuned by cross-validated AUC. Model performance is evaluated with ROC-AUC, accuracy, recall, specificity, and balanced accuracy, with classification thresholds optimized via ROC analysis. In addition, DeLong tests are applied to assess whether the observed differences in AUC across models are statistically significant. For interpretability, SHAP (SHapley Additive exPlanations) is utilized. The SHAP results indicate that liability coverage and fault assignment are the strongest drivers of the models' fraud predictions. Timing and policyholder features, such as claim month and address changes, also contribute meaningfully, though at a lower level. The findings show that boosting methods achieve the best overall performance. LightGBM and XGBoost outperform the other models, with XGBoost slightly ahead on the SMOTE-balanced data (AUC ≈ 0.85, recall ≈ 90%). Both manual and automated feature engineering improve recall, thereby shifting the trade-off toward catching more fraud cases. These findings emphasize the importance of designing fraud detection models that combine strong predictive accuracy with transparent and interpretable decision-making.
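The evaluation metrics named in the abstract can be illustrated with a small, dependency-free sketch. It is not taken from the thesis: the toy labels and scores are invented, the function names are hypothetical, and the threshold rule used here (maximizing Youden's J = recall + specificity - 1) is only one common way to pick a threshold from a ROC curve. The abstract does not say which criterion the author used.

```python
from typing import Dict, List


def metrics_at(y_true: List[int], scores: List[float], threshold: float) -> Dict[str, float]:
    """Recall, specificity, accuracy and balanced accuracy at one threshold.

    A case is flagged as fraud (1) when its score is >= threshold.
    """
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "recall": recall,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(y_true),
        "balanced_accuracy": (recall + specificity) / 2,
    }


def best_threshold(y_true: List[int], scores: List[float]) -> float:
    """Threshold maximizing Youden's J (an assumed criterion, see lead-in)."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):  # every observed score is a candidate cut
        m = metrics_at(y_true, scores, t)
        j = m["recall"] + m["specificity"] - 1.0
        if j > best_j:  # strict '>' keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """ROC-AUC as the share of (fraud, non-fraud) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 3 fraud cases among 10 claims (imbalanced on purpose).
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.70, 0.90]

t = best_threshold(y_true, scores)
print("threshold:", t)
print("metrics:", metrics_at(y_true, scores, t))
print("AUC:", roc_auc(y_true, scores))
```

On imbalanced data like this, plain accuracy can look high even for a model that misses most fraud, which is one reason to report recall, specificity, and balanced accuracy alongside it.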
Subject Keywords
Insurance Fraud Detection, Machine Learning, XGBoost, SHAP, Imbalanced Data, Explainable Artificial Intelligence (XAI)
URI
https://hdl.handle.net/11511/115659
Collections
Graduate School of Applied Mathematics, Thesis
C. Tezel, “INSURANCE FRAUD DETECTION VIA MACHINE LEARNING AND EXPLAINABLE ARTIFICIAL INTELLIGENCE (XAI),” M.S. - Master of Science, Middle East Technical University, 2025.