Anomaly detection via iterative Jackknife scenario based on data dimension reduction technique - JACKerDIM

2025-1
Güngör, Erdem
As data-driven decision-making becomes increasingly important in policy development, the reliability and consistency of data sets are of great significance for thorough investigation. To turn these data sets into a valuable product, data preprocessing is the first step in preparing the data for analysis. In this study, we aim to develop a novel anomaly detection method to clean the data. The proposed method is based on the Jackknife (JA), which provides a Leave-One-Out (LOO) scenario. This strategy enables us to resample the data set n times, where n is the number of data instances, by removing one data instance at a time. In the second step, Principal Component Analysis (PCA) is applied to handle the curse of dimensionality and the multicollinearity problem. In the last step, we propose an interval based on the Box Plot outlier detection range, called the Moving Anomaly Detection Range Interval (MADRI), which incorporates the parameters of the proposed method. Through these strategies, we aim to assess the outlierness of data points while accounting for possible associations across features. The study contributes to the literature as the first to combine LOO, PCA, and MADRI to detect outliers. The proposed method is tested through simulations against well-known anomaly detection methods. Across the 29 data sets tested, our method shows consistent efficacy in identifying outliers. Moreover, it shows the most significant improvement, with high accuracy rates, for the Pen Digits, Letter Recognition, Waveform Database, and NSL-KDD data sets.
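The three steps described in the abstract (leave-one-out resampling, PCA, and a box-plot-style interval) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not define MADRI or say which PCA quantity is monitored, so the sketch assumes the variance captured by the leading principal component as the leave-one-out statistic and substitutes the standard Tukey fences (1.5 times the interquartile range) for MADRI. The function names `loo_pca_scores`, `tukey_fences`, and `flag_anomalies` are hypothetical.

```python
import numpy as np


def loo_pca_scores(X, n_components=1):
    """Leave-one-out statistic (assumed for illustration, not from the thesis).

    For each instance i, drop row i, run PCA on the remaining rows, and
    record the variance captured by the leading `n_components` components.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    scores = np.empty(n)
    for i in range(n):
        X_loo = np.delete(X, i, axis=0)          # jackknife: remove instance i
        centered = X_loo - X_loo.mean(axis=0)    # PCA works on centered data
        # Singular values of the centered data give the PCA eigenvalues.
        s = np.linalg.svd(centered, compute_uv=False)
        eigvals = s ** 2 / (X_loo.shape[0] - 1)
        scores[i] = eigvals[:n_components].sum()
    return scores


def tukey_fences(values, k=1.5):
    """Standard box plot fences, used here as a stand-in for MADRI."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_anomalies(X, n_components=1, k=1.5):
    """Return indices whose leave-one-out statistic falls outside the fences."""
    scores = loo_pca_scores(X, n_components)
    lower, upper = tukey_fences(scores, k)
    return np.where((scores < lower) | (scores > upper))[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 50 ordinary points plus one planted anomaly at index 50.
    data = np.vstack([rng.normal(size=(50, 3)), [[20.0, 20.0, 20.0]]])
    print(flag_anomalies(data))
```

The idea behind the sketch is that removing an ordinary instance barely changes the principal components, while removing an anomalous one shifts them noticeably, so its leave-one-out statistic lands outside the interval. Fixed Tukey fences can also flag some ordinary instances; how the proposed MADRI adapts the interval is not described in the abstract.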
Citation Formats
E. Güngör, “Anomaly detection via iterative Jackknife scenario based on data dimension reduction technique - JACKerDIM,” Ph.D. - Doctoral Program, Middle East Technical University, 2025.