Comparative performance analysis of variable selection methods in linear models: A full factorial design simulation study

2024-07-30
Bi, Mehmet
Variable selection is an important preprocessing step in statistical modeling, aimed at improving model performance by identifying the most relevant variables. Despite the abundance of variable selection techniques, there remains a gap in understanding their comparative effectiveness across diverse datasets and conditions. In this study, we therefore systematically evaluate a wide range of widely known variable selection methods covering all three types: filter, wrapper, and embedded. By employing a full factorial design (64 scenarios), we examine the interactions between factors describing dataset characteristics, such as sample size, number of variables, variable correlation, error, and outliers. This experimental framework allows for an in-depth assessment of each method's performance using multiple evaluation metrics, including accuracy, test error, and training error. The results reveal the strengths and limitations of each variable selection method, providing practical guidance for practitioners in choosing the most appropriate technique for their specific applications. Furthermore, the findings highlight the importance of context-dependent method selection: no single variable selection method universally outperforms the others across all scenarios. Among the methods evaluated, Least Absolute Shrinkage and Selection Operator (LASSO), Forward Feature Selection, and Recursive Feature Elimination (RFE) are the suggested candidates, depending on the data characteristics. Overall, this study contributes to the field of statistics by offering a case-specific manual and a thorough statistical evaluation of variable selection methods, thereby aiding in the development of more efficient and accurate predictive models.
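As a rough illustration of what a 64-scenario full factorial design involves, the sketch below enumerates every combination of factor levels. It is not the thesis's actual setup: the abstract gives no levels and does not say how the 64 scenarios arise. The code assumes one arrangement that yields 64, namely six factors at two levels each (2^6 = 64). The first five factor names follow the abstract; the sixth factor and every level value are hypothetical placeholders.

```python
from itertools import product

# Hypothetical factors and levels. The abstract names the first five factors
# but gives no levels; "placeholder_factor" is an assumption added only so
# that six two-level factors produce 64 scenarios.
factors = {
    "sample_size": [100, 1000],
    "n_variables": [10, 50],
    "correlation": [0.2, 0.8],
    "error_sd": [1.0, 3.0],
    "outlier_rate": [0.0, 0.1],
    "placeholder_factor": ["low", "high"],
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


scenarios = full_factorial(factors)
print(len(scenarios))  # 2**6 = 64
```

In a simulation study of this kind, each scenario dictionary would typically drive one data-generating configuration, with every variable selection method applied to the resulting datasets so that main effects and interactions of the factors can be compared.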
Citation Formats
M. Bi, “Comparative performance analysis of variable selection methods in linear models: A full factorial design simulation study,” M.S. - Master of Science, Middle East Technical University, 2024.