Cluster based model diagnostic for logistic regression

Tanju, Özge
Model selection methods are commonly used to identify the model that best approximates the data. Existing methods are generally based on information theory; examples include the Akaike Information Criterion (AIC), the corrected Akaike Information Criterion (AICc), the Consistent Akaike Information Criterion (CAIC), and the Bayesian Information Criterion (BIC). These criteria do not take the modeling purpose into account. In this thesis, we propose a new method for logistic regression model selection when the modeling purpose is classification. The method is based on a measure of distance between two clusterings. Of the many clustering similarity measures in the literature, our model selection procedure uses the Jaccard index (Downton and Brennan, 1980) and the Fowlkes-Mallows index (Fowlkes and Mallows, 1983). The new approach is compared against the commonly used methods in an extensive simulation study covering many realistic scenarios. The scenarios are divided into two groups according to modeling purpose, and are further grouped by whether the true model is among the candidate models. As true models, we consider linear and nonlinear logistic models, nested and non-nested, with random effects and with fixed effects. Simulation results show that the new method is highly promising. In addition to the new method, this thesis provides an extensive comparison of the existing methods based on information criteria. Finally, the cluster-based and information-based criteria are applied to a real data set to select a binary model.
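To illustrate the two similarity measures named in the abstract, the following is a minimal sketch of the pair-counting forms of the Jaccard and Fowlkes-Mallows indices for two partitions of the same observations. The abstract does not say which partitions the thesis compares or how the indices are turned into a selection rule, so treating one partition as the observed binary outcome and the other as a model's predicted classes is an assumption made here for illustration only. The function names and the example labels are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels_a, labels_b):
    """Count pairs of observations by co-membership in two clusterings.

    Returns (n11, n10, n01):
      n11 = pairs placed together in both clusterings,
      n10 = pairs together in A only,
      n01 = pairs together in B only.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same observations")
    n11 = n10 = n01 = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            n11 += 1
        elif same_a:
            n10 += 1
        elif same_b:
            n01 += 1
    return n11, n10, n01


def jaccard_index(labels_a, labels_b):
    """Pair-counting Jaccard index: n11 / (n11 + n10 + n01)."""
    n11, n10, n01 = pair_counts(labels_a, labels_b)
    denom = n11 + n10 + n01
    # The index is undefined when no pair is co-clustered anywhere;
    # returning 0.0 is a convention chosen for this sketch.
    return n11 / denom if denom else 0.0


def fowlkes_mallows_index(labels_a, labels_b):
    """Fowlkes-Mallows index: n11 / sqrt((n11 + n10) * (n11 + n01))."""
    n11, n10, n01 = pair_counts(labels_a, labels_b)
    denom = sqrt((n11 + n10) * (n11 + n01))
    # Same convention as above for the undefined case.
    return n11 / denom if denom else 0.0


# Hypothetical example: observed binary outcomes versus the classes
# predicted by some candidate model (assumed, not taken from the thesis).
observed = [0, 0, 1, 1, 1, 0]
predicted = [0, 0, 1, 1, 0, 0]

jaccard = jaccard_index(observed, predicted)
fowlkes_mallows = fowlkes_mallows_index(observed, predicted)
```

Under this assumed setup, both indices lie between 0 and 1 and equal 1 when the two partitions coincide, so a candidate model whose predicted classes score higher against the observed outcomes would be preferred.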
Citation Formats
Ö. Tanju, “Cluster based model diagnostic for logistic regression,” M.S. - Master of Science, Middle East Technical University, 2016.