A cluster tree based model selection approach for logistic regression classifier

2018-01-01
Model selection methods are important for identifying the best approximating model. To identify the most meaningful model, the purpose of the model should be stated clearly in advance. The focus of this paper is model selection when the modelling purpose is classification. We propose a new model selection approach designed for logistic regression when the main modelling purpose is classification. The method is based on the distance between two clustering trees. We also critically evaluate how well conventional model selection methods based on information-theoretic concepts determine the best logistic regression classifier. An extensive simulation study is used to assess the finite-sample performance of the cluster tree based and the information-theoretic model selection methods. The simulations cover settings in which the true model is in the candidate set and settings in which it is not. The results show that the new approach is highly promising. Finally, the methods are applied to a real data set to select a binary model as a means of classifying subjects with respect to their risk of breast cancer.
JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION
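The abstract compares the proposed approach with conventional information-theoretic criteria. For reference only, the sketch below ranks every non-empty subset of candidate logistic regression models by AIC = -2 log L + 2k and BIC = -2 log L + k log n. Here k is the number of fitted coefficients, including the intercept. The sketch is a minimal pure-Python illustration under stated assumptions:

* The simulated data, the predictor names `x1` and `x2`, and the data-generating coefficients are all hypothetical.
* Models are fitted by plain Newton-Raphson maximum likelihood.
* It does not implement the paper's cluster-tree distance, which the abstract does not specify.

```python
import math
import random
from itertools import combinations


def sigmoid(e):
    # Numerically stable logistic function
    if e >= 0:
        return 1.0 / (1.0 + math.exp(-e))
    z = math.exp(e)
    return z / (1.0 + z)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iters=50, tol=1e-10):
    """Maximum-likelihood logistic fit by Newton-Raphson; X already holds an intercept column."""
    p, n = len(X[0]), len(y)
    beta = [0.0] * p
    for _ in range(iters):
        mu = [sigmoid(sum(xi[j] * beta[j] for j in range(p))) for xi in X]
        # Score vector X^T (y - mu) and observed information X^T W X
        grad = [sum((y[i] - mu[i]) * X[i][j] for i in range(n)) for j in range(p)]
        H = [[sum(mu[i] * (1 - mu[i]) * X[i][j] * X[i][k] for i in range(n))
              for k in range(p)] for j in range(p)]
        step = solve(H, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def log_likelihood(X, y, beta):
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(a * b for a, b in zip(xi, beta))
        # y*eta - log(1 + exp(eta)), with the log term computed stably
        ll += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return ll


def rank_candidates(rows, y, names):
    """Fit every non-empty predictor subset; return (AIC, BIC, subset) tuples sorted by AIC."""
    n = len(y)
    results = []
    for k in range(1, len(names) + 1):
        for subset in combinations(range(len(names)), k):
            X = [[1.0] + [r[j] for j in subset] for r in rows]
            beta = fit_logistic(X, y)
            ll = log_likelihood(X, y, beta)
            d = len(beta)
            results.append((-2 * ll + 2 * d, -2 * ll + d * math.log(n),
                            tuple(names[j] for j in subset)))
    return sorted(results)


# Hypothetical demo data: x1 drives the outcome, x2 is pure noise
rng = random.Random(2018)
N = 200
rows, y = [], []
for _ in range(N):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append([x1, x2])
    y.append(1 if rng.random() < sigmoid(-0.5 + 1.5 * x1) else 0)

ranking = rank_candidates(rows, y, ["x1", "x2"])
for aic, bic, subset in ranking:
    print(f"{subset}: AIC={aic:.2f} BIC={bic:.2f}")
```

In this toy setting, both criteria should favour a model containing `x1`. The two criteria can disagree on whether to keep the noise variable `x2`. The abstract calls the suitability of such likelihood-based criteria for choosing a classifier into question.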

Suggestions

Bayesian semiparametric models for nonignorable missing mechanisms in generalized linear models
Kalaylıoğlu Akyıldız, Zeynep Işıl (Informa UK Limited, 2013-08-01)
Semiparametric models provide a more flexible form for modeling the relationship between the response and the explanatory variables. On the other hand, in the literature on modeling missing variables, the canonical form of the probability that the variable is missing (p) is modeled with a fully parametric approach. Here we consider a regression spline based semiparametric approach to model the missingness mechanism of nonignorably missing covariates. In this model the relationship between the suitable...
Extended lasso-type MARS (LMARS) model in the description of biological network
Agraz, Melih; Purutçuoğlu Gazi, Vilda (Informa UK Limited, 2019-01-02)
The multivariate adaptive regression splines (MARS) model is one of the well-known additive non-parametric models that can deal successfully with highly correlated and nonlinear datasets. From our previous analyses, we have seen that lasso-type MARS (LMARS) can be a strong alternative to the Gaussian graphical model (GGM), a well-known probabilistic method that describes the steady-state behaviour of complex biological systems via lasso regression. In this study, we extend our original LMARS m...
Multiple linear regression model with stochastic design variables
İslam, Muhammed Qamarul (Informa UK Limited, 2010-01-01)
In a simple multiple linear regression model, the design variables have traditionally been assumed to be non-stochastic. In numerous real-life situations, however, they are stochastic and non-normal. Estimators of parameters applicable to such situations are developed. It is shown that these estimators are efficient and robust. A real-life example is given.
Models of response error components in supervised interview-reinterview surveys
Ayhan, Hüseyin Öztaş (Informa UK Limited, 2003-11-01)
The current work deals with the modelling of response error components in supervised interview-reinterview surveys. The model considers several stages of an interactive process to obtain and record a response. The response process is evaluated as a controller-interviewer-respondent-interviewer-controller interaction under a supervised interviewing process. The allocation of controllers, interviewers and respondents is made by a hierarchical design for the interview-reinterview process. In addition, a cod...
A marginalized multilevel model for bivariate longitudinal binary data
Inan, Gul; İlk Dağ, Özlem (Springer Science and Business Media LLC, 2019-06-01)
This study considers the analysis of bivariate longitudinal binary data. We propose a model based on the marginalized multilevel model framework. The proposed model consists of two levels: the first level associates the marginal mean of responses with covariates through a logistic regression model, and the second level includes subject/time-specific random intercepts within a probit regression model. The covariance matrix of multiple correlated time-specific random intercepts for each subject is assumed to ...
Citation Formats
O. Tanju and Z. I. Kalaylıoğlu Akyıldız, “A cluster tree based model selection approach for logistic regression classifier,” JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, pp. 1394–1414, 2018, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/36456.