Cluster based model diagnostic for logistic regression
Date
2016
Author
Tanju, Özge
Model selection methods are commonly used to identify the model that best approximates the data. Existing methods are generally based on information theory, such as the Akaike Information Criterion (AIC), the corrected Akaike Information Criterion (AICc), the Consistent Akaike Information Criterion (CAIC), and the Bayesian Information Criterion (BIC). These criteria do not depend on the modeling purpose. In this thesis, we propose a new method for logistic regression model selection where the modeling purpose is classification. This method is based on a measure of distance between two clusterings. There are many clustering similarity measures in the literature. Our model selection procedure is based on the Jaccard index (Downton and Brennan, 1980) and the Fowlkes-Mallows index (Fowlkes and Mallows, 1983). The new model selection approach is compared against the commonly used methods in an extensive simulation study covering many realistic scenarios. The scenarios are divided into two groups based on modeling purpose, and are further grouped by whether or not the true model is among the candidate models. As true models, we consider linear and nonlinear logistic models, nested and non-nested, with random effects and with fixed effects. Simulation results show that the new method is highly promising. Apart from the new method, this thesis also provides an extensive comparison of the current methods based on information criteria. Finally, the cluster-based and information-based criteria are applied to a real data set to select a binary model.
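The two similarity measures named in the abstract can be illustrated with a minimal pair-counting sketch. The abstract does not say how the two clusterings are formed, so this example assumes, purely for illustration, that one clustering is the observed binary response and the other is the class predicted by a candidate model. The function names and the toy data are hypothetical and are not taken from the thesis.

```python
from itertools import combinations
from math import sqrt


def pair_counts(a, b):
    """Count item pairs grouped together in both labelings, only in a, only in b."""
    if len(a) != len(b):
        raise ValueError("labelings must have the same length")
    both = only_a = only_b = 0
    for i, j in combinations(range(len(a)), 2):
        same_a = a[i] == a[j]
        same_b = b[i] == b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
    return both, only_a, only_b


def jaccard_index(a, b):
    """Pair-counting Jaccard index: both / (both + only_a + only_b)."""
    both, only_a, only_b = pair_counts(a, b)
    denom = both + only_a + only_b
    # Convention chosen here: if no pair is co-clustered anywhere, treat as identical.
    return both / denom if denom else 1.0


def fowlkes_mallows_index(a, b):
    """Fowlkes-Mallows index: both / sqrt((both + only_a) * (both + only_b))."""
    both, only_a, only_b = pair_counts(a, b)
    denom = sqrt((both + only_a) * (both + only_b))
    # Convention chosen here: return 0.0 when either labeling has no co-clustered pair.
    return both / denom if denom else 0.0


# Toy data (hypothetical): observed responses vs. classes predicted by one candidate model.
observed = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 0, 1]

print(jaccard_index(observed, predicted))
print(fowlkes_mallows_index(observed, predicted))
```

Under this assumed setup, a candidate model whose predicted classes agree more closely with the observed classes would score closer to 1 on both indices. The selection rule actually used in the thesis may differ.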
Subject Keywords
Cluster analysis; Regression analysis; Logistic regression analysis
URI
http://etd.lib.metu.edu.tr/upload/12620091/index.pdf
https://hdl.handle.net/11511/25751
Collections
Graduate School of Natural and Applied Sciences, Thesis
Suggestions
Clustering of manifold-modeled data based on tangent space variations
Gökdoğan, Gökhan; Vural, Elif; Department of Electrical and Electronics Engineering (2017)
An important research topic in recent years has been to understand and analyze data collections for clustering and classification applications. In many data analysis problems, the data sets at hand have an intrinsically low-dimensional structure and admit a manifold model. Most state-of-the-art clustering methods developed for data of non-linear and low-dimensional structure are based on local linearity assumptions. However, clustering algorithms based on locally linear representations can tolerate diff...
CMARS: a new contribution to nonparametric regression with multivariate adaptive regression splines supported by continuous optimization
Weber, Gerhard-Wilhelm; Batmaz, İnci; Köksal, Gülser; Taylan, Pakize; Yerlikaya-Ozkurt, Fatma (2012-01-01)
Regression analysis is a widely used statistical method for modelling relationships between variables. Multivariate adaptive regression splines (MARS) especially is very useful for high-dimensional problems and fitting nonlinear multivariate functions. A special advantage of MARS lies in its ability to estimate contributions of some basis functions so that both additive and interactive effects of the predictors are allowed to determine the response variable. The MARS method consists of two parts: forward an...
Consensus clustering of time series data
Yetere Kurşun, Ayça; Batmaz, İnci; İyigün, Cem; Department of Scientific Computing (2014)
In this study, we aim to develop a methodology that merges Dynamic Time Warping (DTW) and consensus clustering in a single algorithm. Commonly used time series distance measures require the data to be of the same length, and the measured distance between time series depends mostly on the similarity of each coinciding data pair in time. DTW is a relatively new measure used to compare two time-dependent sequences which may be out of phase or may not have the same lengths or frequencies. DTW aligns two time serie...
Estimation and hypothesis testing in stochastic regression
Sazak, Hakan Savaş; Tiku, Moti Lal; İslam, Qamarul; Department of Statistics (2003)
Regression analysis is very popular among researchers in various fields, but almost all researchers use the classical methods, which assume that X is nonstochastic and the error is normally distributed. However, in real-life problems, X is generally stochastic and the error can be nonnormal. The maximum likelihood (ML) estimation technique, which is known to have optimal features, is very problematic in situations when the distribution of X (marginal part) or error (conditional part) is nonnormal. Modified maximum...
Micro-level analysis of unregistered employment in Turkey with group comparisons
İner, Mehmet; Akkaya, Ayşen D.; Department of Statistics (2019)
Group comparison of logistic regression models in a similar way with OLS is manipulating depending on the unobserved heterogeneity in logistic regression. In this sense, this study focuses on the group comparison problem in logistic regression. In order to get to the root of the comparison problem in logistic regression, the theoretical background of the logistic regression is explained with the latent propensity interpretation in which the extent of the dependent variable’s closeness to success is taken in...
Citation Formats
IEEE
Ö. Tanju, “Cluster based model diagnostic for logistic regression,” M.S. - Master of Science, Middle East Technical University, 2016.