Modeling diseases with multiple disease characteristics: comparison of models and estimation methods

Erdem, Münire Tuğba
Epidemiological data with disease characteristic information can be modelled in several ways. One way is to take each disease characteristic as a response and construct a binary or polytomous logistic regression model. A second way is to define a new response consisting of disease subtypes created by cross-classifying the disease characteristic levels, and then construct a polytomous logistic regression model. The former may be disadvantageous because it neglects any covariation between disease characteristics, whereas the latter can capture that covariation. However, cross-classifying the characteristic levels increases the number of response categories, so a dimensionality problem in the parameter space may occur in the classical polytomous logistic regression model. A two-stage polytomous logistic regression model overcomes that dimensionality problem. In this thesis, the study proceeds in two main directions: a simulation study and a data analysis. In the simulation study, models that capture the covariation are compared in terms of the response model parameter estimators. That is, the performances of the maximum likelihood estimation (MLE) approach to classical polytomous logistic regression, the Bayesian estimation approach to classical polytomous logistic regression, and the pseudo-conditional likelihood (PCL) estimation approach to two-stage polytomous logistic regression are compared in terms of the bias and variation of the estimators. The results of the simulation study reveal that, for a small sample size and a small number of disease subtypes, PCL performs best in terms of bias and variance. For a medium number of disease subtypes, PCL performs better than MLE when the sample size is small; however, as the sample size gets larger, MLE performs better in terms of the standard errors of the estimates.
In addition, the sampling variance of the PCL estimators of the two-stage model converges to the asymptotic variance faster than that of the ML estimators of the classical polytomous logistic regression model. In the data analysis, etiologic heterogeneity in breast cancer subtypes among Turkish female cancer patients is investigated, and the superiority of the two-stage polytomous logistic regression model over the classical polytomous logistic model with disease subtypes is demonstrated in terms of the interpretation of parameters and convenience in hypothesis testing.
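The dimensionality problem described in the abstract can be illustrated with a small counting sketch. It is not taken from the thesis. It assumes one regression parameter per covariate for each disease subtype in the classical model, and, for the two-stage model, an additive second stage with one baseline term plus first-order contrasts for each characteristic. The characteristic levels in the example and all function names are hypothetical.

```python
from itertools import product
from math import prod


def cross_classify(levels):
    """Enumerate disease subtypes as tuples of characteristic levels.

    `levels` holds the number of levels of each disease characteristic,
    e.g. [2, 2, 3] for two binary characteristics and one with three levels.
    """
    return list(product(*[range(m) for m in levels]))


def classical_param_count(levels):
    # Assumption: the classical polytomous model carries one parameter
    # per covariate for every cross-classified subtype.
    return prod(levels)


def two_stage_additive_param_count(levels):
    # Assumption: the second stage is additive, with one baseline term
    # plus (M_k - 1) first-order contrasts for characteristic k.
    return 1 + sum(m - 1 for m in levels)


if __name__ == "__main__":
    levels = [2, 2, 3]  # hypothetical characteristic levels
    subtypes = cross_classify(levels)
    print("subtypes:", len(subtypes))
    print("classical parameters per covariate:", classical_param_count(levels))
    print("two-stage parameters per covariate:", two_stage_additive_param_count(levels))
```

Under these assumptions the classical count grows with the product of the characteristic levels, while the additive two-stage count grows only with their sum, which is one way to read the claim that the two-stage model overcomes the dimensionality problem.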


Marginalized transition random effect models for multivariate longitudinal binary data
İlk Dağ, Özlem (Wiley, 2007-03-01)
Generalized linear models with random effects and/or serial dependence are commonly used to analyze longitudinal data. However, the computation and interpretation of marginal covariate effects can be difficult. This led Heagerty (1999, 2002) to propose models for longitudinal binary data in which a logistic regression is first used to explain the average marginal response. The model is then completed by introducing a conditional regression that allows for the longitudinal, within-subject, dependence, either...
Adaptive estimation and hypothesis testing methods
Dönmez, Ayça; Tiku, Moti Lal; Department of Statistics (2010)
For statistical estimation of population parameters, Fisher’s maximum likelihood estimators (MLEs) are commonly used. They are consistent, unbiased and efficient, at any rate for large n. In most situations, however, MLEs are elusive because of computational difficulties. To alleviate these difficulties, Tiku’s modified maximum likelihood estimators (MMLEs) are used. They are explicit functions of sample observations and easy to compute. They are asymptotically equivalent to MLEs and, for small n, are equal...
Estimation and hypothesis testing in multivariate linear regression models under non normality
İslam, Muhammed Qamarul (Informa UK Limited, 2017-01-01)
This paper discusses the problem of statistical inference in multivariate linear regression models when the errors involved are non normally distributed. We consider the multivariate t-distribution, a fat-tailed distribution, for the errors as an alternative to the normal distribution. Such non normality is commonly observed in many data sets, e.g., financial data, which usually have excess kurtosis. This distribution has a number of applications in many other areas of research as well. We use modifie...
Robust estimation and hypothesis testing under short-tailedness and inliers
Akkaya, Ayşen (Springer Science and Business Media LLC, 2005-06-01)
Estimation and hypothesis testing based on normal samples censored in the middle are developed and shown to be remarkably efficient and robust to symmetric short-tailed distributions and to inliers in a sample. This negates the perception that sample mean and variance are the best robust estimators in such situations (Tiku, 1980; Dunnett, 1982).
Regression analysis with a stochastic design variable
Sazak, HS; Tiku, ML; İslam, Muhammed Qamarul (Wiley, 2006-04-01)
In regression models, the design variable has primarily been treated as a nonstochastic variable. In numerous situations, however, the design variable is stochastic. The estimation and hypothesis testing problems in such situations are considered. Real life examples are given.
Citation Formats
M. T. Erdem, “Modeling diseases with multiple disease characteristics: comparison of models and estimation methods,” M.S. - Master of Science, Middle East Technical University, 2011.