Bayesian semiparametric models for nonignorable missing data mechanisms in logistic regression

Öztürk, Olcay
In this thesis, Bayesian semiparametric models are developed for the missing data mechanisms of nonignorably missing covariates in logistic regression. In the missing data literature, a fully parametric approach is typically used to model nonignorable missing data mechanisms. In that approach, the probit or logit of the conditional probability that the covariate is missing is modeled as a linear combination of all variables, including the missing covariate itself. However, a nonignorably missing covariate may not be linearly related to the probit (or logit) of this conditional probability. In our study, the relationship between the probit of the probability that the covariate is missing and the missing covariate itself is modeled with a semiparametric approach based on penalized spline regression. An efficient Markov chain Monte Carlo (MCMC) sampling algorithm is established to estimate the parameters, and WinBUGS code is written to sample from the full conditional posterior distributions of the parameters by Gibbs sampling. Monte Carlo simulation experiments under different true missing data mechanisms are conducted to compare the bias and efficiency of the resulting estimators with those from the fully parametric approach. These simulations show that, when the true relationship between the missingness and the missing covariate is nonlinear, logistic regression estimators based on semiparametric missing data models have better bias and efficiency properties than those based on fully parametric missing data models. The two approaches are comparable when this relationship is linear.
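The kind of missing data mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's WinBUGS code: the linear truncated-power basis, the knot locations, every coefficient value, and the function names (probit_inv, spline_basis, missing_prob, simulate) are assumptions chosen for illustration. The sketch only simulates data in which the probit of the missingness probability is a nonlinear spline function of the covariate; in a Bayesian fit, the penalty is commonly imposed by giving the knot coefficients a shared normal prior, which is not shown here.

```python
import math
import random


def probit_inv(eta):
    """Standard normal CDF, i.e. the inverse of the probit link."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def spline_basis(x, knots):
    """Linear truncated-power basis: [x, (x - k1)+, ..., (x - kK)+]."""
    return [x] + [max(x - k, 0.0) for k in knots]


def missing_prob(x, y, alpha0, alpha_y, beta, knots):
    """P(x is missing | x, y): probit link, linear in y, spline in x."""
    basis = spline_basis(x, knots)
    eta = alpha0 + alpha_y * y + sum(b * v for b, v in zip(beta, basis))
    return probit_inv(eta)


def simulate(n, seed=0):
    """Simulate (x_observed, y, r); r = 1 means the covariate x is missing."""
    rng = random.Random(seed)
    knots = [-1.0, 0.0, 1.0]        # hypothetical knot locations
    beta = [0.8, -1.6, 1.6, -1.6]   # hypothetical spline coefficients
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Logistic regression outcome model with hypothetical coefficients.
        p_y = 1.0 / (1.0 + math.exp(-(-0.5 + 1.0 * x)))
        y = 1 if rng.random() < p_y else 0
        # Nonignorable: the chance that x is missing depends on x itself.
        p_miss = missing_prob(x, y, -0.5, 0.3, beta, knots)
        r = 1 if rng.random() < p_miss else 0
        rows.append((None if r == 1 else x, y, r))
    return rows


if __name__ == "__main__":
    data = simulate(1000)
    print("fraction missing:", sum(r for _, _, r in data) / len(data))
```

Setting all knot coefficients in beta to zero (keeping only the first, linear term) reduces this mechanism to the fully parametric probit model that the abstract uses as the comparison.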


Estimation and hypothesis testing in multivariate linear regression models under non normality
İslam, Muhammed Qamarul (Informa UK Limited, 2017-01-01)
This paper discusses the problem of statistical inference in multivariate linear regression models when the errors are non-normally distributed. We consider the multivariate t-distribution, a fat-tailed distribution, for the errors as an alternative to the normal distribution. Such non-normality is commonly observed in many data sets, e.g., financial data, which usually have excess kurtosis. This distribution has a number of applications in many other areas of research as well. We use modifie...
A contribution to modern data reduction techniques and their applications by applied mathematics and statistical learning
Sakarya, Hatice; Weber, Gerhard Wilhelm; Öktem, Hakan; Department of Scientific Computing (2010)
Data Reduction Techniques, Locally Linear Embedding, Isomap, Principal Component Analysis.
Pairwise multiple comparisons under short-tailed symmetric distribution
Balcı, Sibel; Akkaya, Ayşen; Department of Statistics (2007)
In this thesis, pairwise multiple comparisons and multiple comparisons with a control are studied when the observations have short-tailed symmetric distributions. Under non-normality, the testing procedure is given and Huber estimators, trimmed mean with winsorized standard deviation, modified maximum likelihood estimators and ordinary sample mean and sample variance used in this procedure are reviewed. Finally, robustness properties of the stated estimators are compared with each other and it is shown that...
Multiple linear regression model with stochastic design variables
İslam, Muhammed Qamarul (Informa UK Limited, 2010-01-01)
In a simple multiple linear regression model, the design variables have traditionally been assumed to be non-stochastic. In numerous real-life situations, however, they are stochastic and non-normal. Estimators of parameters applicable to such situations are developed. It is shown that these estimators are efficient and robust. A real-life example is given.
Parallel computing in linear mixed models
Gökalp Yavuz, Fulya (Springer Science and Business Media LLC, 2020-09-01)
In this study, we propose a parallel programming method for linear mixed models (LMM) generated from big data. A commonly used algorithm, expectation maximization (EM), is preferred for its use of maximum likelihood estimations, as the estimations are stable and simple. However, EM has a high computation cost. In our proposed method, we use a divide-and-recombine approach to split the data into smaller subsets, running the algorithm steps in parallel on multiple local cores and combining the results. The proposed me...
Citation Formats
O. Öztürk, “Bayesian semiparametric models for nonignorable missing data mechanisms in logistic regression,” M.S. - Master of Science, Middle East Technical University, 2011.