On multivariate longitudinal binary data models and their applications in forecasting

2012
Asar, Özgür
Longitudinal data arise when subjects are followed over time. Such data are typically dependent because they include repeated observations on the same subject; this is termed within-subject dependence. Often the scientific interest is in multiple longitudinal measurements, which introduce two additional types of association: between-response and cross-response temporal dependencies. Only statistical methods that take these association structures into account can yield reliable and valid statistical inferences. Although methods for univariate longitudinal data have been widely studied, multivariate longitudinal data still need more work. In this thesis, although we mainly focus on multivariate longitudinal binary data models, we also consider other types of response families when necessary. We extend previous work on multivariate marginal models, namely multivariate marginal models with response-specific parameters (MMM1), and propose multivariate marginal models with shared regression parameters (MMM2). Both of these models are based on generalized estimating equations (GEE) and are valid for several response families such as Binomial, Gaussian, Poisson, and Gamma. Two R packages, mmm and mmm2, are proposed to fit them, respectively. We further develop a marginalized multilevel model, namely the probit normal marginalized transition random effects model (PNMTREM), for multivariate longitudinal binary responses. With this model, the implicit function theorem is introduced for the first time to explicitly link the levels of marginalized multilevel models with transition structures. An R package, pnmtrem, is proposed to fit the model. PNMTREM is applied to data collected through the Iowa Youth and Families Project (IYFP). Five different models, including univariate and multivariate ones, are considered to forecast multivariate longitudinal binary data. A comparative simulation study, which includes a model-independent data simulation process, is carried out for this purpose.
Forecasting of the independent variables is taken into account as well. To assess the forecasts, several accuracy measures, such as the expected proportion of correct prediction (ePCP), the area under the receiver operating characteristic (AUROC) curve, and the mean absolute scaled error (MASE), are considered. Mother's Stress and Children's Morbidity (MSCM) data are used to illustrate this comparison in real life. Results show that marginalized models yield better forecasting results than marginal models. Simulation results agree with these findings.
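As an illustration of the three accuracy measures named in the abstract, the following is a minimal Python sketch based on their commonly used definitions; it is not taken from the thesis, and the thesis may define or apply them differently. The function names `epcp`, `auroc`, and `mase` and the toy inputs are hypothetical. ePCP is taken here as the mean probability assigned to the observed outcome, AUROC as the proportion of positive/negative pairs ranked correctly (ties count one half), and MASE as the forecast mean absolute error scaled by the in-sample mean absolute error of a naive one-step forecast.

```python
def epcp(y, p):
    """Expected proportion of correct prediction (assumed definition):
    average probability the forecast assigns to the outcome that occurred."""
    return sum(pi if yi == 1 else 1.0 - pi for yi, pi in zip(y, p)) / len(y)


def auroc(y, p):
    """Area under the ROC curve via pairwise comparison:
    share of (positive, negative) pairs where the positive case gets the
    higher predicted probability; ties count as one half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative outcome")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def mase(train, actual, forecast):
    """Mean absolute scaled error: forecast MAE divided by the in-sample MAE
    of the naive forecast that repeats the previous observation."""
    scale = sum(abs(train[t] - train[t - 1]) for t in range(1, len(train))) / (len(train) - 1)
    if scale == 0:
        raise ValueError("naive in-sample error is zero; MASE is undefined")
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Toy example with made-up values (not data from the thesis).
y_obs = [1, 0, 1, 0]            # observed binary responses
p_hat = [0.8, 0.3, 0.6, 0.4]    # forecast probabilities of a 1
history = [0, 1, 1, 0, 1]       # earlier observations of the same series

print("ePCP :", epcp(y_obs, p_hat))
print("AUROC:", auroc(y_obs, p_hat))
print("MASE :", mase(history, [1, 0], [0.8, 0.3]))
```

Higher ePCP and AUROC values indicate better forecasts, whereas a lower MASE is better; a MASE below 1 means the forecast beats the naive in-sample benchmark on average.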

Suggestions

A simulation study on marginalized transition random effects models for multivariate longitudinal binary data
Yalçınöz, Zerrin; İlk Dağ, Özlem; Department of Statistics (2008)
In this thesis, a simulation study is conducted and a statistical model is fitted to the simulated data. The data are assumed to represent the satisfaction of customers who withdraw their salary from a particular bank. They are longitudinal data with a bivariate binary response, assumed to be collected from 200 individuals at four different time points. In such data sets, two types of dependence, the dependence within subject measurements and the dependence between responses, are important and these are...
Marginalized transition random effect models for multivariate longitudinal binary data
İlk Dağ, Özlem (Wiley, 2007-03-01)
Generalized linear models with random effects and/or serial dependence are commonly used to analyze longitudinal data. However, the computation and interpretation of marginal covariate effects can be difficult. This led Heagerty (1999, 2002) to propose models for longitudinal binary data in which a logistic regression is first used to explain the average marginal response. The model is then completed by introducing a conditional regression that allows for the longitudinal, within-subject, dependence, either...
Estimation and hypothesis testing in multivariate linear regression models under non normality
İslam, Muhammed Qamarul (Informa UK Limited, 2017-01-01)
This paper discusses the problem of statistical inference in multivariate linear regression models when the errors are non-normally distributed. We consider the multivariate t-distribution, a fat-tailed distribution, for the errors as an alternative to the normal distribution. Such non-normality is commonly observed in many data sets, e.g., financial data, which usually have excess kurtosis. This distribution has a number of applications in many other areas of research as well. We use modifie...
Mutual information model selection algorithm for time series
Akca, Elif; Yozgatlıgil, Ceylan (Informa UK Limited, 2020-09-01)
Time series model selection has been widely studied in recent years. It is important to select the best model among the candidate models proposed for a series, in terms of explaining the process that governs the series and providing the most accurate forecasts of future observations. The aim of this study is to create an algorithm for order selection in Box-Jenkins models that combines penalized natural logarithm of mutual information among the original series and predictions coming from each candida...
Multiple frame sampling theory and applications
Dalçık, Aylin; Ayhan, Hüseyin Öztaş; Department of Statistics (2010)
One of the most important practical problems in conducting sample surveys is that the list that can be used for selecting the sample is generally incomplete or out of date. Therefore, sample surveys can produce seriously biased estimates of the population parameters. On the other hand, updating a list is a difficult and very expensive operation. Multiple-frame sampling refers to surveys where two or more frames are used and independent samples are taken from each of the frames. It is assumed that the...
Citation Formats
Ö. Asar, “On multivariate longitudinal binary data models and their applications in forecasting,” M.S. - Master of Science, Middle East Technical University, 2012.