Adapting a Robust Model into Hybrid Implementations of Machine Learning Algorithms and Statistical Methods for Longitudinal Data

2021-9
Erduran, İbrahim Hakkı
Data structures in which the same characteristics are measured repeatedly at different time points are a form of longitudinal data. Such data sets require advanced modeling techniques because of the dependency structure among the repeated measurements. The linear mixed model (LMM) is an advanced regression method used to analyze such data. Although the LMM offers considerable flexibility and many advantages, its setup rests on a number of assumptions that are difficult to satisfy in real data sets. Machine learning (ML) algorithms are another option for analyzing longitudinal data. However, many of them require the data to be independent and identically distributed (iid), which does not hold for longitudinal data. Because of these limitations, hybrid methods combining LMM and ML have been developed to make precise estimations for longitudinal data in models with both random and fixed effects. However, these methods assume normally distributed errors and are therefore not robust to heavy-tailed data or outlying observations. This study aims to extend and robustify hybrid LMM and ML methods by introducing a heavy-tailed distribution into the model setting. In the proposed model, the LMM estimates the parameters related to the random effects with a robust approach, while the ML algorithm estimates the fixed-effect parameters. The model is tested on two real data sets and in simulation studies under several conditions. It gives promising results on the real data sets and especially in simulation trials involving heavy tails and outliers. Almost all of the results based on comparison criteria such as RMSE, AIC and BIC favor the proposed method.
This study extends a modern topic in statistics with a robust approach and a machine learning method, and it will guide researchers working in this field through the open-source code provided.
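To make the division of labor described in the abstract concrete, the following is a minimal sketch of one way such a hybrid loop could look. It is not the algorithm from the thesis. It assumes a random-intercept-only model, uses weighted least squares as a stand-in for the ML learner of the fixed part, and takes a Student-t distribution with fixed degrees of freedom `nu` as the heavy-tailed distribution (the abstract does not name one). The function `fit_robust_hybrid` and all variable names are hypothetical.

```python
import numpy as np

def fit_robust_hybrid(X, y, groups, nu=4.0, n_iter=30):
    """Illustrative loop: fixed-part learner alternating with robust random intercepts."""
    n_groups = groups.max() + 1
    # Stand-in for the ML learner (assumption): a linear fit with an intercept column.
    design = np.column_stack([np.ones(len(y)), X])
    b = np.zeros(n_groups)         # random intercepts, one per subject
    u = np.ones(len(y))            # robustness weights, one per observation
    sigma2, tau2 = np.var(y), 1.0  # error and random-effect variance components
    for _ in range(n_iter):
        # Step 1: fit the fixed part on y minus the current random effects.
        sw = np.sqrt(u)
        beta, *_ = np.linalg.lstsq(design * sw[:, None],
                                   (y - b[groups]) * sw, rcond=None)
        r = y - design @ beta
        # Step 2: weighted shrinkage estimate of each subject's intercept.
        denom = np.bincount(groups, weights=u, minlength=n_groups) + sigma2 / tau2
        b = np.bincount(groups, weights=u * r, minlength=n_groups) / denom
        # Step 3: Student-t weights; large residuals receive small weights.
        e = r - b[groups]
        u = (nu + 1.0) / (nu + e**2 / sigma2)
        # Step 4: update the variance components.
        sigma2 = np.sum(u * e**2) / len(y)
        tau2 = np.mean(b**2 + sigma2 / denom)
    return beta, b, sigma2, tau2

# Synthetic longitudinal data: 30 subjects, 10 repeated measures, heavy-tailed noise.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(30), 10)
x = rng.normal(0.0, 2.0, size=300)
true_b = rng.normal(0.0, 1.0, size=30)
y = 1.0 + 2.0 * x + true_b[groups] + rng.standard_t(3, size=300)

beta, b, sigma2, tau2 = fit_robust_hybrid(x, y, groups)
print("estimated slope:", beta[1])
```

In this sketch the weights `u` shrink toward zero for observations with large residuals, so outliers influence both the fixed-part fit and the subject intercepts less than they would under a normal error assumption. Replacing the least-squares step with a tree ensemble or another learner that accepts sample weights would move it closer to the kind of hybrid the abstract describes.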

Suggestions

Algorithm Overview and Design for Mixed Effects Models
Koca, Burcu; Gökalp Yavuz, Fulya (2021-06-06)
Linear Mixed Model (LMM) is an extended regression method that is used for longitudinal data which has repeated measures within the individual. It is natural to expect high correlation between these repeats over a period of time for the same individual. Since classical approaches may fail to cover these correlations, LMM handles this significant concern by introducing random effect terms in the model. Besides its flexible structure in terms of modeling, LMM has several application areas such as clinical tri...
Implementation of different algorithms in linear mixed models: case studies with TIMSS
Koca, Burcu; Gökalp Yavuz, Fulya; Department of Statistics (2021-9-06)
Mixed models are frequently used in longitudinal data types with time repetition over the same subject and clustered data types formed by observations gathered around certain groups. The modeling technique which models the dependency structure between repetitions and observations in the same cluster is required to use algorithms for parameter estimations. The same model can be solved with various algorithms arising from setup, inference and approach differences. In this study, several algorithms used for LM...
MODELLING OF KERNEL MACHINES BY INFINITE AND SEMI-INFINITE PROGRAMMING
Ozogur-Akyuz, S.; Weber, Gerhard Wilhelm (2009-06-03)
In Machine Learning (ML) algorithms, one of the crucial issues is the representation of the data. As the data become heterogeneous and large-scale, single kernel methods become insufficient to classify nonlinear data. The finite combinations of kernels are limited up to a finite choice. In order to overcome this discrepancy, we propose a novel method of "infinite" kernel combinations for learning problems with the help of infinite and semi-infinite programming regarding all elements in kernel space. Looking...
A Bayesian Approach to Learning Scoring Systems
Ertekin Bolelli, Şeyda (2015-12-01)
We present a Bayesian method for building scoring systems, which are linear models with coefficients that have very few significant digits. Usually the construction of scoring systems involves manual effort: humans invent the full scoring system without using data, or they choose how logistic regression coefficients should be scaled and rounded to produce a scoring system. These kinds of heuristics lead to suboptimal solutions. Our approach is different in that humans need only specify the prior over what the ...
On numerical optimization theory of infinite kernel learning
Ozogur-Akyuz, S.; Weber, Gerhard Wilhelm (2010-10-01)
In Machine Learning algorithms, one of the crucial issues is the representation of the data. As the given data sources become heterogeneous and the data are large-scale, multiple kernel methods help to classify "nonlinear data". Nevertheless, the finite combinations of kernels are limited up to a finite choice. In order to overcome this discrepancy, a novel method of "infinite" kernel combinations is proposed with the help of infinite and semi-infinite programming regarding all elements in kernel space. Look...
Citation Formats
İ. H. Erduran, “Adapting a Robust Model into Hybrid Implementations of Machine Learning Algorithms and Statistical Methods for Longitudinal Data,” M.S. - Master of Science, Middle East Technical University, 2021.