Adapting a Robust Model into Hybrid Implementations of Machine Learning Algorithms and Statistical Methods for Longitudinal Data

Erduran, İbrahim Hakkı
Data structures in which the same characteristics are measured repeatedly at different time points are a type of longitudinal data. Such data sets require advanced modeling techniques because of the dependency structure among the repeated measurements. The linear mixed model (LMM) is an advanced regression method used to analyze such data sets. Although the LMM offers considerable flexibility and many advantages, its setup rests on a number of assumptions that are difficult to satisfy in real data sets. Machine learning (ML) algorithms are another option for analyzing longitudinal data. However, many of them require the data to be independent and identically distributed (iid), which does not hold for longitudinal data. Because of these limitations, hybrid methods that combine LMM and ML have been developed to produce precise estimates for longitudinal data in models with both random and fixed effects. However, these methods assume normally distributed errors and are therefore not robust to heavy-tailed data and outlying observations. This study aims to extend and robustify hybrid LMM and ML methods by introducing a heavy-tailed distribution into the model setting. In the proposed model, the LMM component estimates the random-effect parameters with a robust approach, while the ML algorithm estimates the fixed-effect parameters. The model is tested on two real data sets and in simulation studies under several conditions; it gives promising results on the real data sets and especially in the simulation trials involving heavy tails and outliers. Almost all of the results based on comparison criteria such as RMSE, AIC and BIC favor the proposed method.
This study extends a current topic in statistics with a robust approach and a machine learning method, and the open-source code provided with it will guide researchers working in this field.
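To illustrate why replacing the normal error assumption with a heavy-tailed one gives robustness to outliers, the following is a minimal sketch and not the thesis's algorithm. It assumes Student-t errors (the abstract does not name the heavy-tailed distribution) with a fixed scale and fixed degrees of freedom, and it estimates only a single location parameter by iterative reweighting. The function names `t_weights` and `robust_location`, the default values, and the toy data are all hypothetical.

```python
from statistics import mean, median


def t_weights(residuals, scale, df):
    """EM-style weights under Student-t errors: large residuals get small weights."""
    return [(df + 1.0) / (df + (r / scale) ** 2) for r in residuals]


def robust_location(values, scale=1.0, df=3.0, tol=1e-8, max_iter=200):
    """Iteratively reweighted estimate of a location parameter.

    Assumes Student-t errors with known scale and degrees of freedom
    (illustrative choices, not values taken from the thesis).
    """
    mu = median(values)  # outlier-resistant starting point
    for _ in range(max_iter):
        weights = t_weights([v - mu for v in values], scale, df)
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


if __name__ == "__main__":
    # Toy repeated measurements; the last value plays the role of an outlier.
    repeated_measures = [1.0, 2.0, 3.0, 4.0, 100.0]
    print("ordinary mean:", mean(repeated_measures))
    print("t-weighted estimate:", robust_location(repeated_measures))
```

Under a normal error model every observation carries equal weight, so the single outlier pulls the ordinary mean far from the bulk of the data. Under the t-based weights the outlier contributes almost nothing, and the estimate stays near the centre of the remaining values.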


Algorithm Overview and Design for Mixed Effects Models
Koca, Burcu; Gökalp Yavuz, Fulya (2021-06-06)
The Linear Mixed Model (LMM) is an extended regression method for longitudinal data with repeated measures on the same individual. It is natural to expect high correlation between these repeats over a period of time for the same individual. Since classical approaches may fail to account for these correlations, LMM addresses this concern by introducing random effect terms in the model. Besides its flexible structure in terms of modeling, LMM has several application areas such as clinical tri...
Implementation of different algorithms in linear mixed models: case studies with TIMSS
Koca, Burcu; Gökalp Yavuz, Fulya; Department of Statistics (2021-9-06)
Mixed models are frequently used for longitudinal data, with repeated measurements over time on the same subject, and for clustered data, formed by observations gathered around certain groups. A modeling technique that captures the dependency structure between repetitions and between observations in the same cluster requires algorithms for parameter estimation. The same model can be solved with various algorithms arising from setup, inference and approach differences. In this study, several algorithms used for LM...
Ozogur-Akyuz, S.; Weber, Gerhard Wilhelm (2009-06-03)
In Machine Learning (ML) algorithms, one of the crucial issues is the representation of the data. As the data become heterogeneous and large-scale, single kernel methods become insufficient to classify nonlinear data. The finite combinations of kernels are limited up to a finite choice. In order to overcome this discrepancy, we propose a novel method of "infinite" kernel combinations for learning problems with the help of infinite and semi-infinite programming regarding all elements in kernel space. Looking...
Recent Trends in the Use of Graph Neural Network Models for Natural Language Processing
Yılmaz, Burcu; Genç, Hilal; Ağrıman, Mustafa; Demirdöver, Buğra Kaan; Erdemir, Mert; Şimşek, Gökhan; Karagöz, Pınar (IGI Global, 2020-01-01)
Graphs are powerful data structures that allow us to represent varying relationships within data. In the past, due to the difficulties related to the time complexities of processing graph models, graphs were rarely involved in machine learning tasks. In recent years, especially with the new advances in deep learning techniques, an increasing number of graph models related to feature engineering and machine learning have been proposed. Recently, there has been an increase in approaches that automatically learn to encode ...
Hybrid statistical and machine learning modeling of cognitive neuroscience data
Çakar, Serenay; Gökalp Yavuz, Fulya (2023-01-01)
The nested data structure is prevalent in cognitive measurement experiments because observations are taken repeatedly from different brain locations within subjects. The analysis methods used for this data type should consider the dependency structure among the repeated measurements. However, this dependency is mostly ignored in the cognitive neuroscience data analysis literature. We consider both statistical and machine learning methods extended to repeated data analysis and compare distinct algorithms ...
Citation Formats
İ. H. Erduran, “Adapting a Robust Model into Hybrid Implementations of Machine Learning Algorithms and Statistical Methods for Longitudinal Data,” M.S. - Master of Science, Middle East Technical University, 2021.