Hybrid statistical and machine learning modeling of cognitive neuroscience data

Nested data structures are prevalent in experiments on cognitive measures because observations are taken repeatedly from different brain locations within subjects. Analysis methods for this type of data should account for the dependency among the repeated measurements. However, this dependency is largely ignored in the cognitive neuroscience data analysis literature. We consider both statistical and machine learning methods extended to repeated-measures data and compare distinct algorithms in terms of their advantages and disadvantages. Unlike basic algorithm comparison studies, this article analyzes novel neuroscience data while accounting for the dependency structure, for the first time, with several statistical and machine learning methods and their hybrid forms. In addition, the fitting performance of the different algorithms is compared using contaminated data sets and cross-validation. One of our findings suggests that the generalized linear mixed model (GLMM) tree, with random-effect terms indexing the location of functional near-infrared spectroscopy optodes nested within experimental units, shows the best predictive performance, with the lowest mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE). However, there is a trade-off between accuracy and speed, since this algorithm requires the most computation time.
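As a brief illustration of the three performance metrics named above, the following minimal sketch computes MSE, RMSE, and MAE from observed and predicted values. The function name `error_metrics` and the sample values are hypothetical and chosen only for illustration; they are not taken from the study's data or code.

```python
import numpy as np


def error_metrics(y_true, y_pred):
    """Return MSE, RMSE, and MAE for paired observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mse = float(np.mean(residuals ** 2))       # mean squared error
    rmse = float(np.sqrt(mse))                 # root mean squared error
    mae = float(np.mean(np.abs(residuals)))    # mean absolute error
    return {"MSE": mse, "RMSE": rmse, "MAE": mae}


# Hypothetical observed and predicted responses (illustration only).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
metrics = error_metrics(observed, predicted)
print(metrics)
```

Lower values indicate a better fit for all three metrics. MSE and RMSE penalize large errors more heavily than MAE, which is one reason to report them together.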
Journal of Applied Statistics


