IRREGULAR LONGITUDINAL DATA ANALYSIS WITH STATISTICAL AND MACHINE LEARNING METHODS IN ASTEROID DATASET

2023-09-11
Tanrıverdi, İrem
Scientific research on asteroids began to gain recognition and importance during the 18th century. Records are kept of the characteristics of asteroids that enter Earth's orbit, and each asteroid is classified by its hazardous status. Because these records are repeated observations of the same objects, it is crucial to use analysis methods that account for the longitudinal structure of the data. However, previous studies of Near-Earth Asteroid (NEA) data used methods that ignore this dependency. To address this shortcoming, this thesis applies several statistical and machine learning methods to NEA data. We analyze data from 751 asteroids observed at irregular time intervals, obtained through the National Aeronautics and Space Administration (NASA). We compare algorithms suited to longitudinal data: generalized linear mixed models (GLMM), the marginal model, GLMM-Tree, Historical Random Forest, GPBoost, and spline models. The accuracies of the models range from 0.89 to 0.99; GPBoost has the highest performance and the marginal model the poorest. NEA data are then simulated with different numbers of subjects and regular time points. In the simulations, model performance improves as the number of subjects and the number of time points increase. GPBoost again has the highest performance, while GLMM-Tree has the poorest performance for small sample sizes.
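The simulation step described in the abstract (varying numbers of subjects observed at regular time points) can be illustrated with a minimal sketch. It assumes a random-intercept logistic model, which is the simplest GLMM for a binary outcome such as hazardous status. The function name `simulate_longitudinal`, the single covariate `x`, and all parameter values are hypothetical; the abstract does not describe the thesis's actual simulation design.

```python
import math
import random


def simulate_longitudinal(n_subjects, n_times, beta0=-1.0, beta1=1.5,
                          sigma_b=1.0, seed=0):
    """Simulate binary outcomes from a random-intercept logistic model.

    logit P(y_ij = 1) = beta0 + beta1 * x_ij + b_i,  b_i ~ Normal(0, sigma_b^2)

    All parameter values are illustrative assumptions, not thesis values.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_subjects):
        # Subject-specific random intercept: shared by every observation of
        # subject i, which is what makes the repeated measures dependent.
        b_i = rng.gauss(0.0, sigma_b)
        for t in range(n_times):  # regular time points 0, 1, ..., n_times - 1
            x = rng.gauss(0.0, 1.0)  # hypothetical standardized covariate
            eta = beta0 + beta1 * x + b_i
            p = 1.0 / (1.0 + math.exp(-eta))
            y = 1 if rng.random() < p else 0
            rows.append({"subject": i, "time": t, "x": x, "y": y})
    return rows


if __name__ == "__main__":
    # Vary the number of subjects and time points, as in a simulation study.
    for n_subjects, n_times in [(50, 3), (200, 6)]:
        data = simulate_longitudinal(n_subjects, n_times)
        print(n_subjects, n_times, len(data))
```

Observations that share a `subject` value are correlated through `b_i`. This is the dependency that methods treating each row as independent would ignore.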
Citation Formats
İ. Tanrıverdi, “IRREGULAR LONGITUDINAL DATA ANALYSIS WITH STATISTICAL AND MACHINE LEARNING METHODS IN ASTEROID DATASET,” M.S. - Master of Science, Middle East Technical University, 2023.