Irregular longitudinal data analysis with statistical and machine learning methods for hazardous asteroids

2024-04-01
Asteroids have been observed for as long as the available equipment has made it feasible. Recorded data, going back to the 18th century, have allowed these celestial objects to be classified by hazard status. However, previous studies used methods that ignore subject dependency in Near-Earth Asteroid (NEA) data. This study aims to overcome this shortcoming by applying various statistical and machine learning methods to classify the hazard status of asteroids from NEA data. We analyze data, obtained through NASA, on 751 asteroids observed at irregular time intervals. We compare algorithms suited to longitudinal data structures, such as the Generalized Linear Mixed Model (GLMM), the marginal model, GLMM-Tree, Historical Random Forest, GPBoost, and Spline. To the best of our knowledge, and based on a comprehensive review of the existing literature, our study is the first to use these methods to analyze NEA data. The accuracies of the models range from 0.89 to 0.99; the GPBoost model performs best, and the marginal model performs worst.
Astronomy and Computing
İ. Tanrıverdi, Ö. İlk Dağ, and M. A. Gürkan, “Irregular longitudinal data analysis with statistical and machine learning methods for hazardous asteroids,” Astronomy and Computing, vol. 47, pp. 0–0, 2024, Accessed: 00, 2024. [Online]. Available: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85187211509&origin=inward.