Predicting tennis match outcome: a machine learning approach using the SRP-CRISP-DM framework

Ünal, Toyan
Machine learning methods have proven effective at forecasting tennis match results. However, because these methods are empirical, the choice of dataset, model, feature set, and hyperparameters significantly affects the outcome. In this thesis, we employ the Sports Result Prediction Cross-Industry Standard Process for Data Mining (SRP-CRISP-DM) experimental framework to address this uncertainty. This approach ensures that results are both replicable and reproducible across diverse datasets and sports. Our study covers 14 years of men’s singles tennis match data, from 2009 to 2022, with the 2021 and 2022 data held out as the test set. We applied six advanced feature extraction techniques, three machine learning models, and two feature selection methods, using 10-fold time-based cross-validation with hyperparameter tuning. After training and tuning, the Extreme Gradient Boosting model proved the most effective, achieving the lowest Brier score (0.1913) and an accuracy of 70.5% on the test set. The feature with the highest predictive power was the average win ratio implied by the bookmakers' betting odds.
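As a minimal illustration of two quantities named in the abstract, the sketch below computes a win probability implied by bookmaker odds and the Brier score of a set of forecasts. It assumes decimal odds and a simple proportional normalization to remove the bookmaker margin; the abstract does not state which normalization the thesis uses, and the function names are hypothetical.

```python
from statistics import mean


def implied_win_probability(odds_a, odds_b):
    """Win probability for player A implied by one bookmaker's decimal odds.

    Assumption: the raw inverse odds are rescaled to sum to 1, which
    removes the bookmaker margin proportionally.
    """
    raw_a, raw_b = 1.0 / odds_a, 1.0 / odds_b
    return raw_a / (raw_a + raw_b)


def average_implied_probability(odds_pairs):
    """Average the implied probability for player A over several bookmakers.

    odds_pairs is an iterable of (odds_a, odds_b) tuples, one per bookmaker.
    """
    return mean(implied_win_probability(a, b) for a, b in odds_pairs)


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better; a constant forecast of 0.5 scores 0.25.
    """
    return mean((p - y) ** 2 for p, y in zip(probabilities, outcomes))
```

For example, `brier_score([0.5, 0.5], [1, 0])` evaluates to 0.25, the score of an uninformative forecast, which gives a reference point for the reported 0.1913.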
Citation Formats
T. Ünal, “Predicting tennis match outcome: a machine learning approach using the SRP-CRISP-DM framework,” M.S. - Master of Science, Middle East Technical University, 2023.