Imbalanced learning techniques: Experiments on NCAA College Basketball League player statistics dataset

Güler, Emir
This study was conducted to answer the question: “What are the state-of-the-art methods for imbalanced classification, and which combinations of these methods yield the best results on extremely imbalanced real-world data?” To this end, internal (algorithm-based) and external (sampling-based) imbalanced learning techniques were applied individually and in combination. The dataset used for the imbalanced classification task is the National Collegiate Athletic Association (NCAA) Men’s Basketball League Player Statistics Data. The players’ draft status (whether or not the player was drafted by any NBA team) was used as the target variable for binary classification. The minority : majority ratio of the target variable is 3.39 : 96.61. The F1 score was used as the main evaluation metric. The experiments showed that the default parameters of sampling techniques do not work well under extreme imbalance. The optimum minority-over-majority ratio hyperparameters ranged from 0.07 to 0.11, which differs from the general advice and practice of matching the minority and majority class frequencies, that is, setting the ratio hyperparameter to 1. In addition, cost-sensitive methods were combined with sampling methods. The class weight hyperparameters with which the cost-sensitive learning model worked best were {class0: 1, class1: 1}, {class0: 2, class1: 1} or {class0: 3, class1: 2}, contrary to the general teaching that “if class ratio is 1:9, the cost-sensitive weight hyperparameter should be the inverse of the original ratio”. Lastly, probability threshold moving was applied to maximize the F1 score. In this way, three different imbalanced learning methods were consolidated, and better results were obtained than with any of the state-of-the-art methods used alone. Additionally, Monte Carlo simulation was applied to corroborate and generalize the results obtained on the real-world dataset.
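As an illustration only, two of the three combined steps (sampling to a partial minority-over-majority ratio, and probability threshold moving to maximize F1) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the study's implementation: the function names `oversample_to_ratio`, `f1` and `best_f1_threshold` are hypothetical, label 1 is assumed to mark the minority (drafted) class, and simple random oversampling stands in for whichever sampling techniques the study actually used. The cost-sensitive step is not shown; it would be handled by the learner itself, for example through a class weight setting such as the `class_weight` parameter of many scikit-learn classifiers.

```python
import random


def oversample_to_ratio(X, y, ratio, seed=0):
    """Randomly duplicate minority rows (label 1, an assumption) until
    n_minority / n_majority reaches `ratio`, e.g. 0.1 instead of 1.0."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    if not minority:
        raise ValueError("no minority samples to duplicate")
    target = int(round(ratio * len(majority)))
    extra = max(0, target - len(minority))  # never remove existing rows
    picks = [rng.choice(minority) for _ in range(extra)]
    idx = list(range(len(y))) + picks
    return [X[i] for i in idx], [y[i] for i in idx]


def f1(y_true, y_pred):
    """F1 score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, proba):
    """Try every distinct predicted probability as a cut-off and return
    the (threshold, F1) pair with the highest F1."""
    best_t, best_score = 0.5, -1.0
    for t in sorted(set(proba)):
        pred = [1 if p >= t else 0 for p in proba]
        score = f1(y_true, pred)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

In practice the threshold would be selected on held-out validation predictions rather than on the data used to fit the model, and the ratio passed to `oversample_to_ratio` would be tuned as a hyperparameter (the abstract reports optima between 0.07 and 0.11).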


