Imbalanced learning techniques: Experiments on NCAA College Basketball League player statistics dataset

2022-09-02
Güler, Emir
This study sought to answer the question: “What are the state-of-the-art methods for imbalanced classification, and which combinations of these methods yield the best results on extremely imbalanced real-world data?” To this end, internal (algorithm-based) and external (sampling-based) imbalanced learning techniques were applied both individually and in combination. The dataset used for the imbalanced classification task is the National Collegiate Athletic Association (NCAA) Men’s Basketball League Player Statistics Data. Each player's draft status (whether or not the player was drafted by an NBA team) was used as the target variable for binary classification. The minority : majority ratio of the target variable is 3.39 : 96.61, and the F1 score was used as the main evaluation metric. The experiments showed that the default parameters of sampling techniques do not work well under extreme imbalance. The optimal minority-to-majority ratio hyperparameter ranged from 0.07 to 0.11, which differs from the common advice and practice of matching the minority and majority class frequencies, i.e., setting the ratio hyperparameter to 1. Cost-sensitive methods were also combined with sampling methods, and the class-weight hyperparameters of the cost-sensitive learning model that worked best were {class0: 1, class1: 1}, {class0: 2, class1: 1} or {class0: 3, class1: 2}, contrary to the common teaching that “if class ratio is 1:9, the cost-sensitive weight hyperparameter should be the inverse of the original ratio”. Lastly, probability threshold moving was applied to maximize the F1 score. In this way, three different imbalanced learning methods were consolidated, and better results were obtained than when the state-of-the-art methods were used individually. Additionally, a Monte Carlo simulation was applied to corroborate and generalize the results obtained on the real-world dataset.
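The ratio hyperparameter above can be read as the minority-to-majority class ratio after resampling. imbalanced-learn's samplers, for example, accept a float `sampling_strategy` with this meaning for binary problems, while their default `'auto'` balances the classes completely (a ratio of 1). The minimal sketch below uses plain random oversampling purely for illustration (the abstract does not say which samplers were tuned) to show how much smaller a 0.07-0.11 target is than full balancing. The helper `oversample_to_ratio`, its parameters and the 339/9,661 row counts are hypothetical; the counts only reproduce the 3.39 : 96.61 split.

```python
import random


def oversample_to_ratio(X, y, ratio, minority=1, seed=0):
    """Randomly duplicate minority rows until n_minority ~= ratio * n_majority.

    Hypothetical helper: assumes binary labels in `y`, with `minority`
    marking the rare class (e.g. drafted players). A ratio of 1.0 would
    fully balance the classes; the study reports 0.07-0.11 working best.
    """
    rng = random.Random(seed)  # fixed seed so resampling is reproducible
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_majority = len(y) - len(minority_idx)
    target = round(ratio * n_majority)  # desired minority count after resampling
    n_extra = max(0, target - len(minority_idx))  # oversampling never drops rows
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    X_new = list(X) + [X[i] for i in extra]
    y_new = list(y) + [y[i] for i in extra]
    return X_new, y_new


if __name__ == "__main__":
    # Synthetic labels with the abstract's 3.39 : 96.61 split (339 vs 9661 rows).
    y = [1] * 339 + [0] * 9661
    X = [[i] for i in range(len(y))]  # placeholder features
    for ratio in (0.07, 0.10, 1.0):
        _, y_res = oversample_to_ratio(X, y, ratio)
        n_min = y_res.count(1)
        print(f"ratio={ratio}: {n_min} minority rows, "
              f"{n_min / 339:.1f}x the original")
```

With 339 minority and 9,661 majority rows, a ratio of 0.10 grows the minority class to 966 rows (about 2.8 times its original size), whereas a ratio of 1.0 would replicate it roughly 28.5 times. Resampling of this kind is normally applied only to training folds, so that duplicated rows do not leak into validation data.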

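Probability threshold moving, the last step described in the abstract, replaces the default 0.5 cut-off with whichever threshold maximizes F1 on held-out predictions. The following is a minimal pure-Python sketch, assuming `scores` are predicted probabilities for the positive (drafted) class from an already-trained model. The function names and the toy data are hypothetical. In scikit-learn the same search is commonly done with `sklearn.metrics.precision_recall_curve`.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 of the positive class when predicting 1 for score >= threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, scores):
    """Try every distinct score as a cut-off and keep the highest-F1 one."""
    best_t, best_f1 = 0.5, f1_at_threshold(y_true, scores, 0.5)
    for t in sorted(set(scores)):
        f1 = f1_at_threshold(y_true, scores, t)
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


if __name__ == "__main__":
    # Toy validation set: 8 negatives, 2 positives, low-probability scores.
    y_val = [0] * 8 + [1, 1]
    p_val = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.35, 0.40, 0.38, 0.45]
    print("F1 at 0.5:", f1_at_threshold(y_val, p_val, 0.5))
    print("best (threshold, F1):", best_threshold(y_val, p_val))
```

On the toy data the default cut-off of 0.5 predicts no positives (F1 = 0), while a threshold of 0.38 reaches F1 = 0.8. The threshold should be chosen on validation data and then applied unchanged to the test set; otherwise the reported F1 is optimistically biased.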
Suggestions

Conjoint Use of Regression Analysis and Functional Measurement to Test Models of Combination of Factors Predicting Negative Attitude to Women
Bugay, Asli; Delevi, Rakel; Mullet, Etienne (Editorial Pontificia Universidad Javeriana, 2019-01-01)
The present study aimed to show that, by conjointly using two techniques that are rarely combined (regression analysis and functional measurement), it may be possible to rigorously test models of combination of factors using data obtained in traditional multi-item/multi-scale surveys. The data used for this demonstration were taken from a large survey (N = 3,235) of Turkish students' attitude to women (ATW). As it included 12 types of predictors (e.g., age, geographic location, score on co...
Asymmetric Confidence Interval with Box-Cox Transformation in R
Dağ, Osman; İlk Dağ, Özlem (2017-12-08)
The normal distribution is important in the statistical literature since most statistical methods, such as the t-test, analysis of variance and regression analysis, are based on it. However, it is difficult to satisfy the normality assumption for real-life datasets. The Box–Cox power transformation is the most well-known and commonly used remedy [2]. The algorithm relies on a single transformation parameter. In the original article [2], maximum likelihood estimation was proposed for the estimation...
Adaptive evolution strategies in structural optimization: Enhancing their computational performance with applications to large-scale structures
Hasançebi, Oğuzhan (2008-01-01)
In this study the computational performance of adaptive evolution strategies (ESs) in large-scale structural optimization is mainly investigated to achieve the following objectives: (i) to present an ESs based solution algorithm for efficient optimum design of large structural systems consisting of continuous, discrete and mixed design variables; (ii) to integrate new parameters and methodologies into adaptive ESs to improve the computational performance of the algorithm; and (iii) to assess successful self...
Bridging Brain and Educational Sciences: An Optical Brain Imaging Study of Visuospatial Reasoning
Çakır, Murat Perit; Izzetoglu, Meltem; Shewokis, Patricia A.; Izzetoglu, Kurtulus; Onaral, Banu (2011-10-22)
In this paper we present an experimental study where we investigated neural correlates of visuospatial reasoning during math problem solving in a computer-based environment to exemplify the potential for conducting interdisciplinary research that incorporates insights from educational research and cognitive neuroscience. Functional near-infrared spectroscopy (fNIRS) technology is used to measure changes in blood oxygenation in the dorsolateral and inferior prefrontal cortex while subjects attempt to solve t...
On Equivalence Relationships Between Classification and Ranking Algorithms
Ertekin Bolelli, Şeyda (2011-10-01)
We demonstrate that there are machine learning algorithms that can achieve success for two separate tasks simultaneously, namely the tasks of classification and bipartite ranking. This means that advantages gained from solving one task can be carried over to the other task, such as the ability to obtain conditional density estimates, and an order-of-magnitude reduction in computational time for training the algorithm. It also means that some algorithms are robust to the choice of evaluation metric used; the...
Citation Formats
E. Güler, “Imbalanced learning techniques: Experiments on NCAA College Basketball League player statistics dataset,” M.S. - Master of Science, Middle East Technical University, 2022.