Imbalanced learning techniques: Experiments on NCAA College Basketball League player statistics dataset

2022-09-02
Güler, Emir
This study was conducted to answer the question: “What are the state-of-the-art methods for imbalanced classification, and which combinations of these methods yield the best results on extremely imbalanced real-world data?” To this end, internal (algorithm-based) and external (sampling-based) imbalanced learning techniques were applied both individually and in combination. The dataset used for the imbalanced classification task is the National Collegiate Athletic Association (NCAA) Men’s Basketball League Player Statistics Data. Each player's draft status (whether or not the player was drafted by an NBA team) was used as the target variable for binary classification. The minority-to-majority ratio of the target variable is 3.39 : 96.61, and the F1 score was used as the main evaluation metric. The experiments showed that the default parameters of sampling techniques do not work well under extreme imbalance. The optimal minority-to-majority ratio hyperparameters ranged from 0.07 to 0.11, which differs from the general advice and common practice of matching the minority and majority class frequencies, i.e., setting the ratio hyperparameter to 1. Cost-sensitive methods were also combined with sampling methods, and the class weight hyperparameters of the best-performing cost-sensitive model were found to be {class0: 1, class1: 1}, {class0: 2, class1: 1} or {class0: 3, class1: 2}, contrary to the general teaching that “if the class ratio is 1:9, the cost-sensitive weight hyperparameter should be the inverse of the original ratio”. Lastly, probability threshold moving was applied to maximize the F1 score. In this way, three different imbalanced learning methods were combined, and better results were obtained than with any single state-of-the-art method used alone. Additionally, Monte Carlo simulation was applied to reinforce and generalize the results obtained on the real-world dataset.
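The abstract combines three techniques: resampling to a minority-to-majority ratio well below 1, class-weighted (cost-sensitive) learning, and probability threshold moving to maximize F1. The following is a minimal pure-Python sketch of that combination under stated assumptions, not the thesis's implementation. The classifier (logistic regression trained by gradient descent), random oversampling as the sampler, the synthetic two-feature data from the hypothetical `make_data` helper, and all function names are illustrative assumptions; only the 3.39% minority rate, a ratio of 0.1 (inside the reported 0.07 to 0.11 range) and the {0: 2, 1: 1} weights come from the abstract. In imbalanced-learn and scikit-learn, the same knobs roughly correspond to `RandomOverSampler(sampling_strategy=0.1)` and `class_weight={0: 2, 1: 1}`.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def f1_score(y_true, y_pred):
    """F1 of the positive (minority, label 1) class: 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def make_data(n, minority_rate, rng):
    """Hypothetical synthetic data: two Gaussian features, positives shifted by 1.5."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < minority_rate else 0
        shift = 1.5 if label == 1 else 0.0
        X.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)])
        y.append(label)
    return X, y


def oversample_to_ratio(X, y, ratio, rng):
    """Duplicate random minority rows until n_minority / n_majority is about `ratio`.

    Rows are only added, never removed, so nothing changes if the data
    already meets the target ratio.
    """
    minority = [i for i, label in enumerate(y) if label == 1]
    if not minority:
        return list(X), list(y)
    n_major = len(y) - len(minority)
    n_extra = max(0, round(ratio * n_major) - len(minority))
    idx = list(range(len(y))) + [rng.choice(minority) for _ in range(n_extra)]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_weighted_logreg(X, y, class_weight, lr=0.5, epochs=200):
    """Batch gradient descent on class-weighted log-loss (cost-sensitive learning)."""
    w, b = [0.0] * len(X[0]), 0.0
    total = sum(class_weight[label] for label in y)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = class_weight[yi] * (p - yi)  # weighted d(log-loss)/dz for this row
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / total for wj, g in zip(w, gw)]
        b -= lr * gb / total
    return w, b


def best_threshold(y_true, probs):
    """Threshold moving: scan cut-offs 0.01..0.99 and keep the F1-maximizing one."""
    best_t = 0.5
    best_f1 = f1_score(y_true, [int(p >= 0.5) for p in probs])
    for i in range(1, 100):
        t = i / 100
        f1 = f1_score(y_true, [int(p >= t) for p in probs])
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


def run_demo(seed=0, n_train=1500, n_val=1000):
    rng = random.Random(seed)
    X_tr, y_tr = make_data(n_train, 0.0339, rng)  # ~3.39% minority, as in the thesis
    X_va, y_va = make_data(n_val, 0.0339, rng)
    X_rs, y_rs = oversample_to_ratio(X_tr, y_tr, ratio=0.1, rng=rng)  # not 1.0
    w, b = fit_weighted_logreg(X_rs, y_rs, class_weight={0: 2, 1: 1})
    probs = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) for x in X_va]
    f1_default = f1_score(y_va, [int(p >= 0.5) for p in probs])
    t, f1_tuned = best_threshold(y_va, probs)
    return f1_default, t, f1_tuned


if __name__ == "__main__":
    f1_default, t, f1_tuned = run_demo()
    print(f"F1 at 0.5: {f1_default:.3f}; F1 at tuned threshold {t:.2f}: {f1_tuned:.3f}")
```

For brevity the threshold is tuned and scored on the same validation split; in a real experiment it would be chosen on validation data and reported on a separate test set, otherwise the tuned F1 is optimistic. Because 0.5 is one of the scanned cut-offs, the tuned F1 can never fall below the default one.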

Citation
E. Güler, “Imbalanced learning techniques: Experiments on NCAA College Basketball League player statistics dataset,” M.S. thesis, Middle East Technical University, 2022.