HCAB-SMOTE: A Hybrid Clustered Affinitive Borderline SMOTE Approach for Imbalanced Data Binary Classification
Date
2020-04-01
Author
Al Majzoub, Hisham
Elgedawy, Islam
Akaydin, Oyku
Ulukok, Mehtap Kose
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Binary datasets are considered imbalanced when one of their two classes (the minority class) contains less than 40% of the total number of data instances. Existing classification algorithms are biased when applied to imbalanced binary datasets, as they tend to misclassify instances of the minority class. Many techniques have been proposed to reduce this bias and increase classification accuracy. The Synthetic Minority Oversampling Technique (SMOTE) is a well-known approach to this problem: it generates new synthetic data instances to balance the dataset. Unfortunately, it generates these instances randomly, which produces useless new instances and wastes time and memory. Several SMOTE derivatives, such as Borderline SMOTE, were proposed to overcome this problem, yet they changed the number of generated instances only slightly. To address this, the paper proposes a novel approach for generating synthesized data instances, known as Hybrid Clustered Affinitive Borderline SMOTE (HCAB-SMOTE). It minimizes the number of generated instances while increasing classification accuracy. It combines undersampling, to remove majority noise instances, with oversampling, to increase the density of the borderline area. It applies k-means clustering to the borderline area and identifies which clusters to oversample to achieve better results. Experimental results show that HCAB-SMOTE outperformed SMOTE, Borderline SMOTE, and the AB-SMOTE and CAB-SMOTE approaches that were developed on the way to HCAB-SMOTE, as it provided the highest classification accuracy with the fewest generated instances.
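The building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' HCAB-SMOTE implementation: it covers only the imbalance check, a Borderline-SMOTE-style selection of borderline minority points, and SMOTE-style interpolation, and it leaves out the undersampling and k-means steps. The function names, the neighbour count `k=5`, and the toy data are assumptions made for illustration.

```python
import numpy as np


def minority_share(y):
    """Return (minority label, fraction of instances carrying that label)."""
    labels, counts = np.unique(y, return_counts=True)
    i = int(np.argmin(counts))
    return labels[i], counts[i] / counts.sum()


def borderline_mask(X, y, minority, k=5):
    """Flag minority points whose k nearest neighbours are mostly, but not
    entirely, majority points (the Borderline-SMOTE 'danger' set)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]      # indices of the k nearest points
    n_major = (y[nn] != minority).sum(axis=1)
    return (y == minority) & (n_major >= k / 2) & (n_major < k)


def smote_interpolate(seeds, pool, n_new, k=5, rng=None):
    """Create n_new synthetic points, each on the segment between a random
    seed and one of that seed's k nearest neighbours in pool."""
    if len(seeds) == 0 or len(pool) < 2:
        raise ValueError("need at least one seed and two pool points")
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(seeds[:, None, :] - pool[None, :, :], axis=2)
    d[d == 0] = np.inf                     # skip the seed itself (and exact duplicates)
    k = min(k, len(pool) - 1)
    nn = np.argsort(d, axis=1)[:, :k]
    s = rng.integers(0, len(seeds), size=n_new)   # which seed to start from
    j = nn[s, rng.integers(0, k, size=n_new)]     # which neighbour to move towards
    gap = rng.random((n_new, 1))                  # position along the segment
    return seeds[s] + gap * (pool[j] - seeds[s])


# Toy data (assumed): 80 majority and 20 minority points in two dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (80, 2)), rng.normal(2.0, 1.0, (20, 2))])
y = np.array([0] * 80 + [1] * 20)

minority, share = minority_share(y)        # share is 0.2, below the 40% threshold
X_min = X[y == minority]
mask = borderline_mask(X, y, minority)
seeds = X[mask] if mask.any() else X_min   # fall back to plain SMOTE seeds
synthetic = smote_interpolate(seeds, X_min, n_new=30, rng=rng)
print(f"minority share {share:.2f}, {int(mask.sum())} borderline seeds, "
      f"{len(synthetic)} synthetic points")
```

Restricting the seeds to the borderline set is what concentrates new instances near the class boundary rather than spreading them over the whole minority class. The pairwise distance matrix is quadratic in the number of instances, so this sketch is only suitable for small datasets.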
Subject Keywords
Imbalanced data, Borderline SMOTE, Oversampling, SMOTE, k-means clustering, AB-SMOTE
URI
https://hdl.handle.net/11511/67770
Journal
ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING
DOI
https://doi.org/10.1007/s13369-019-04336-1
Collections
Engineering, Article
Suggestions
Adaptive Oversampling for Imbalanced Data Classification
Ertekin Bolelli, Şeyda (2013-01-01)
Data imbalance is known to significantly hinder the generalization performance of supervised learning algorithms. A common strategy to overcome this challenge is synthetic oversampling, where synthetic minority class examples are generated to balance the distribution between the examples of the majority and minority classes. We present a novel adaptive oversampling algorithm, Virtual, that combines the benefits of oversampling and active learning. Unlike traditional resampling methods which require preproce...
A Methodology to Implement Box-Cox Transformation When No Covariate is Available
Dag, Osman; Asar, Ozgur; İlk Dağ, Özlem (2014-01-01)
Box-Cox transformation is one of the most commonly used methodologies when data do not follow normal distribution. However, its use is restricted since it usually requires the availability of covariates. In this article, the use of a non-informative auxiliary variable is proposed for the implementation of Box-Cox transformation. Simulation studies are conducted to illustrate that the proposed approach is successful in attaining normality under different sample sizes and most of the distributions and in esti...
Adapting a Robust Model into Hybrid Implementations of Machine Learning Algorithms and Statistical Methods for Longitudinal Data
Erduran, İbrahim Hakkı; Gökalp Yavuz, Fulya; Ebegil, Meral; Department of Statistics (2021-9)
Data structures in which the same characteristics are measured repeatedly at different time points are counted among the longitudinal data types. These datasets require the use of advanced modeling techniques because of the dependency structure among replicates. The linear mixed model (LMM) is an advanced regression method used in the analysis of such datasets. Although the LMM method offers considerable flexibility and many advantages, the model setup is based on a number of assumptions that are challenging to provid...
Improving the scalability of ILP-based multi-relational concept discovery system through parallelization
Mutlu, Ayşe Ceyda; Karagöz, Pınar; Kavurucu, Yusuf (2012-03-01)
Due to the increase in the amount of relational data being collected and the limitations of propositional problem definition in relational domains, multi-relational data mining has arisen to extract patterns from relational data. To cope with an intractably large search space and still generate high-quality patterns, ILP-based multi-relational data mining and concept discovery systems employ several search strategies and pattern limitations. Another direction to cope w...
Independently weighted value difference metric
Ortakaya, Ahmet Fatih (2017-10-01)
The majority of the difference metrics used in categorical classification algorithms do not take the dependence structure among attributes into account. Some of these metrics even make strong assumptions of attribute independence that are not realistic for many real-world datasets. In addition, these metrics do not consider the importance of attributes for the class variable. In this paper, a new difference metric is proposed, named the Independently Weighted Value Difference Metric (IWVDM). IWVDM includes a...
Citation Formats
H. Al Majzoub, I. Elgedawy, O. Akaydin, and M. K. Ulukok, “HCAB-SMOTE: A Hybrid Clustered Affinitive Borderline SMOTE Approach for Imbalanced Data Binary Classification,” ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, pp. 3205–3222, 2020, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/67770.