HCAB-SMOTE: A Hybrid Clustered Affinitive Borderline SMOTE Approach for Imbalanced Data Binary Classification
Date
2020-04-01
Author
Al Majzoub, Hisham
Elgedawy, Islam
Akaydin, Oyku
Ulukok, Mehtap Kose
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Binary datasets are considered imbalanced when one of their two classes (the minority class) holds less than 40% of the total number of data instances. Existing classification algorithms are biased when applied to imbalanced binary datasets, as they tend to misclassify instances of the minority class. Many techniques have been proposed to minimize this bias and to increase classification accuracy. The Synthetic Minority Oversampling Technique (SMOTE) is a well-known approach to this problem: it generates new synthetic data instances to balance the dataset. Unfortunately, it generates these instances randomly, which produces useless new instances and wastes time and memory. Several SMOTE derivatives, such as Borderline SMOTE, were proposed to overcome this problem, yet the number of generated instances changed only slightly. To address this, the paper proposes a novel approach for generating synthetic data instances, the Hybrid Clustered Affinitive Borderline SMOTE (HCAB-SMOTE), which minimizes the number of generated instances while increasing classification accuracy. It combines undersampling, to remove majority-class noise instances, with oversampling, to enhance the density of the borderline. It applies k-means clustering to the borderline area and identifies which clusters to oversample to achieve better results. Experimental results show that HCAB-SMOTE outperformed SMOTE, Borderline SMOTE, and the AB-SMOTE and CAB-SMOTE approaches that were developed on the way to HCAB-SMOTE, as it provided the highest classification accuracy with the fewest generated instances.
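To make two of the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It checks the 40% imbalance criterion and generates synthetic minority instances by basic SMOTE-style interpolation between a minority point and one of its nearest minority neighbours. The function names, the `k` and `seed` parameters, and the NumPy-only nearest-neighbour search are assumptions made for illustration; the noise removal, borderline detection, and k-means cluster selection that distinguish HCAB-SMOTE are not described in enough detail here to reproduce and are left out.

```python
import numpy as np


def minority_share(y):
    """Return (minority_label, fraction of instances in the smaller class)."""
    labels, counts = np.unique(y, return_counts=True)
    idx = np.argmin(counts)
    return labels[idx], counts[idx] / counts.sum()


def is_imbalanced(y, threshold=0.40):
    # Criterion from the abstract: one class holds less than 40% of instances.
    return minority_share(y)[1] < threshold


def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority point
    and one of its k nearest minority neighbours (the basic SMOTE idea).

    Hypothetical helper for illustration; needs at least two minority points.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a random base point and a random one of its k neighbours.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    # Place each synthetic point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])
```

In a pipeline of the kind the abstract describes, `X_min` would presumably be restricted to selected borderline clusters rather than the whole minority class, which is how the number of generated instances could be kept small.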
Subject Keywords
Imbalanced data, Borderline SMOTE, Oversampling, SMOTE, k-means clustering, AB-SMOTE
URI
https://hdl.handle.net/11511/67770
Journal
ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING
DOI
https://doi.org/10.1007/s13369-019-04336-1
Collections
Engineering, Article
Citation Formats
IEEE
H. Al Majzoub, I. Elgedawy, O. Akaydin, and M. K. Ulukok, “HCAB-SMOTE: A Hybrid Clustered Affinitive Borderline SMOTE Approach for Imbalanced Data Binary Classification,” ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, pp. 3205–3222, 2020. [Online]. Available: https://hdl.handle.net/11511/67770.