Adaptive Oversampling for Imbalanced Data Classification

2013-01-01
Data imbalance is known to significantly hinder the generalization performance of supervised learning algorithms. A common strategy to overcome this challenge is synthetic oversampling, where synthetic minority class examples are generated to balance the distribution between the examples of the majority and minority classes. We present a novel adaptive oversampling algorithm, Virtual, that combines the benefits of oversampling and active learning. Unlike traditional resampling methods, which require preprocessing of the data, Virtual generates synthetic examples for the minority class during the training process and therefore removes the need for an extra preprocessing stage. In the context of learning with Support Vector Machines, we demonstrate that Virtual outperforms competitive oversampling techniques in terms of both generalization performance and computational complexity.
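
As a rough illustration of the synthetic oversampling idea mentioned in the abstract, the sketch below creates new minority examples by interpolating between existing minority points and their nearest minority neighbours. This is the traditional preprocessing-style approach that the abstract contrasts with Virtual; it is not the Virtual algorithm, whose details are not given here. The function name `interpolate_minority`, the parameter `k`, and the toy data are all assumptions made for illustration.

```python
import random


def interpolate_minority(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each lying on the segment between a
    randomly chosen minority point and one of its k nearest minority
    neighbours. Hypothetical helper, not taken from the paper."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(a + t * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed): 20 majority points and 4 minority points in 2-D.
majority = [(float(i), float(i % 5)) for i in range(20)]
minority = [(1.0, 8.0), (2.0, 9.0), (3.0, 8.5), (2.5, 7.5)]

# Generate enough synthetic minority points to match the majority class size.
new_points = interpolate_minority(minority, len(majority) - len(minority))
balanced_minority = minority + new_points
```

Because every synthetic point is interpolated between two real minority points, it stays inside the region the minority class already occupies. A classifier would then be trained on `majority` together with `balanced_minority`, which is exactly the separate preprocessing stage that the abstract says Virtual avoids.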

Suggestions

Learning on the border: Active learning in imbalanced data classification
Ertekin Bolelli, Şeyda; Bottou, Leon; Giles, C Lee (2007-10-06)
This paper is concerned with the class imbalance problem, which is known to hinder the learning performance of classification algorithms. The problem occurs when there are significantly fewer observations of the target concept. Various real-world classification tasks, such as medical diagnosis, text categorization and fraud detection, suffer from this phenomenon. Standard machine learning algorithms yield better prediction performance with balanced datasets. In this paper, we demonstrate th...
Adaptive neuro fuzzy inference system applications in chemical processes
Güner, Evren; Özgen, Canan; Leblebicioğlu, Kemal; Department of Chemical Engineering (2003)
Neuro-Fuzzy systems are systems in which neural networks (NN) are incorporated into fuzzy systems, which can then use knowledge automatically through the learning algorithms of NNs. They can be viewed as a mixture of local experts. The Adaptive Neuro-Fuzzy inference system (ANFIS) is one example of a Neuro-Fuzzy system, in which a fuzzy system is implemented in the framework of adaptive networks. ANFIS constructs an input-output mapping based both on human knowledge (in the form of fuzzy rules) and on generated input-out...
Self-training for unsupervised domain adaptation
Akkaya, İbrahim Batuhan; Halıcı, Uğur; Department of Electrical and Electronics Engineering (2022-8-31)
Despite the outstanding performance of deep learning techniques, achieving high performance generally demands large amounts of labeled data. Because of the labeling costs, people consider utilizing public datasets or synthetic images with freely generated labels. Unfortunately, deep neural networks are notably sensitive to domain misalignment. The methods to reduce domain misalignment are studied under domain adaptation (DA). Self-training, which selects a subset of the unlabeled data for pseudo-label...
Adaptive evolution strategies in structural optimization: Enhancing their computational performance with applications to large-scale structures
Hasançebi, Oğuzhan (2008-01-01)
In this study the computational performance of adaptive evolution strategies (ESs) in large-scale structural optimization is mainly investigated to achieve the following objectives: (i) to present an ESs based solution algorithm for efficient optimum design of large structural systems consisting of continuous, discrete and mixed design variables; (ii) to integrate new parameters and methodologies into adaptive ESs to improve the computational performance of the algorithm; and (iii) to assess successful self...
Improvement of Hyperspectral Classification Accuracy with Limited Training Data Using Meanshift Segmentation
Özdemir, Okan Bilge; Çetin, Yasemin (2014-04-25)
In this study, the performance of hyperspectral classification algorithms with limited training data is investigated. Support Vector Machines (SVM) with a Gaussian kernel are used. Principal Component Analysis (PCA) is employed for preprocessing, and meanshift segmentation is used to incorporate spatial information with spectral information in order to observe the effect of spatial information. A pattern search algorithm is used to optimize the meanshift segmentation parameters. The performance of the algorithm is demonstrated on h...
Citation Formats
Ş. Ertekin Bolelli, “Adaptive Oversampling for Imbalanced Data Classification,” 2013, vol. 264, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/44178.