A Similarity Based Oversampling Method for Multi-Label Imbalanced Text Data
Download
thesis_IHK.pdf
Date
2022-9-1
Author
Karaman, İsmail Hakkı
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
In the real world, although the amount of data keeps increasing, labeled data for Machine Learning projects is hard to find because labeling demands considerable cost and effort. In addition, most Machine Learning projects, especially multi-label classification problems, struggle with data imbalance; some classes do not even have enough data to train a classifier. In this study, an oversampling method for multi-label text classification problems is developed and studied to address the performance problems arising from data imbalance. The proposed method finds new samples in unlabeled data by exploiting similarities between instances. It retrieves instances similar to a class from the unlabeled set and checks whether adding them improves classifier performance. The unlabeled set is searched iteratively, and the instances that improve performance are added to the labeled set. The experiments show that our method works well and that the classifier's performance improves after oversampling.
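The loop described in the abstract (retrieve similar unlabeled instances for a class, keep only those that improve performance, repeat) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the bag-of-words cosine similarity, the centroid-based stand-in classifier, the per-label F1 measured on a held-out validation set, and the names `oversample`, `label_f1`, `k`, `rounds`, and `threshold`.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (assumed representation, lowercased tokens)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(labeled, label):
    """Sum of the vectors of all labeled texts that carry `label`."""
    total = Counter()
    for text, labels in labeled:
        if label in labels:
            total.update(vectorize(text))
    return total

def label_f1(labeled, validation, label, threshold=0.3):
    """F1 for one label, using a toy centroid classifier (a stand-in only)."""
    c = centroid(labeled, label)
    tp = fp = fn = 0
    for text, true_labels in validation:
        predicted = cosine(vectorize(text), c) >= threshold
        actual = label in true_labels
        tp += predicted and actual
        fp += predicted and not actual
        fn += actual and not predicted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def oversample(labeled, unlabeled, validation, label, k=2, rounds=3):
    """Iteratively move helpful unlabeled texts into the labeled set."""
    labeled, pool = list(labeled), list(unlabeled)
    best = label_f1(labeled, validation, label)
    for _ in range(rounds):
        c = centroid(labeled, label)
        # The k unlabeled texts most similar to the class are the candidates.
        candidates = sorted(pool, key=lambda t: cosine(vectorize(t), c),
                            reverse=True)[:k]
        added = False
        for text in candidates:
            trial = labeled + [(text, {label})]
            score = label_f1(trial, validation, label)
            if score > best:  # keep the candidate only if the score improves
                labeled, best, added = trial, score, True
                pool.remove(text)
        if not added:  # stop once a full pass yields no improvement
            break
    return labeled, best
```

Calling `oversample(labeled, unlabeled, validation, "finance")` on lists of `(text, label_set)` pairs returns the enlarged labeled set and the final validation score. Accepting a candidate only on strict improvement means the score never drops below the baseline, at the cost of one classifier evaluation per candidate.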
Subject Keywords
Oversampling, Text classification, Multi-label classification, Imbalanced classification, Text similarity
URI
https://hdl.handle.net/11511/99558
Collections
Graduate School of Natural and Applied Sciences, Thesis
Suggestions
A clustering method for web data with multi-type interrelated components
Bolelli, Levent; Ertekin Bolelli, Şeyda; Zhou, Ding; Giles, C Lee (2007-05-08)
Traditional clustering algorithms work on "flat" data, making the assumption that the data instances can only be represented by a set of homogeneous and uniform features. Much real-world data, however, is heterogeneous in nature, comprising multiple types of interrelated components. We present a clustering algorithm, K-SVMeans, that integrates the well-known K-Means clustering with the highly popular Support Vector Machines (SVM) in order to utilize the richness of data. Our experimental results on author...
A Hybrid Approach for Process Mining Using From to Chart Arranged by Genetic Algorithms LNCS San Sebastian Spain June 2010
Esgin, Eren; Karagöz, Pınar (2010-06-18)
In the scope of this study, a hybrid data analysis methodology to business process modeling is proposed in such a way that; From-to Chart, which is basically used as the front-end to figure out the observed patterns among the activities at realistic event logs, is rearranged by Genetic Algorithms to convert these derived raw relations into activity sequence. According to experimental results, acceptably good (sub-optimal or optimal) solutions are obtained for relatively complex business processes at a reaso...
A new framework of multi-objective evolutionary algorithms for feature selection and multi-label classification of video data
Karagoz, Gizem Nur; Yazıcı, Adnan; Dokeroglu, Tansel; Coşar, Ahmet (2020-06-01)
There are few studies in the literature to address the multi-objective multi-label feature selection for the classification of video data using evolutionary algorithms. Selecting the most appropriate subset of features is a significant problem while maintaining/improving the accuracy of the prediction results. This study proposes a framework of parallel multi-objective Non-dominated Sorting Genetic Algorithms (NSGA-II) for exploring a Pareto set of non-dominated solutions. The subsets of non-dominated featu...
A deep learning approach for the transonic flow field predictions around airfoils
Duru, Cihat; Alemdar, Hande; Baran, Özgür Uğraş (2022-01-01)
Learning from data offers new opportunities for developing computational methods in research fields, such as fluid dynamics, which constantly accumulate a large amount of data. This study presents a deep learning approach for the transonic flow field predictions around airfoils. The physics of transonic flow is integrated into the neural network model by utilizing Reynolds-averaged Navier–Stokes (RANS) simulations. A detailed investigation on the performance of the model is made both qualitatively and quant...
A Hybrid Computational Method based on Convex Optimization for Outlier Problems
Yerlikaya Ozkurt, Fatma; Askan Gündoğan, Ayşegül; Weber, Gerhard Wiehelm (2015-11-01)
Statistical modeling plays a central role for any prediction problem of interest. However, predictive models may give misleading results when the data contain outliers. In many applications, it is important to identify and treat the outliers without direct elimination. To handle such issues, a hybrid computational method based on conic quadratic programming is introduced and employed on earthquake ground motion data set. Results are compared against widely-used ground motion prediction models.
Citation Formats
İ. H. Karaman, “A Similarity Based Oversampling Method for Multi-Label Imbalanced Text Data,” M.S. - Master of Science, Middle East Technical University, 2022.