Cost-sensitive learning for rare subtype classification of lung cancer
Download
HIBIT22_paper_111.pdf
Date
2022-10
Author
Kızılilsoley, Nehir
Tanıl, Ezgi
Nikerel, Emrah
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Item Usage Stats
89 views, 31 downloads
Machine learning (ML) algorithms typically assume that the training set is balanced among classes. On imbalanced datasets, even when overall accuracy is high, classical ML algorithms are biased toward the majority class, causing the model to fit the minority class poorly [1,2], which hinders the use of these algorithms for classifying rare events. Proposed strategies to overcome this problem include altering the training data directly to reduce the difference between classes, or changing the learning procedure so that the algorithm also takes the minority class into account [2]. The imbalance problem is usually handled by oversampling the minority class, undersampling the majority class, and/or generating synthetic samples from the original training data. Gene expression data are highly valuable and popular for cancer classification by ML; however, they are high-dimensional and severely imbalanced, making gene expression classification a cost-sensitive problem [1]. Cost-sensitive learning (CSL) uses unequal misclassification costs for the classes when making predictions and is required when predicting the minority class is more “interesting” than the other class(es). Instead of maximizing overall accuracy across all classes under equal costs, the goal is to minimize cost (the penalty of a misclassification), since classes are associated with different misclassification penalties. In this work, subtypes of lung cancer (AD, SC, LaC and SCLC) are classified using different CSL models that are either classical learners (e.g., support vector machines, naïve Bayes, random forest) or ensemble learners, using imbalanced RNA-seq data from TCGA and microarray data from NCBI-GEO. The best-performing model is evaluated with appropriate performance metrics (G-mean, accuracy, F-score, etc.), and the most important feature(s) will be extracted from this model using variable importance values.
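As a rough illustration of the cost-sensitive setup described in the abstract (not the authors' implementation), the sketch below trains a class-weighted random forest on a synthetic imbalanced four-class dataset standing in for the expression data, then reports G-mean, macro F-score, and accuracy, and ranks features by impurity-based variable importance. The dataset, class proportions, and hyperparameters are assumptions chosen only for illustration.

```python
# Minimal sketch of cost-sensitive learning via class weights (not the
# authors' code): rarer classes receive proportionally higher
# misclassification cost through class_weight="balanced".
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an imbalanced 4-class expression matrix
# (e.g., AD / SC / LaC / SCLC); real data would come from TCGA / NCBI-GEO.
X, y = make_classification(
    n_samples=2000, n_features=200, n_informative=30, n_classes=4,
    weights=[0.55, 0.25, 0.15, 0.05], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0
)

# class_weight="balanced" sets each class weight inversely proportional to
# its frequency, i.e., a simple form of cost-sensitive learning.
model = RandomForestClassifier(
    n_estimators=300, class_weight="balanced", random_state=0
)
model.fit(X_tr, y_tr)
y_pred = model.predict(X_te)

# G-mean: geometric mean of per-class recalls; plus macro F-score and accuracy.
per_class_recall = recall_score(y_te, y_pred, average=None)
g_mean = float(np.prod(per_class_recall) ** (1.0 / len(per_class_recall)))
print(f"G-mean:   {g_mean:.3f}")
print(f"F-score:  {f1_score(y_te, y_pred, average='macro'):.3f}")
print(f"Accuracy: {accuracy_score(y_te, y_pred):.3f}")

# Variable importance: rank features by the forest's impurity-based importances.
top = np.argsort(model.feature_importances_)[::-1][:10]
print("Top-10 features by importance:", top)
```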
URI
https://hibit2022.ims.metu.edu.tr
https://hdl.handle.net/11511/101351
Conference Name
The International Symposium on Health Informatics and Bioinformatics
Collections
Graduate School of Informatics, Conference / Seminar
Suggestions
OpenMETU Core
Reducing Features to Improve Link Prediction Performance in Location Based Social Networks, Non-Monotonically Selected Subset from Feature Clusters
Bayrak, Ahmet Engin; Polat, Faruk (2019-01-01)
In most cases, feature sets available for machine learning algorithms require a feature engineering approach to pick the subset for optimal performance. During our link prediction research, we had observed the same challenge for features of Location Based Social Networks (LBSNs). We applied multiple reduction approaches to avoid performance issues caused by redundancy and relevance interactions between features. One of the approaches was the custom two-step method; starts with clustering features based on t...
Domain adaptation on graphs by learning graph topologies: theoretical analysis and an algorithm
Vural, Elif (The Scientific and Technological Research Council of Turkey, 2019-01-01)
Traditional machine learning algorithms assume that the training and test data have the same distribution, while this assumption does not necessarily hold in real applications. Domain adaptation methods take into account the deviations in data distribution. In this work, we study the problem of domain adaptation on graphs. We consider a source graph and a target graph constructed with samples drawn from data manifolds. We study the problem of estimating the unknown class labels on the target graph using the...
Cross-modal Representation Learning with Nonlinear Dimensionality Reduction
KAYA, SEMİH; Vural, Elif (2019-08-22)
In many problems in machine learning there exist relations between data collections from different modalities. The purpose of multi-modal learning algorithms is to efficiently use the information present in different modalities when solving multi-modal retrieval problems. In this work, a multi-modal representation learning algorithm is proposed, which is based on nonlinear dimensionality reduction. Compared to linear dimensionality reduction methods, nonlinear methods provide more flexible representations e...
Domain Adaptation on Graphs via Frequency Analysis
Pilancı, Mehmet; Vural, Elif (2019-08-22)
Classical machine learning algorithms assume the training and test data to be sampled from the same distribution, while this assumption may be violated in practice. Domain adaptation methods aim to exploit the information available in a source domain in order to improve the performance of classification in a target domain. In this work, we focus on the problem of domain adaptation in graph settings. We consider a source graph with many labeled nodes and aim to estimate the class labels on a target graph wit...
MODELLING OF KERNEL MACHINES BY INFINITE AND SEMI-INFINITE PROGRAMMING
Ozogur-Akyuz, S.; Weber, Gerhard Wilhelm (2009-06-03)
In Machine Learning (ML) algorithms, one of the crucial issues is the representation of the data. As the data become heterogeneous and large-scale, single kernel methods become insufficient to classify nonlinear data. The finite combinations of kernels are limited up to a finite choice. In order to overcome this discrepancy, we propose a novel method of "infinite" kernel combinations for learning problems with the help of infinite and semi-infinite programming regarding all elements in kernel space. Looking...
Citation Formats
IEEE
N. Kızılilsoley, E. Tanıl, and E. Nikerel, “Cost-sensitive learning for rare subtype classification of lung cancer,” Erdemli, Mersin, TÜRKİYE, 2022, p. 3111, Accessed: 00, 2023. [Online]. Available: https://hibit2022.ims.metu.edu.tr.