Learning-based robust sample selection to reduce noise in high dimensional transcriptome data

Date

2022-10

Author

Kızılilsoley, Nehir
Tanıl, Ezgi
Nikerel, Emrah

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

To reduce the inherent noise in high-dimensional transcriptome data from a lung cancer cohort, a learning-based sub-sample selection approach was adopted. Using consensus clustering analysis, TCGA network data on lung cancer reached maximum cluster stability when divided into three clusters, which matches the number of actual groups (adenocarcinoma, squamous cell carcinoma, and normal). Using silhouette width together with naive inspection of clustering performance to filter out samples, 840 of the 1145 samples were selected as core samples. The contribution of consensus clustering analysis as a sample selection method was assessed by comparing the subtype classification accuracies of informative genes discovered from the “initial” set (1145 samples), the “reduced” set (901 samples), and the core set (840 samples). The lists of candidate markers obtained from the initial and core samples were similar, and prediction accuracy increased markedly. Taken together, the results suggest that learning-based sample selection can aid sample filtering, retaining most of the information while reducing noise.
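The silhouette-based filtering step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the distance measure, the silhouette cutoff, or how cluster labels were passed on from consensus clustering, so the Euclidean distance, the `threshold=0.0` cutoff, the function names, and the toy data below are all assumptions made for illustration. Cluster labels are taken as given (for example, from a prior clustering step), and at least two clusters are required.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_widths(points, labels):
    """Return the silhouette width s(i) = (b - a) / max(a, b) for each sample.

    a: mean distance from sample i to the other members of its own cluster.
    b: smallest mean distance from sample i to the members of another cluster.
    Assumes at least two distinct cluster labels.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # common convention: singleton clusters get 0
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def select_core_samples(points, labels, threshold=0.0):
    """Keep indices of samples whose silhouette width exceeds the threshold.

    The threshold is a hypothetical choice, not a value from the abstract.
    """
    widths = silhouette_widths(points, labels)
    return [i for i, s in enumerate(widths) if s > threshold]


# Toy data (hypothetical): two well-separated groups, plus one sample that is
# labelled as cluster 0 but lies next to cluster 1.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (9.0, 9.0),
]
labels = [0, 0, 0, 1, 1, 1, 0]

widths = silhouette_widths(points, labels)
core = select_core_samples(points, labels)
print(core)
```

In this toy case the last sample has a negative silhouette width because it is closer, on average, to the other cluster than to its own, so it is dropped from the core set. With real expression data, each point would be a sample's expression profile and the cutoff would need to be chosen and justified for the dataset at hand.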

URI

https://hibit2022.ims.metu.edu.tr
https://hdl.handle.net/11511/101354

Conference Name

The International Symposium on Health Informatics and Bioinformatics

Collections

Graduate School of Informatics, Conference / Seminar

Citation Formats

N. Kızılilsoley, E. Tanıl, and E. Nikerel, “Learning-based robust sample selection to reduce noise in high dimensional transcriptome data,” Erdemli, Mersin, TÜRKİYE, 2022, p. 3109, Accessed: 00, 2023. [Online]. Available: https://hibit2022.ims.metu.edu.tr.