GeneSelectML: a comprehensive way of gene selection for RNA-Seq data via machine learning algorithms

2022-11-01
DAĞ, OSMAN
KAŞIKCI ÇAVDAR, MERVE
İlk Dağ, Özlem
Yesiltepe, Metin
Selecting differentially expressed genes (DEGs) is a vital step in discovering the causes of diseases. Modelling genomics data in a way that accounts for relationships among genes has been shown to increase the predictive performance of methods compared with univariate analysis. However, most studies analyzing the same dataset differ seriously from one another, for reasons arising from the methods used. Therefore, there is a strong need for an easily accessible, user-friendly, and interactive tool that performs gene selection for RNA-seq data with several machine learning algorithms simultaneously, so that DEGs are not missed. We develop an open-source, freely available web-based tool for gene selection via machine learning algorithms that can handle high-performance computation. The tool includes six machine learning algorithms with different characteristics. Moreover, it involves classical pre-processing steps: filtering, normalization, transformation, and univariate analysis. It also offers well-arranged graphical approaches: network plot, heatmap, Venn diagram, and box-and-whisker plot. Gene ontology analysis is provided for both mRNA and miRNA DEGs. The tool is applied to Alzheimer RNA-seq data to demonstrate its use. Eleven genes are suggested by at least two out of six methods. One of these genes, hsa-miR-148a-3p, might be considered as a new biomarker for Alzheimer's disease diagnosis. The Kidney Chromophobe dataset is also analyzed to demonstrate the validity of the GeneSelectML web tool on a different dataset. GeneSelectML is distinguished in that it uses different machine learning algorithms simultaneously for gene selection and performs pre-processing, graphical representation, and gene ontology analyses within the same tool. This tool is freely available at www.softmed.hacettepe.edu.tr/GeneSelectML.
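The abstract reports genes "suggested by at least two out of six methods". The sketch below illustrates that kind of consensus rule in its simplest form: count how many methods selected each gene and keep those at or above a threshold. It is not taken from the GeneSelectML source code; the function name `consensus_genes`, the method labels, and the gene identifiers are hypothetical placeholders chosen for illustration.

```python
from collections import Counter


def consensus_genes(selections, min_methods=2):
    """Return genes selected by at least `min_methods` methods.

    `selections` maps a method label to the genes that method selected.
    This is an illustrative voting rule, not the tool's actual implementation.
    """
    # Use set() so a gene listed twice by one method is counted only once.
    counts = Counter(
        gene for genes in selections.values() for gene in set(genes)
    )
    return sorted(gene for gene, n in counts.items() if n >= min_methods)


# Hypothetical method labels and gene identifiers (not from the paper).
selections = {
    "method_a": ["gene_1", "gene_2", "gene_3"],
    "method_b": ["gene_2", "gene_4"],
    "method_c": ["gene_3", "gene_2"],
    "method_d": ["gene_5"],
}

print(consensus_genes(selections))
```

Raising `min_methods` makes the rule stricter, trading a shorter candidate list for higher agreement among methods.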
MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING

Suggestions

HPO2GO: prediction of human phenotype ontology term associations for proteins using cross ontology annotation co-occurrences
Doğan, Tunca (PeerJ, 2018-8-2)
Analysing the relationships between biomolecules and genetic diseases is a highly active area of research, where the aim is to identify the genes and their products that cause a particular disease due to functional changes originating from mutations. Biological ontologies are frequently employed in these studies, which provide researchers with extensive opportunities for knowledge discovery through computational data analysis. In this study, a novel approach is proposed for the identification of relatio...
INTEGRATION OF MACHINE LEARNING AND ENTROPY METHODS FOR POST-GENOME-WIDE ASSOCIATION STUDIES ANALYSIS
Yaldız, Burcu; Aydın Son, Yeşim; Department of Medical Informatics (2022-8-31)
Non-linear relationships between genotypes play an essential role in understanding the genetic interactions of complex disease traits. Genome-Wide Association Studies (GWAS) have revealed a statistical association between the SNPs in many complex diseases. As GWAS results could not thoroughly explain the genetic background of these disorders, Genome-Wide Interaction Studies started to gain importance. In recent years, various statistical approaches such as entropy-based methods have been suggested for revea...
IDENTIFYING ISOFORM SWITCHES IN BREAST CANCER
HENDEN, Şevki Onur; Can, Tolga; Department of Computer Engineering (2021-9-9)
Characterizing the human genome's molecular functions and their variations across people is vital for understanding the cellular processes behind human genetic characteristics and diseases. With the advent of single-cell RNA sequencing (scRNA-seq), it is now possible to investigate gene expression in individual cells. Although a number of scRNA-seq bioinformatics tools are now available, many of them focus on overall gene expression levels and, as a result, often ignore heterogeneity caused by individual tr...
Genome-Scale Networks Link Neurodegenerative Disease Genes to α-Synuclein through Specific Molecular Pathways.
KHURANA, V; et al. (2017-02-22)
Numerous genes and molecular pathways are implicated in neurodegenerative proteinopathies, but their inter-relationships are poorly understood. We systematically mapped molecular pathways underlying the toxicity of alpha-synuclein (alpha-syn), a protein central to Parkinson's disease. Genome-wide screens in yeast identified 332 genes that impact alpha-syn toxicity. To "humanize'' this molecular network, we developed a computational method, Transpose Net. This integrates a Steiner prize-collecting approach w...
Genes associated with resistance to wheat yellow rust disease identified by differential display analysis
Bozkurt, Osman; Unver, Turgay; Akkaya, Mahinur (2007-10-01)
This study is focused on the identification of early response genes involved in the resistance mechanism of one of the most important diseases of wheat, yellow rust. The strategy undertaken was to use the differential display reverse transcriptase-PCR method (DDRT-PCR) on two of the yellow rust differential lines of wheat, which were infected with the virulent and the avirulent Puccinia striiformis f. sp. tritici races together with appropriate control inoculations. Upon infection of the plant materials (resistant an...
Citation Formats
O. DAĞ, M. KAŞIKCI ÇAVDAR, Ö. İlk Dağ, and M. Yesiltepe, “GeneSelectML: a comprehensive way of gene selection for RNA-Seq data via machine learning algorithms,” MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, pp. 0–0, 2022, Accessed: 00, 2023. [Online]. Available: https://hdl.handle.net/11511/101476.