GPCRsort-Responding to the Next Generation Sequencing Data Challenge: Prediction of G Protein-Coupled Receptor Classes Using Only Structural Region Lengths

Sahın, Mehmet Emre
Can, Tolga
Son, Çağdaş Devrim
Next generation sequencing (NGS) and the attendant data deluge are increasingly impacting molecular life sciences research. Chief among the resulting challenges and opportunities is to enhance our ability to classify molecular target data into meaningful and cohesive systematic nomenclature. In this vein, the G protein-coupled receptors (GPCRs) are the largest and most divergent receptor family, and they play a crucial role in a host of pathophysiological pathways. For the pharmaceutical industry, GPCRs are a major drug target: it is estimated that 60% to 70% of all medicines in development today target GPCRs. Hence, an efficient and rapid classification is needed to group family members according to their functions. In addition to NGS and the Big Data challenge we currently face, a growing number of orphan GPCRs further demands novel, rapid, and accurate classification of the receptors, since the current classification tools are inadequate and slow. This study presents the development of GPCRsort, a new classification tool for GPCRs that uses structural features derived from their primary sequences. Comparison experiments with currently known GPCR classification techniques showed that GPCRsort is able to rapidly (on the order of minutes) classify uncharacterized GPCRs with 97.3% accuracy, whereas the best available technique's accuracy is 90.7%. GPCRsort is available in the public domain for postgenomics life scientists engaged in GPCR research with NGS:
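To make the idea of classifying "using only structural region lengths" concrete, here is a minimal sketch under stated assumptions; it is not GPCRsort's actual algorithm. It assumes the seven transmembrane (TM) helix boundaries have already been predicted by some other tool, turns them into 15 region lengths (N-terminus, TM1 to TM7, the six intervening loops, C-terminus), and classifies with a plain nearest-centroid rule. The function names, the nearest-centroid classifier, and the toy boundaries and labels are all hypothetical and chosen for illustration only.

```python
from math import dist  # Euclidean distance between two equal-length sequences (Python 3.8+)


def region_lengths(seq_len, tm_segments):
    """Hypothetical feature extractor: 15 region lengths from 7 predicted TM helices.

    tm_segments: seven (start, end) pairs, 1-based and inclusive, in sequence order.
    Returns [N-term, TM1, loop1, TM2, ..., loop6, TM7, C-term].
    """
    if len(tm_segments) != 7:
        raise ValueError("expected exactly seven TM segments")
    lengths = []
    prev_end = 0
    for start, end in tm_segments:
        if start <= prev_end or end < start or end > seq_len:
            raise ValueError("TM segments must be ordered, non-overlapping, and in range")
        lengths.append(start - prev_end - 1)  # N-terminus or loop preceding this helix
        lengths.append(end - start + 1)       # the helix itself
        prev_end = end
    lengths.append(seq_len - prev_end)        # C-terminus
    return lengths


def train_centroids(samples):
    """Average the feature vectors of each class; samples is a list of (features, label)."""
    sums, counts = {}, {}
    for feats, label in samples:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, feats):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], feats))


def accuracy(centroids, samples):
    """Fraction of (features, label) pairs that are predicted correctly."""
    correct = sum(predict(centroids, feats) == label for feats, label in samples)
    return correct / len(samples)


if __name__ == "__main__":
    # Toy, synthetic receptors (not real GPCR data): identical helices and loops,
    # differing only in N-terminus length.
    toy_train = [
        (region_lengths(350, [(31 + i * 40, 52 + i * 40) for i in range(7)]), "toy_short_N"),
        (region_lengths(470, [(151 + i * 40, 172 + i * 40) for i in range(7)]), "toy_long_N"),
    ]
    centroids = train_centroids(toy_train)
    query = region_lengths(360, [(41 + i * 40, 62 + i * 40) for i in range(7)])
    print(predict(centroids, query))  # prints "toy_short_N"
```

Nearest-centroid is used here only because it is the simplest classifier that works directly on fixed-length numeric vectors; any standard classifier could be substituted once each sequence is reduced to its region-length profile.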


Binary Classification Performance Measures/Metrics: A Comprehensive Visualized Roadmap to Gain New Insights
Canbek, Gurol; Sağıroğlu, Şeref; Taşkaya Temizel, Tuğba; Baykal, Nazife (2017-10-08)
Binary classification is one of the most frequent studies in applied machine learning problems in various domains, from medicine to biology to meteorology to malware analysis. Many researchers use some performance metrics in their classification studies to report their success. However, the literature has shown a widespread confusion about the terminology and ignorance of the fundamental aspects behind metrics. This paper clarifies the confusing terminology, suggests formal rules to distinguish between meas...
JOA: Joint Overlap Analysis of multiple genomic interval sets
Otlu, Burcak; Can, Tolga (Springer Science and Business Media LLC, 2019-03-08)
Background: Next-generation sequencing (NGS) technologies have produced large volumes of genomic data. One common operation on heterogeneous genomic data is genomic interval intersection. Most of the existing tools impose restrictions such as not allowing nested intervals or requiring intervals to be sorted when finding overlaps in two or more interval sets. Results: We proposed segment tree (ST) and indexed segment tree forest (ISTF) based solutions for intersection of multiple genomic interval sets in parallel...
Gokdogan, Gokhan; Vural, Elif (2017-09-28)
An important research topic of the recent years has been to understand and analyze manifold-modeled data for clustering and classification applications. Most clustering methods developed for data of non-linear and low-dimensional structure are based on local linearity assumptions. However, clustering algorithms based on locally linear representations can tolerate difficult sampling conditions only to some extent, and may fail for scarcely sampled data manifolds or at high-curvature regions. In this paper, w...
Security and Privacy Concerns Regarding Genetic Data in Mobile Health Record Systems: An Empirical Study from Turkey
Özkan, Özlem; Aydın Son, Yeşim; Aydınoğlu, Arsev Umur (2019-06-01)
With the increasing use of genetic testing and applications of bioinformatics in healthcare, genetic and genomic data needs to be integrated into electronic health systems. We administered a descriptive survey to 174 participants to elicit their views on the privacy and security of mobile health record systems and inclusion of their genetic data in these systems. A survey was implemented online and on site in two genetic diagnostic centres. Nearly half of the participants or their close family...
Bi-k-bi clustering: mining large scale gene expression data using two-level biclustering
Carkacioglu, Levent; Atalay, Rengül; Konu Karakayalı, Özlen; Atalay, Mehmet Volkan; Can, Tolga (2010-01-01)
Due to the increase in gene expression data sets in recent years, various data mining techniques have been proposed for mining gene expression profiles. However, most of these methods target single gene expression data sets and cannot handle all the available gene expression data in public databases in reasonable amount of time and space. In this paper, we propose a novel framework, bi-k-bi clustering, for finding association rules of gene pairs that can easily operate on large scale and multiple heterogene...
Citation Formats
M. E. Sahın, T. Can, and Ç. D. Son, “GPCRsort-Responding to the Next Generation Sequencing Data Challenge: Prediction of G Protein-Coupled Receptor Classes Using Only Structural Region Lengths,” OMICS: A Journal of Integrative Biology, pp. 636–644, 2014, Accessed: 00, 2020. [Online]. Available: