Unsupervised identification of redundant domain entries in InterPro database using clustering techniques
Date
2015-09-12
Author
Rifaioğlu, Ahmet Süreyya
Can, Tolga
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
InterPro is a widely used database that integrates functional signatures from different protein sequence annotation databases, together with manual curation, to provide a comprehensive resource for functional sequence annotation. However, integrating these signatures leads to inconsistent and/or redundant annotations in some cases. In this study, we propose an unsupervised method for the automatic detection of inconsistent and redundant entries in the InterPro database. Two clustering methods, the Markov Cluster Algorithm (MCL) and hierarchical clustering, are employed to investigate to what extent such signatures can be detected. Results show that a considerable proportion (~75%) of redundant entries can be identified. The future goal is to develop a system that identifies redundant and inconsistent signatures with very high performance using supervised machine learning techniques. The findings of the study may help InterPro curators fix the problematic entries. The method may also serve curators as a road map before new signatures are integrated.
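As a rough illustration of the clustering step, the sketch below runs a minimal textbook version of the Markov Cluster Algorithm on a toy set of entries. It is not the authors' pipeline. The abstract does not say how similarity between entries is measured, so the Jaccard overlap of matched protein sets used here is an assumption, as are the entry names (`E1` to `E5`), the protein identifiers, and the inflation value of 2.0.

```python
# Minimal MCL sketch in pure Python (illustrative only, not the paper's code).
# Assumption: each entry is described by the set of proteins it matches,
# and entries are compared by Jaccard overlap. All names are hypothetical.

def jaccard(a, b):
    """Jaccard similarity between two sets of protein identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def mcl(sim, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster a symmetric similarity matrix; returns sorted index lists."""
    n = len(sim)
    # Self-loops (diagonal set to 1) are the usual MCL preprocessing step.
    m = normalize_columns([[1.0 if i == j else sim[i][j] for j in range(n)]
                           for i in range(n)])
    for _ in range(max_iter):
        # Expansion: square the matrix (two steps of the random walk).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: element-wise power, then renormalize the columns.
        inflated = normalize_columns([[v ** inflation for v in row]
                                      for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries mass is an attractor; its non-zero
    # columns are the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

def find_clusters(entries):
    """Group entry names whose matched protein sets overlap strongly."""
    names = sorted(entries)
    sim = [[jaccard(entries[a], entries[b]) for b in names] for a in names]
    return [[names[i] for i in cluster] for cluster in mcl(sim)]

# Toy data: hypothetical entries and the proteins they match.
entries = {
    "E1": {"P1", "P2", "P3", "P4"},
    "E2": {"P1", "P2", "P3", "P4", "P5"},
    "E3": {"P2", "P3", "P4", "P5"},
    "E4": {"P8", "P9"},
    "E5": {"P8", "P9", "P10"},
}

if __name__ == "__main__":
    for group in find_clusters(entries):
        print(group)
```

Entries that land in the same cluster would be candidates for review as potentially redundant. A higher inflation value generally yields finer clusters, so in practice it would need tuning against curated examples.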
Subject Keywords
Applied computing, Human-centered computing, Computing methodologies, Mathematics of computing, Theory of computation, Markov processes, Markov decision processes
URI
https://hdl.handle.net/11511/31766
DOI
https://doi.org/10.1145/2808719.2811430
Collections
Graduate School of Natural and Applied Sciences, Conference / Seminar
Suggestions
Derivation of Transcriptional Regulatory Relationships by Partial Least Squares Regression
Tan, Mehmet; Polat, Faruk; Alhajj, Reda (2009-11-04)
As the number of genes in a transcriptional regulatory network is large and the number of samples in biological data types is usually small, there is a need for integrating multiple data types for reverse engineering these networks. In this paper, we propose a method to integrate microarray gene expression, ChIP-chip and transcription factor binding motif data sets in a partial least squares regression model to derive transcription factor (TF)-gene interactions. Both single and synergistic effects of TFs ...
Multiobjective relational data warehouse design for the cloud
Dökeroğlu, Tansel; Coşar, Ahmet; Department of Computer Engineering (2014)
Conventional distributed Data Warehouse (DW) design techniques seek to optimally assign data tables/fragments to a given static database hardware setting. However, it is now possible to use elastic virtual resources provided by the Cloud environment and thus achieve reductions in both the execution time and the monetary cost of a DW system within predefined budget and response time constraints. Finding an optimal assignment plan for database tables to machines for this design problem is NP-Hard. Therefore, robu...
Semantic concept recognition from structured and unstructured inputs within cyber security domain
Hoşsucu, Alp Gökhan; Baykal, Nazife; Department of Information Systems (2015)
The linked data initiative has been quite successful in terms of publishing and interlinking data over ontological structures. The success is due to answering semantically rich queries over highly structured data. Linked data structures are widely used in various domains to solve the problem of producing domain-specific knowledge which can be interpreted by automated agents without any human interference. The cyber security field is one of the domains that suffer from the excessiveness of the raw...
Image Annotation With Semi-Supervised Clustering
Sayar, Ahmet; Yarman Vural, Fatoş Tunay (2009-09-16)
Methods developed for image annotation usually make use of region clustering algorithms. Visual codebooks are generated from the region clusters of low-level features. These codebooks are then matched, in various ways, with the words of the text document related to the image. In this paper, we supervise the clustering process by using three types of side information. The first one is the topic probability information obtained from the text document associated with the image. The second is the orientation an...
Flexible Content Extraction and Querying for Videos
Demir, Utku; KOYUNCU, Murat; Yazıcı, Adnan; Yilmaz, Turgay; SERT, MUSTAFA (2011-10-28)
In this study, a multimedia database system which includes a semantic content extractor, a high-dimensional index structure and an intelligent fuzzy object-oriented database component is proposed. The proposed system is realized by following a component-oriented approach. It supports different flexible query capabilities for the requirements of video users, which is the main focus of this paper. The query performance of the system (including automatic semantic content extraction) is tested and analyzed in t...
Citation Formats
IEEE
A. S. Rifaioğlu and T. Can, “Unsupervised identification of redundant domain entries in InterPro database using clustering techniques,” 2015, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/31766.