Div-BLAST: Diversification of Sequence Search Results
Date
2014-12-22
Author
Eser, Elif
Can, Tolga
Ferhatosmanoglu, Hakan
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.
Sequence similarity tools, such as BLAST, seek the sequences most similar to a query from a database of sequences. They return results that are significantly similar to the query sequence and typically highly similar to each other. Most sequence analysis tasks in bioinformatics require an exploratory approach, where the initial results guide the user to new searches. However, diversity has not yet been considered an integral component of sequence search tools for this discipline. Some redundancy can be avoided by introducing non-redundancy during database construction, but it is not feasible to dynamically set a level of non-redundancy tailored to a query sequence. We introduce the problem of diverse search and browsing in sequence databases, which produces non-redundant results optimized for any given query. We define diversity measures for sequences and propose methods to obtain diverse results extracted from current sequence similarity search tools. We also propose a new measure to evaluate the diversity of a set of sequences returned as the result of a sequence similarity query. We evaluate the effectiveness of the proposed methods in post-processing BLAST and PSI-BLAST results. We also assess the functional diversity of the returned results based on available Gene Ontology annotations. Additionally, we include a comparison with a current redundancy elimination tool, CD-HIT. Our experiments show that the proposed methods achieve more diverse yet significant result sets compared to static non-redundancy approaches. In both sequence-based and functional diversity evaluation, the proposed diversification methods significantly outperform original BLAST results and other baselines. A web-based tool implementing the proposed methods, Div-BLAST, can be accessed at cedar.cs.bilkent.edu.tr/Div-BLAST.
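To make the idea of post-processing search results for diversity concrete, here is a minimal Python sketch. It is not the method from the paper: it assumes a generic greedy re-ranking in the style of maximal marginal relevance, uses k-mer Jaccard similarity as a stand-in for alignment-based similarity, and scores a result set by mean pairwise distance. The function names, the `lam` trade-off parameter, and the normalized relevance scores are all hypothetical choices made for illustration.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of two k-mer sets (assumed stand-in for alignment similarity)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 1.0
    return len(ka & kb) / len(ka | kb)


def diversify(hits, n, lam=0.5, k=3):
    """Greedily pick n hits, trading relevance against redundancy.

    hits: list of (sequence, relevance) pairs, relevance in [0, 1], higher is better.
    lam:  hypothetical trade-off weight; 1.0 ranks by relevance only.
    """
    remaining = list(hits)
    selected = []
    while remaining and len(selected) < n:
        def score(hit):
            seq, rel = hit
            # Redundancy is the similarity to the closest already-selected hit.
            redundancy = max((jaccard(seq, s, k) for s, _ in selected), default=0.0)
            return lam * rel - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def mean_pairwise_distance(seqs, k=3):
    """Simple set-level diversity score: average of (1 - similarity) over all pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(1 - jaccard(a, b, k) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy hits with made-up relevance scores, for illustration only.
    hits = [("AAAAAAAA", 1.0), ("AAAAAAAC", 0.95), ("CCCCGGGG", 0.6)]
    picked = [seq for seq, _ in diversify(hits, 2)]
    print(picked, mean_pairwise_distance(picked))
```

With the toy input, ranking by relevance alone would return the two near-identical sequences, while the re-ranking prefers a less relevant but dissimilar hit for the second slot. A real pipeline would take similarity and significance from the search tool's own output rather than from k-mer overlap.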
Subject Keywords
PROTEIN, DATABASE, DIVERSITY, ENTROPY, TOOL
URI
https://hdl.handle.net/11511/28647
Journal
Plos One
DOI
https://doi.org/10.1371/journal.pone.0115445
Collections
Department of Computer Engineering, Article
Citation (IEEE)
E. Eser, T. Can, and H. Ferhatosmanoglu, “Div-BLAST: Diversification of Sequence Search Results,” Plos One, 2014. [Online]. Available: https://hdl.handle.net/11511/28647.