Improving search result clustering by integrating semantic information from Wikipedia

2010
Çallı, Çağatay
Suffix Tree Clustering (STC) is a search result clustering (SRC) algorithm that generates overlapping clusters with meaningful labels in linear time. It demonstrated the feasibility of SRC, but subsequent studies introduced description-first algorithms that generate better labels and achieve higher precision. Even so, STC remained the fastest SRC algorithm, and later studies addressed various shortcomings of STC. In this thesis, semantic relations between cluster labels and documents are exploited to filter out noisy labels and to improve the merging phase of STC. Wikipedia is used to identify these relations, and methods for integrating semantic information into STC are proposed. Semantic features are shown to be effective for the SRC task when used together with term frequency vectors. Furthermore, no SRC studies on Turkish had been reported before this work. In this thesis, a dataset for Turkish is introduced and a number of methods are tested on it.
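
For readers unfamiliar with the merging phase mentioned above, the following is a minimal sketch of the merge step as it is commonly described for the original STC algorithm, not the modified, Wikipedia-aware version proposed in this thesis. It assumes base clusters are already available as a mapping from a label phrase to a set of document ids. The function names, the example labels, and the 0.5 overlap threshold default are illustrative assumptions.

```python
from itertools import combinations


def similar(a, b, threshold=0.5):
    """Overlap test: the shared documents must exceed `threshold` of both clusters."""
    common = len(a & b)
    return common > threshold * len(a) and common > threshold * len(b)


def merge_base_clusters(base_clusters, threshold=0.5):
    """Merge base clusters that are connected in the similarity graph.

    base_clusters maps a label (phrase) to the set of document ids containing it.
    Returns a list of (labels, documents) pairs; merged clusters may overlap.
    """
    labels = list(base_clusters)
    parent = {label: label for label in labels}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of base clusters that passes the overlap test.
    for a, b in combinations(labels, 2):
        if similar(base_clusters[a], base_clusters[b], threshold):
            parent[find(a)] = find(b)

    # Collect the labels of each connected component.
    groups = {}
    for label in labels:
        groups.setdefault(find(label), []).append(label)

    return [
        (sorted(group), set().union(*(base_clusters[l] for l in group)))
        for group in groups.values()
    ]


# Hypothetical base clusters for the ambiguous query "jaguar".
base = {
    "jaguar car": {1, 2, 3},
    "car dealer": {2, 3, 4},
    "jaguar cat": {5, 6},
}
for cluster_labels, documents in merge_base_clusters(base):
    print(cluster_labels, sorted(documents))
```

A semantic variant in the spirit of the abstract would presumably replace or supplement the pure document-overlap test in `similar` with a relatedness score between labels and documents, but the exact formulation is not given here.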

Suggestions

Semantically enriched web service composition in mobile environments
Ertürkmen, K. Alpay; Doğaç, Asuman; Department of Information Systems (2003)
Web Services are self-contained, self-describing, modular applications that can be published, located, and invoked through XML artefacts across the Web. Web services technologies can be applied to many kinds of applications, where they offer considerable advantages compared to the old world of product-specific APIs, platform-specific coding, and other "brittle" technology restrictions. Currently there are millions of web services available on the web due to the increase in e-commerce business volume. Web se...
Local search versus Path Relinking in metaheuristics: Redesigning Meta-RaPS with application to the multidimensional knapsack problem
Arin, Arif; Rabadi, Ghaith (Elsevier BV, 2016-09-01)
Most heuristics for discrete optimization problems consist of two phases: a greedy-based construction phase followed by an improvement (local search) phase. Although the best solutions are usually generated after the improvement phase, there is usually a high computational cost for employing a local search algorithm. This paper seeks another alternative to reduce the computational burden of a local search while keeping solution quality by embedding intelligence in metaheuristics. A modified version of Path ...
Multi-resolution visualization of large scale protein networks enriched with gene ontology annotations
Yaşar, Sevgi; Can, Tolga; Department of Computer Engineering (2009)
Genome scale protein-protein interactions (PPIs) are interpreted as networks or graphs with thousands of nodes from the perspective of computer science. PPI networks represent various types of possible interactions among proteins or genes of a genome. PPI data is vital in protein function prediction since functions of the cells are performed by groups of proteins interacting with each other and main complexes of the cell are made of proteins interacting with each other. Recent increase in protein interactio...
Improving interactive classification of satellite image content
Tekkaya, Gökhan; Atalay, Mehmet Volkan; Department of Computer Engineering (2007)
Interactive classification is an attractive alternative and complement to automatic classification of satellite image content, since the subject is visual and there are not yet powerful computational features corresponding to the sought visual features. In this study, we improve our previous attempt by building a more stable software system with better capabilities for interactive classification of the content of satellite images. The system allows the user to indicate a small number of image regions that contain a...
Systematic component-oriented development with axiomatic design
Toğay, Cengiz; Doğru, Ali Hikmet; Department of Computer Engineering (2008)
In this research, component oriented development is supported with design guidance by extending the Axiomatic Design Theory for component orientation, and utilizing domain engineering and ontology mechanisms. Guidance is offered in the form of suggesting missing components and discovering incompatibilities among the candidate elements of software development, corresponding to different phases such as requirement analysis, design, and implementation. A mature domain concept is developed suggesting the availa...
Citation Formats
Ç. Çallı, “Improving search result clustering by integrating semantic information from Wikipedia,” M.S. - Master of Science, Middle East Technical University, 2010.