Identifying the effectiveness of a web search engine with Turkish domain dependent impacts and global scale information retrieval improvements

2012
Fidan, Güven
This study investigates the effectiveness of a Web search engine whose architecture includes newly added or improved features. These features fall into three groups: the impact of link quality and usage information on page importance calculation; the use of a Turkish stemmer for indexing and query substitution; and the use of thumbnails for visualizing Web search engine results. Because Web search engines have become the primary means of finding and accessing information on the Internet, their effectiveness should be evaluated by how effectively and efficiently they help users complete a query. This defines the performance criteria, rather than the pure precision and recall measures developed for basic information retrieval. In this thesis, we propose three distinguishing features to increase the efficiency of a Web search engine. First, page importance calculation that incorporates link quality and usage information notably outperforms classical hyperlink-graph-based methods such as PageRank. Second, the Turkish stemmer for indexing and query substitution yields remarkable improvements in Web relevance when used in a mixed framework with both normal and stemmed forms. Finally, we have observed that users are able to find the most relevant results by using webpage thumbnails in queries with decreased precision scores, even though their gazing behavior is largely attributable to their preferred search engine.

Suggestions

Advanced methods for result and score caching in web search engines
Yafay, Erman; Altıngövde, İsmail Sengör; Department of Computer Engineering (2019)
Search engines employ caching techniques in main memory to improve system efficiency and scalability. In this thesis, we focus on improving the cache performance for web search engines where our contributions can be separated into two main parts. Firstly, we investigate the impact of the sample size for frequency statistics for most popular cache eviction strategies in the literature, and show that cache performance improves with larger samples, i.e., by storing the frequencies of all (or, most of) the quer...
Improving educational web search for question-like queries through subject classification
Yilmaz, Tolga; Ozcan, Rifat; Altıngövde, İsmail Sengör; Ulusoy, Özgür (2019-01-01)
Students use general web search engines as their primary source of research while trying to find answers to school-related questions. Although search engines are highly relevant for the general population, they may return results that are out of educational context. Another rising trend, social community question answering websites, is the second choice for students who try to get answers from other peers online. We attempt to discover possible improvements in educational search by leveraging both of these ...
Explicit Search Result Diversification Using Score and Rank Aggregation Methods
Ozdemiray, Ahmet Murat; Altıngövde, İsmail Sengör (2015-06-01)
Search result diversification is one of the key techniques to cope with the ambiguous and underspecified information needs of web users. In the last few years, strategies that are based on the explicit knowledge of query aspects emerged as highly effective ways of diversifying search results. Our contributions in this article are two-fold. First, we extensively evaluate the performance of a state-of-the-art explicit diversification strategy and pin-point its potential weaknesses. We propose basic yet novel ...
A new approach for reactive web usage data processing
Bayir, Murat Ali; Toroslu, İsmail Hakkı; Coşar, Ahmet (2006-01-01)
© 2006 IEEE. Web usage mining exploits data mining techniques to discover valuable information from navigation behavior of World Wide Web (WWW) users. The required information is captured by web servers and stored in web usage data logs. The first phase of web usage mining is the data processing phase. In the data processing phase, first, relevant information is filtered from the logs. After that, sessions are reconstructed by using heuristics that select and group requests belonging to the same user session...
Second Chance: A Hybrid Approach for Dynamic Result Caching in Search Engines
Altıngövde, İsmail Sengör; Barla Cambazoglu, B.; Ulusoy, Ozgur (2011-01-01)
Result caches are vital for efficiency of search engines. In this work, we propose a novel caching strategy in which a dynamic result cache is split into two layers: an HTML cache and a docID cache. The HTML cache in the first layer stores the result pages computed for queries. The docID cache in the second layer stores ids of documents in search results. Experiments under various scenarios show that, in terms of average query processing time, this hybrid caching approach outperforms the traditional approac...
Citation Formats
G. Fidan, “Identifying the effectiveness of a web search engine with Turkish domain dependent impacts and global scale information retrieval improvements,” Ph.D. - Doctoral Program, Middle East Technical University, 2012.