Improving document ranking with query expansion based on bert word embeddings

Yeke, Doğuhan
In this thesis, we present a query expansion approach based on contextualized word embeddings for improving document ranking performance. We employ Bidirectional Encoder Representations from Transformers(BERT) word embeddings to expand the original query with semantically similar terms. After deciding the best method for extracting word embeddings from BERT, we extend our query with the best candidate terms. As our primary goal, we show how BERT performs over the Word2Vec model, known as the most common procedure for representing terms in the vector space. After that, by leveraging the relevance judgment list, we show positive contributions of integrating tf-idf and term co-occurrence properties of terms to our query expansion system. Our experiments demonstrate that BERT outperforms Word2Vec in well-known evaluation metrics. In addition, we also conduct several experiments that address the most popular issues in information retrieval systems.


Using object-oriented materialized views to answer selection-based complex queries
Alhajj, R; Polat, Faruk (1999-09-01)
Presented in this paper is a model that utilizes existing materialized views to handle a wide range of complex selection-based queries, including linear recursive queries. Such queries are complex because it is almost impossible for naive users to predict the formulation of their predicate expressions. Object variables bound to objects in the result of a query are allowed to appear in the predicate of that query. Also, the predicate definition is extended to make it possible to have in the output only a sub...
Improving the performance of Hadoop/Hive by sharing scan and computation tasks
Özal, Serkan; Toroslu, İsmail Hakkı; Doğaç, Asuman; Department of Computer Engineering (2013)
MapReduce is a popular model of executing time-consuming analytical queries as a batch of tasks on large scale data. During simultaneous execution of multiple queries, many oppor- tunities can arise for sharing scan and/or computation tasks. Executing common tasks only once can reduce the total execution time of all queries remarkably. Therefore, we propose to use Multiple Query Optimization (MQO) techniques to improve the overall performance of Hadoop Hive, an open source SQL-based distributed warehouse sy...
Semantic information-based alternative plan generation for multiple query optimization
Polat, Faruk; Alhajj, R (Elsevier BV, 2001-09-01)
This paper addresses the impact of semantic information about queries on alternative plan generation (APG) for multiple query optimization (MQO). MQO covers optimizing the execution of a set of queries together where each query in the set to be optimized has several alternative execution plans. A multiple query optimizer selects an alternative plan for each query to obtain an optimal global execution plan. Our approach uses information such as common relations, common possible joins and common conditions to...
A transcoding robust data hiding method for image communication applications
Candan, Çağatay (2005-09-14)
We present a data embedding method for image communication applications. Our goal is to implement novel multimedia applications such as multi-language captions, interactive programming and title specific features over the existing image communication channel. To this aim, we present a data embedding method for JPEG images which has the desired degree of robustness to transcoding or bitrate adjustments that may take place in the communication channel. The described system is designed for JPEG images but can ...
Analyzing the polarity of opinionated queries
Chelaru, Sergiu; Altıngövde, İsmail Sengör; Siersdorfer, Stefan (2012-04-27)
In this paper, we present an in-depth analysis of Web search queries for controversial topics, focusing on query sentiment. To this end, we conduct extensive user assessments as well as an automatic sentiment analysis using the SentiWordNet thesaurus.
Citation Formats
D. Yeke, “Improving document ranking with query expansion based on bert word embeddings,” Thesis (M.S.) -- Graduate School of Natural and Applied Sciences. Computer Engineering., Middle East Technical University, 2020.