Semantic search on Turkish news domain with automatic query expansion

Download
2016
Demir, Tuğba
In this thesis, semantic search on Turkish news domain with query expansion is proposed. Our aim is to provide the user with the most relevant documents related to their entered keywords. Our system uses data sources from Turkish news websites such as Hürriyet, Milliyet, Sabah, etc. Our system extends the user’s query with word embeddings and semantic relatedness. Furthermore, named entities, containing precious information, are extracted from news sources and user query and ranked to return on top of the results. In the rest of search process, it relies on traditional information retrieval (IR) techniques. For Turkish language, to the best of our knowledge, our system is the first attempt to use such search and query extension techniques on news data.

Suggestions

Semantic information-based alternative plan generation for multiple query optimization
Polat, Faruk; Alhajj, R (Elsevier BV, 2001-09-01)
This paper addresses the impact of semantic information about queries on alternative plan generation (APG) for multiple query optimization (MQO). MQO covers optimizing the execution of a set of queries together where each query in the set to be optimized has several alternative execution plans. A multiple query optimizer selects an alternative plan for each query to obtain an optimal global execution plan. Our approach uses information such as common relations, common possible joins and common conditions to...
Text Classification in the Turkish Marketing Domain for Context Sensitive Ad Distribution
Engin, Melih; Can, Tolga (2009-09-16)
In this paper, we construct and compare several feature extraction approaches in order to find a better solution for classification of Turkish web documents in the marketing domain. We produce our feature extraction techniques using characteristics of the Turkish language, structures of web documents and online content in the marketing domain. We form datasets in different feature spaces and we apply several Support Vector Machine (SVM) configurations on these datasets. We conduct our study considering the ...
Multimedia data modeling and semantic analysis by multimodal decision fusion
Güder, Mennan; Çiçekli, Fehime Nihan; Department of Computer Engineering (2015)
In this thesis, we propose a multi-modal event recognition framework based on the integration of event modeling, fusion, deep learning and, association rule mining. Event modeling is achieved through visual concept learning, scene segmentation and association rule mining. Visual concept learning is employed to reveal the semantic gap between the visual content and the textual descriptors of the events. Association rules are discovered by a specialized association rule mining algorithm where the proposed str...
Event Extraction from Turkish Football Web-casting Texts Using Hand-crafted Templates
Doruk, Tunaoğlu; Alan, Özgür; Orkunt, Sabuncu; Samet, Akpınar; Çiçekli, Fehime Nihan; Alpaslan, Ferda Nur (2009-09-16)
In this paper, we present a domain specific information extraction approach. We use manually formed templates to extract information from unstructured documents where grammatical and syntactical errors occur frequently. We applied our approach to primarily Turkish unstructured soccer Web-casting texts. Compared to automated approaches we achieve high precision-recall rates (97% - 85%). In addition to that, unlike automated approaches we do not use part-of-speech taggers, parsers, phrase chunkers or that kin...
Natural language query processing in ontology based multimedia databases
Aygül, Filiz Alaca; Çiçekli, Fehime Nihan; Department of Computer Engineering (2010)
In this thesis a natural language query interface is developed for semantic and spatio-temporal querying of MPEG-7 based domain ontologies. The underlying ontology is created by attaching domain ontologies to the core Rhizomik MPEG-7 ontology. The user can pose concept, complex concept (objects connected with an “AND” or “OR” connector), spatial (left, right . . . ), temporal (before, after, at least 10 minutes before, 5 minutes after . . . ), object trajectory and directional trajectory (east, west, southe...
Citation Formats
T. Demir, “Semantic search on Turkish news domain with automatic query expansion,” M.S. - Master of Science, Middle East Technical University, 2016.