Exploiting information extraction techniques for automatic semantic annotation and retrieval of news videos in Turkish

Küçük, Dilek
Information extraction (IE) is known to be an effective technique for automatic semantic indexing of news texts. In this study, we propose a text-based fully automated system for the semantic annotation and retrieval of news videos in Turkish which exploits several IE techniques on the video texts. The IE techniques employed by the system include named entity recognition, automatic hyperlinking, person entity extraction with coreference resolution, and event extraction. The system utilizes the outputs of the components implementing these IE techniques as the semantic annotations for the underlying news video archives. Apart from the IE components, the proposed system comprises a news video database in addition to components for news story segmentation, sliding text recognition, and semantic video retrieval. We also propose a semi-automatic counterpart of the system where the only manual intervention takes place during text extraction. Both systems are executed on genuine video data sets consisting of videos broadcast by the Turkish Radio and Television Corporation. The current study is significant as it proposes the first fully automated system to facilitate semantic annotation and retrieval of news videos in Turkish, yet the proposed system and its semi-automated counterpart are quite generic and hence could be customized to build similar systems for video archives in other languages as well. Moreover, IE research on Turkish texts is known to be rare, and in the course of this study we have proposed and implemented novel techniques for several IE tasks on Turkish texts. As an application example, we have demonstrated the utilization of the implemented IE components to facilitate multilingual video retrieval.


Selective word encoding for effective text representation
Ozkan, Savas; Ozkan, Akin (The Scientific and Technological Research Council of Turkey, 2019-01-01)
Determining the category of a text document from its semantic content is well motivated in the literature and has been studied extensively in various applications. A compact representation of the text is also a fundamental step toward precise results in these applications, and many studies concentrate on improving it. In particular, techniques that aggregate word-level representations are the mainstream approach to this problem. In this paper, w...
Automatic image annotation by ensemble of visual descriptors
Akbaş, Emre; Yarman Vural, Fatoş Tunay; Department of Computer Engineering (2006)
Automatic image annotation is the process of automatically producing words to describe the content for a given image. It provides us with a natural means of semantic indexing for content based image retrieval. In this thesis, two novel automatic image annotation systems targeting different types of annotated data are proposed. The first system, called Supervised Ensemble of Visual Descriptors (SEVD), is trained on a set of annotated images with predefined class labels. Then, the system automatically annotates...
Exploiting interclass rules for focused crawling
Altıngövde, İsmail Sengör (Institute of Electrical and Electronics Engineers (IEEE), 2004-11-01)
A focused crawler gathers relevant Web pages on a particular topic. This rule-based Web-crawling approach uses linkage statistics among topics to improve a baseline focused crawler's harvest rate and coverage.
Fusion of multimodal information for multimedia information retrieval
Yılmaz, Turgay; Yazıcı, Adnan; Department of Computer Engineering (2014)
An effective retrieval of multimedia data is based on its semantic content. In order to extract the semantic content, the nature of multimedia data should be analyzed carefully and the information contained should be used completely. Multimedia data usually has a complex structure containing multimodal information. Noise in the data, non-universality of any single modality, and performance upper bound of each modality make it hard to rely on a single modality. Thus, multimodal fusion is a practical approach...
Second Chance: A Hybrid Approach for Dynamic Result Caching and Prefetching in Search Engines
Ozcan, Rifat; Altıngövde, İsmail Sengör; Barla Cambazoglu, B.; Ulusoy, Özgür (Association for Computing Machinery (ACM), 2013-12-01)
Web search engines are known to cache the results of previously issued queries. The stored results typically contain the document summaries and some data that is used to construct the final search result page returned to the user. An alternative strategy is to store in the cache only the result document IDs, which take much less space, allowing results of more queries to be cached. These two strategies lead to an interesting trade-off between the hit rate and the average query response latency. In this work...
Citation Formats
D. Küçük, “Exploiting information extraction techniques for automatic semantic annotation and retrieval of news videos in Turkish,” Ph.D. - Doctoral Program, Middle East Technical University, 2011.