Cross-lingual information retrieval on Turkish and English texts

Date

2010

Author

Boynueğri, Akif

This thesis comparatively evaluates cross-lingual information retrieval (CLIR) approaches for Turkish and English texts. As a complementary study, it compares knowledge-based methods for word sense disambiguation (WSD), one of the most important components of CLIR, for Turkish words. Two CLIR approaches are used: query translation and sense indexing. In the query translation approach, queries are translated using automatic and manual word sense disambiguation methods and the Google translation service. In the sense indexing approach, documents are indexed by the meanings of words rather than by the words themselves, and documents are likewise retrieved according to the meanings of the query words. To identify the intended meaning of query terms, manual and automatic word sense disambiguation methods are used and compared with each other. Knowledge-based WSD methods that use different gloss enrichment techniques are compared for Turkish words. Turkish WordNet serves as the primary knowledge base, and English WordNet and Turkish Wikipedia are employed as enrichment resources. Word meanings are identified more clearly by using the semantic relations defined in the WordNets and in Turkish Wikipedia. In addition, when the semantic relatedness of senses is calculated, the cosine similarity metric is used as an alternative to word overlap count. The effects of using cosine similarity are observed for each WSD method that uses a different knowledge base.
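The two relatedness measures named in the abstract can be illustrated with a minimal sketch of gloss-based disambiguation. This is not the thesis's implementation: the function names, the toy English glosses, and the simple regex tokenizer are all assumptions made for illustration, and no WordNet or Wikipedia enrichment is performed.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def overlap_count(a_tokens, b_tokens):
    """Number of distinct words shared by the two token lists."""
    return len(set(a_tokens) & set(b_tokens))


def cosine_similarity(a_tokens, b_tokens):
    """Cosine of the angle between the two term-frequency vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(context, glosses, metric):
    """Return the sense id whose gloss scores highest against the context."""
    ctx = tokenize(context)
    return max(glosses, key=lambda sense: metric(tokenize(glosses[sense]), ctx))


# Hypothetical toy glosses for the English word "bank" (illustration only).
GLOSSES = {
    "bank#finance": "an institution that accepts deposits of money and lends money",
    "bank#river": "sloping land beside a river or other body of water",
}
CONTEXT = "she sat on the grass beside the river"

best_by_overlap = disambiguate(CONTEXT, GLOSSES, overlap_count)
best_by_cosine = disambiguate(CONTEXT, GLOSSES, cosine_similarity)
```

Word overlap counts shared words only, so longer glosses have more chances to match. Cosine similarity divides by the vector lengths, which is one plausible reason to try it when glosses are enriched to different sizes. The abstract does not say how the two metrics actually compared.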

Subject Keywords

Computer engineering, Word sense.

URI

http://etd.lib.metu.edu.tr/upload/12611903/index.pdf
https://hdl.handle.net/11511/19598

Collections

Graduate School of Natural and Applied Sciences, Thesis

Suggestions

Comparison of Elementary Mathematics Curricula in Türkiye and New York (USA) İnce, Murat; Bilgin, Okan; Sarıcı, Hasan; Yıldırım, Mustafa (2018-05-04) The aim of this study is to reveal the similarities and differences between the elementary mathematics curricula implemented in Türkiye and New York by examining them in terms of content, objectives, and assessment and evaluation. The survey model was used in the study. Document analysis was carried out in the study, which adopted the horizontal approach from among the comparative education approaches. According to the findings of the study, in the mathematics learning process in Türkiye, rather than the student evaluating knowledge, the student ... knowledge for himself or herself ...
A study on language modeling for Turkish large vocabulary continuous speech recognition Bayer, Ali Orkan; Turhan Yöndem, Meltem; Department of Computer Engineering (2005) This study focuses on large vocabulary Turkish continuous speech recognition. Continuous speech recognition for Turkish cannot be performed accurately because of the agglutinative nature of the language. The agglutinative nature decreases the performance of the classical language models that are used in the area. In this thesis, acoustic models using different parameters are first constructed and tested. Then, three types of n-gram language models are built. These involve class-based models, stem-based mo...
Pronominal anaphora resolution in Turkish and English Ertan, Melek; Zeyrek Bozşahin, Deniz; Department of Cognitive Sciences (2023-1-27) This research analyzes pronominal anaphora in a Turkish and English translated TED corpus, namely the TED-MDB (Zeyrek et al., 2020), and presents a heuristic-based resolution algorithm for resolving pronominal anaphora in these languages separately. The corpus has characteristics of spoken language and has 364 English sentences aligned with their Turkish counterparts. The research is divided into two stages. In the first stage, the data was annotated using the web-based annotation tool INCEpTION (Klie et al., ...
Event Extraction from Turkish Football Web-casting Texts Using Hand-crafted Templates Tunaoglu, Doruk; Alan, Oezguer; Sabuncu, Orkunt; Akpinar, Samet; Cicekli, Nihan K.; Alpaslan, Ferda Nur (2009-09-16) In this paper, we present a domain-specific information extraction approach. We use manually formed templates to extract information from unstructured documents where grammatical and syntactical errors occur frequently. We applied our approach primarily to Turkish unstructured soccer web-casting texts. Compared to automated approaches, we achieve high precision-recall rates (97% - 85%). In addition, unlike automated approaches, we do not use part-of-speech taggers, parsers, phrase chunkers or that kind o...
Selective word encoding for effective text representation Ozkan, Savas; Ozkan, Akin (The Scientific and Technological Research Council of Turkey, 2019-01-01) Determining the category of a text document from its semantic content is well motivated in the literature and has been studied extensively in various applications. A compact representation of the text is also a fundamental step toward precise results in these applications, and many studies concentrate on improving its performance. In particular, studies that exploit the aggregation of word-level representations are the mainstream techniques used for the problem. In this paper, w...

Citation Formats

A. Boynueğri, “Cross-lingual information retrieval on Turkish and English texts,” M.S. - Master of Science, Middle East Technical University, 2010.