Cross-lingual information retrieval on Turkish and English texts
Date
2010
Author
Boynueğri, Akif
Item Usage Stats
238 views, 122 downloads
In this thesis, cross-lingual information retrieval (CLIR) approaches are comparatively evaluated for Turkish and English texts. As a complementary study, knowledge-based methods for word sense disambiguation (WSD), one of the most important components of CLIR, are compared for Turkish words. Two CLIR approaches are used in this study: query translation and sense indexing. In the query translation approach, queries are translated using automatic and manual word sense disambiguation methods and the Google translation service. In the sense indexing approach, documents are indexed according to the meanings of words instead of the words themselves, and documents are likewise retrieved according to the meanings of the query words. To identify the intended meaning of query terms, manual and automatic word sense disambiguation methods are used and compared with each other. Knowledge-based WSD methods that use different gloss enrichment techniques are compared for Turkish words. Turkish WordNet is used as the primary knowledge base, and English WordNet and Turkish Wikipedia are employed as enrichment resources. Meanings of words are identified more clearly by using the semantic relations defined in the WordNets and Turkish Wikipedia. In addition, when calculating the semantic relatedness of senses, the cosine similarity metric is used as an alternative to word overlap count. The effects of using the cosine similarity metric are observed for each WSD method that uses a different knowledge base.
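The two relatedness measures named in the abstract, word overlap count and cosine similarity, can be illustrated with a minimal sketch. It is not the thesis's implementation: the whitespace tokenizer, the function names, and the toy English glosses are all assumptions made for illustration, whereas the thesis works with Turkish WordNet glosses enriched from English WordNet and Turkish Wikipedia.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; no stemming or stop-word removal).
    return text.lower().split()


def overlap_count(context, gloss):
    # Number of word occurrences shared by the context and the gloss (multiset intersection).
    c, g = Counter(tokenize(context)), Counter(tokenize(gloss))
    return sum((c & g).values())


def cosine_similarity(context, gloss):
    # Cosine of the angle between the term-frequency vectors of the two texts.
    c, g = Counter(tokenize(context)), Counter(tokenize(gloss))
    dot = sum(c[w] * g[w] for w in c.keys() & g.keys())
    norm = math.sqrt(sum(v * v for v in c.values())) * math.sqrt(sum(v * v for v in g.values()))
    return dot / norm if norm else 0.0


def disambiguate(context, sense_glosses, score):
    # Pick the sense whose gloss scores highest against the context.
    return max(sense_glosses, key=lambda sense: score(context, sense_glosses[sense]))


# Hypothetical glosses for illustration only; not taken from any WordNet.
glosses = {
    "bank_river": "sloping land beside a river or lake",
    "bank_finance": "an institution that accepts deposits and lends money",
}
context = "she sat on the bank of the river"

best_by_overlap = disambiguate(context, glosses, overlap_count)
best_by_cosine = disambiguate(context, glosses, cosine_similarity)
```

In this toy case both measures select the same sense. They can disagree on longer glosses, because cosine similarity normalizes by vector length while raw overlap tends to favor longer glosses.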
Subject Keywords
Computer engineering, Word sense.
URI
http://etd.lib.metu.edu.tr/upload/12611903/index.pdf
https://hdl.handle.net/11511/19598
Collections
Graduate School of Natural and Applied Sciences, Thesis
Suggestions
A Comparison of the Elementary Mathematics Curricula of Türkiye and New York (USA)
İnce, Murat; Bilgin, Okan; Sarıcı, Hasan; Yıldırım, Mustafa (2018-05-04)
The aim of this study is to reveal the similarities and differences between the elementary mathematics curricula implemented in Türkiye and New York by examining them in terms of content, objectives, and assessment and evaluation. The survey model was used in the study. Document analysis was carried out in the study, which adopted the horizontal approach among the comparative education approaches. According to the findings, in the mathematics learning process in Türkiye, rather than evaluating knowledge, the student's knowledge for themselves ...
A study on language modeling for Turkish large vocabulary continuous speech recognition
Bayer, Ali Orkan; Turhan Yöndem, Meltem; Department of Computer Engineering (2005)
This study focuses on large vocabulary Turkish continuous speech recognition. Continuous speech recognition for Turkish cannot be performed accurately because of the agglutinative nature of the language, which decreases the performance of the classical language models used in the area. In this thesis, first, acoustic models using different parameters are constructed and tested. Then, three types of n-gram language models are built. These involve class-based models, stem-based mo...
Pronominal anaphora resolution in Turkish and English
Ertan, Melek; Zeyrek Bozşahin, Deniz; Department of Cognitive Sciences (2023-1-27)
This research analyzes pronominal anaphora in a Turkish and English translated TED corpus, namely the TED-MDB (Zeyrek et al., 2020), and presents a heuristic-based resolution algorithm for resolving pronominal anaphora in these languages separately. The corpus has characteristics of spoken language and has 364 English sentences aligned with their Turkish counterparts. The research is divided into two stages. In the first stage, the data was annotated using the web-based annotation tool INCEpTION (Klie et al., ...
Event Extraction from Turkish Football Web-casting Texts Using Hand-crafted Templates
Tunaoglu, Doruk; Alan, Oezguer; Sabuncu, Orkunt; Akpinar, Samet; Cicekli, Nihan K.; Alpaslan, Ferda Nur (2009-09-16)
In this paper, we present a domain-specific information extraction approach. We use manually formed templates to extract information from unstructured documents where grammatical and syntactical errors occur frequently. We applied our approach primarily to Turkish unstructured soccer web-casting texts. Compared to automated approaches, we achieve high precision-recall rates (97% - 85%). In addition to that, unlike automated approaches, we do not use part-of-speech taggers, parsers, phrase chunkers or that kind o...
Selective word encoding for effective text representation
Ozkan, Savas; Ozkan, Akin (The Scientific and Technological Research Council of Turkey, 2019-01-01)
Determining the category of a text document from its semantic content is well motivated in the literature and has been extensively studied in various applications. A compact representation of the text is also a fundamental step in achieving precise results in these applications, and many studies concentrate on improving its performance. In particular, studies that exploit the aggregation of word-level representations are the mainstream techniques used for the problem. In this paper, w...
Citation Formats
IEEE
A. Boynueğri, “Cross-lingual information retrieval on Turkish and English texts,” M.S. - Master of Science, Middle East Technical University, 2010.