Cross-lingual information retrieval on Turkish and English texts

2010
Boynueğri, Akif
This thesis comparatively evaluates cross-lingual information retrieval (CLIR) approaches for Turkish and English texts. As a complementary study, knowledge-based methods for word sense disambiguation (WSD), one of the most important components of CLIR, are compared for Turkish words. Two CLIR approaches are used: query translation and sense indexing. In the query translation approach, queries are translated using automatic and manual word sense disambiguation methods and the Google translation service. In the sense indexing approach, documents are indexed by the meanings of words rather than by the words themselves, and retrieval is likewise performed on the meanings of the query words. To identify the intended meaning of query terms, manual and automatic word sense disambiguation methods are used and compared with each other. Knowledge-based WSD methods that use different gloss enrichment techniques are compared for Turkish words. Turkish WordNet serves as the primary knowledge base, and English WordNet and Turkish Wikipedia are employed as enrichment resources. Meanings of words are identified more clearly by using the semantic relations defined in the WordNets and Turkish Wikipedia. In addition, when calculating the semantic relatedness of senses, the cosine similarity metric is used as an alternative to word overlap count, and its effect is observed for each WSD method that uses a different knowledge base.
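The two relatedness measures named in the abstract, word overlap count and cosine similarity, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (tokenize, overlap_count, cosine_similarity, pick_sense), the whitespace tokenizer, and the toy English glosses are all assumptions made for illustration. A real system would draw glosses from Turkish WordNet and would need Turkish morphological processing.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (hypothetical, deliberately naive)."""
    return text.lower().split()


def overlap_count(gloss_a, gloss_b):
    """Count the word tokens shared by two glosses (multiset intersection)."""
    a, b = Counter(tokenize(gloss_a)), Counter(tokenize(gloss_b))
    return sum((a & b).values())


def cosine_similarity(gloss_a, gloss_b):
    """Cosine of the angle between the term-frequency vectors of two glosses."""
    a, b = Counter(tokenize(gloss_a)), Counter(tokenize(gloss_b))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    # An empty gloss has a zero vector; treat its similarity as 0.
    return dot / norm if norm else 0.0


def pick_sense(context, sense_glosses, score):
    """Return the sense id whose gloss scores highest against the context."""
    return max(sense_glosses, key=lambda s: score(context, sense_glosses[s]))


# Toy data (assumed, not taken from the thesis).
senses = {
    "bank_finance": "an institution that accepts money deposits and keeps an account",
    "bank_river": "sloping land beside a body of water such as a river",
}
context = "deposit money in the bank account"

best_by_overlap = pick_sense(context, senses, overlap_count)
best_by_cosine = pick_sense(context, senses, cosine_similarity)
```

Overlap count grows with gloss length, so enriched (longer) glosses tend to score higher regardless of how focused they are. Cosine similarity normalizes by vector length, which is one plausible reason to try it as an alternative when glosses are enriched from several resources.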

Suggestions

Türkiye ve New York (ABD) İlköğretim Matematik Öğretim Programlarının Karşılaştırılması [A Comparison of the Elementary Mathematics Curricula of Turkey and New York (USA)]
İnce, Murat; Bilgin, Okan; Sarıcı, Hasan; Yıldırım, Mustafa (2018-05-04)
The aim of this study is to identify the similarities and differences between the elementary mathematics curricula implemented in Turkey and New York by examining them in terms of content, objectives, and assessment and evaluation. The survey model was used in the study. Document analysis was carried out, adopting the horizontal approach among comparative education approaches. According to the findings, in the mathematics learning process in Turkey, rather than evaluating knowledge, the student, for themselves, ...
Improvement of corpus-based semantic word similarity using vector space model
Esin, Yunus Emre; Alpaslan, Ferda Nur; Department of Computer Engineering (2009)
This study presents a new approach for finding semantically similar words from corpora using window based context methods. Previous studies mainly concentrate on either finding new combinations of distance-weight measurement methods or proposing new context methods. The main difference of this new approach is that this study reprocesses the outputs of the existing methods to update the representation of related word vectors used for measuring semantic distance between words, to improve the results further. ...
Natural language query processing in ontology based multimedia databases
Aygül, Filiz Alaca; Çiçekli, Fehime Nihan; Department of Computer Engineering (2010)
In this thesis a natural language query interface is developed for semantic and spatio-temporal querying of MPEG-7 based domain ontologies. The underlying ontology is created by attaching domain ontologies to the core Rhizomik MPEG-7 ontology. The user can pose concept, complex concept (objects connected with an “AND” or “OR” connector), spatial (left, right . . . ), temporal (before, after, at least 10 minutes before, 5 minutes after . . . ), object trajectory and directional trajectory (east, west, southe...
A study on language modeling for Turkish large vocabulary continuous speech recognition
Bayer, Ali Orkan; Turhan Yöndem, Meltem; Department of Computer Engineering (2005)
This study focuses on large vocabulary Turkish continuous speech recognition. Continuous speech recognition for Turkish cannot be performed accurately because of the agglutinative nature of the language. The agglutinative nature decreases the performance of the classical language models that are used in the area. In this thesis, acoustic models using different parameters are first constructed and tested. Then, three types of n-gram language models are built. These involve class-based models, stem-based mo...
Selective word encoding for effective text representation
Ozkan, Savas; Ozkan, Akin (The Scientific and Technological Research Council of Turkey, 2019-01-01)
Determining the category of a text document from its semantic content is well motivated in the literature and has been studied extensively in various applications. A compact representation of the text is also a fundamental step toward precise results in these applications, and many studies concentrate on improving its performance. In particular, techniques that aggregate word-level representations are the mainstream approach to the problem. In this paper, w...
Citation Formats
A. Boynueğri, “Cross-lingual information retrieval on Turkish and English texts,” M.S. - Master of Science, Middle East Technical University, 2010.