Pronominal anaphora resolution in Turkish and English

2023-1-27
Ertan, Melek
This research analyzes pronominal anaphora in Turkish and English texts from a translated TED corpus, TED-MDB (Zeyrek et al., 2020), and presents a heuristic-based algorithm for resolving pronominal anaphora in each language separately. The corpus has the characteristics of spoken language and contains 364 English sentences aligned with their Turkish counterparts. The research is divided into two stages. In the first stage, the data were annotated with the web-based annotation tool INCEpTION (Klie et al., 2018). The second stage is a computational analysis in which the traditional knowledge-poor algorithm of Mitkov (1998) was tested on the annotated corpus for Turkish and English separately. The results show that pronominal anaphora can be detected in TED talks with an F1-score of 0.61 in English and 0.63 in their Turkish translations.
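As a rough illustration of how an F1-score such as those reported above is typically derived, the sketch below compares predicted pronoun-antecedent links against gold annotations. The link representation, the mention identifiers, and the exact-match criterion are assumptions made for illustration; the thesis may score matches differently, and the toy data are hypothetical rather than taken from TED-MDB.

```python
from typing import Set, Tuple

# A link pairs a pronoun mention with its antecedent mention.
# The string identifiers are hypothetical, not TED-MDB annotation ids.
Link = Tuple[str, str]


def precision_recall_f1(gold: Set[Link], predicted: Set[Link]) -> Tuple[float, float, float]:
    """Score predicted links against gold links using exact matching."""
    correct = len(gold & predicted)
    # Guard against empty sets so the function never divides by zero.
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: four annotated links, three system predictions, two of them correct.
gold_links = {("p1", "a1"), ("p2", "a2"), ("p3", "a3"), ("p4", "a4")}
predicted_links = {("p1", "a1"), ("p2", "a2"), ("p3", "x9")}

precision, recall, f1 = precision_recall_f1(gold_links, predicted_links)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a system cannot reach a high score by predicting very few links (high precision, low recall) or very many (the reverse).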

Suggestions

Incremental processing in head-final child language: online comprehension of relative clauses in Turkish-speaking children and adults
Özge, Duygu; Zeyrek Bozşahin, Deniz (2015-10-21)
The present study investigates the parsing of pre-nominal relative clauses (RCs) in children for the first time with a real-time methodology that reveals moment-to-moment processing patterns as the sentence unfolds. A self-paced listening experiment with Turkish-speaking children (aged 5-8) and adults showed that both groups display a sign of processing cost both in subject and object RCs at different points through the flow of the utterance when integrating the cues that are uninformative (i.e., ambiguous ...
Cross-lingual information retrieval on Turkish and English texts
Boynueğri, Akif; Birtürk, Ayşe Nur; Department of Computer Engineering (2010)
In this thesis, cross-lingual information retrieval (CLIR) approaches are comparatively evaluated for Turkish and English texts. As a complementary study, knowledge-based methods for word sense disambiguation (WSD), which is one of the most important parts of the CLIR studies, are compared for Turkish words. Query translation and sense indexing based CLIR approaches are used in this study. In query translation approach, we use automatic and manual word sense disambiguation methods and Google translation ser...
Head finalization and morphological analysis in factored phrase-based statistical machine translation from English to Turkish
İmren, Haydar; Çakıcı, Ruket; Department of Computer Engineering (2015)
Machine Translation is a field of study which deals with translating text from one natural language to another automatically. Statistical Machine Translation generates the translations using statistical methods and bilingual text corpora. In this study, an approach for translating from English to Turkish is introduced. Turkish is an agglutinative language with a free constituent order, whereas English is not agglutinative and the constituent order is strict. Besides these differences, there is a lack of par...
Frequency-driven late fusion-based word decomposition approach on the phrase-based statistical machine translation systems
Tatlıcıoğlu, Mehmet; Yazıcı, Adnan; Department of Computer Engineering (2013)
Machine translation is the process of translating texts from one natural language to another by computer, based on linguistic motivations, statistical approaches, or a combination of the two. In this study, the frequency-driven late fusion-based word decomposition approach is introduced to improve the translation quality of the phrase-based statistical machine translation system from Turkish to English. This late fusion-based approach is compared with the standalone statistical and rule-based word decompositio...
Backchannels in spoken Turkish
Aytaç Demirçivi, Kadriye; Işık Güler, Hale; Department of English Language Teaching (2021-4-01)
This study aims to identify all the non-lexical and lexical backchannels in the Spoken Turkish Corpus and the different functions they carry out. It also aims to investigate differences in the use of backchannels in naturally formed groups in the data. To achieve these aims, the Spoken Turkish Corpus was used as the data source and EXMaRALDA tools were used to annotate the functions of the backchannels. A sub-corpus was formed consisting of 61 conversations from three main settings: conversat...
Citation Formats
M. Ertan, “Pronominal anaphora resolution in Turkish and English,” M.S. - Master of Science, Middle East Technical University, 2023.