Usage disambiguation of Turkish discourse connectives

Başıbüyük, Kezban
Zeyrek Bozşahin, Deniz
This paper describes a rule-based approach and a machine learning approach to disambiguating the discourse usage of connectives in Turkish, which has not only single and phrasal connectives, as most languages do, but also suffixal connectives that largely correspond to subordinating conjunctions in English. Since these connectives have different linguistic characteristics, two sets of linguistic rules are devised to disambiguate their discourse usage. The linguistic rules are used in the rule-based approach and employed as feature sets in the machine learning approach to test whether they influence the decisions of our algorithms. The results of both approaches are evaluated over the Turkish section of TED-Multilingual Discourse Bank and Turkish Discourse Bank 1.1, two datasets annotated in the Penn Discourse TreeBank style. The paper attests to the predictive power of the linguistic rules in disambiguating the discourse usage of both types of connectives, and it offers new knowledge and insights for discourse processing from the view of a morphologically rich language.
Language Resources and Evaluation


Automatic disambiguation of Turkish discourse connectives based on a Turkish connective lexicon
Başıbüyük, Kezban; Doğru, Ali Hikmet; Zeyrek Bozşahin, Deniz; Department of Computer Engineering (2021-8-31)
In this thesis, we developed methods for disambiguating the discourse usage and sense of connectives in a given free Turkish text. For this purpose, we first built a comprehensive Turkish Connective Lexicon (TCL) including all types of connectives in Turkish together with their syntactic and semantic features. This lexicon is built automatically by using the discourse relation annotations in several discourse annotated corpora developed for Turkish and follows the format of the German connective lexicon, ...
Linking discourse-level information and the induction of bilingual discourse connective lexicons
Özer, Sibel; Kurfalı, Murathan; Zeyrek Bozşahin, Deniz; Mendes, Amália; Oleškevičiene, Giedre Valunaite (2022-6-20)
The single biggest obstacle in performing comprehensive cross-lingual discourse analysis is the scarcity of multilingual resources. The existing resources are overwhelmingly monolingual, compelling researchers to infer the discourse-level information in the target languages through error-prone automatic means. The current paper aims to provide a more direct insight into the cross-lingual variations in discourse structures by linking the annotated relations of the TED-Multilingual Discourse Bank, which consi...
Syntax/semantics/pragmatics of yes/no questions in second language Turkish
Gračanin Yüksek, Martina; Kırkıcı, Bilal (John Benjamins, 2016-01-01)
This paper examines the second language (L2) acquisition of the syntax, semantics, and pragmatics of Turkish yes/no (yn) questions across learners of different proficiency. Employing three tasks, we tested the participants’ knowledge of the syntax of yn questions, the semantic interpretations they assign to the construction, and their mastery of the pragmatic constraints that govern the use of various points of yn questions. Although the results show that the participants’ performance in all three areas we ...
Identification of Discourse Relations in Turkish Discourse Bank
Kutlu, Ferhat; Zeyrek Bozşahin, Deniz; Department of Cognitive Sciences (2023-1-25)
Discourse is the level of language where linguistic units are organized in a structured and coherent way. One of the major problems in the field of discourse in particular, and in NLU in general, is how to build better models to sense the way constitutive units of discourse stick together to form a coherent whole. The discourse would be coherent if it had meaningful connections between its parts. Discourse relations, i.e., semantic or pragmatic relations between discourse units (clauses or sentences), are one o...
Frequency-driven late fusion-based word decomposition approach on the phrase-based statistical machine translation systems
Tatlıcıoğlu, Mehmet; Yazıcı, Adnan; Department of Computer Engineering (2013)
Machine translation is the process of translating texts from one natural language to another by computer, based on linguistic motivations, statistical approaches, or a combination of the two. In this study, the frequency-driven late fusion-based word decomposition approach is introduced to improve the translation quality of the phrase-based statistical machine translation system from Turkish to English. This late fusion-based approach is compared with the standalone statistical and rule-based word decompositio...
Citation Formats
K. Başıbüyük and D. Zeyrek Bozşahin, “Usage disambiguation of Turkish discourse connectives,” Language Resources and Evaluation, 2023.