Automatic disambiguation of Turkish discourse connectives based on a Turkish connective lexicon

2021-8-31
Başıbüyük, Kezban
In this thesis, we developed methods for disambiguating the discourse usage and sense of connectives in free Turkish text. For this purpose, we first built a comprehensive Turkish Connective Lexicon (TCL) that includes all types of connectives in Turkish together with their syntactic and semantic features. The lexicon is built automatically from the discourse relation annotations in several discourse-annotated corpora developed for Turkish, and it follows the format of the German connective lexicon, DiMLex. Like many other languages, Turkish has lexical connectives (referred to as single and phrasal connectives in this work); it also has suffixal connectives. We developed a rule-based Turkish Connective Disambiguator (TCD) to resolve the usage ambiguity of single, phrasal and suffixal connectives. Then, we designed machine learning models to disambiguate the discourse usage and sense of connectives. We evaluated the TCD and the machine learning models by comparing their results with the human annotations in the Turkish section of the TED-Multilingual Discourse Bank and in Turkish Discourse Bank 1.1. We observed that the machine learning approach outperforms the baseline rule-based approach, although both approaches yield good results. Within the scope of this thesis, we also developed user-friendly interfaces for the TCL and TCD programs. The TCL program lists the discourse connectives in Turkish with their features and offers several filtering and analysis capabilities. The TCD program loads a selected free Turkish text into its interface and marks the discourse and non-discourse occurrences of connectives in the text. Additionally, if the selected file has a corresponding annotation file, the program automatically evaluates the disambiguation results.
This thesis contributes to Turkish discourse parsing by resolving the usage ambiguity of single and phrasal connectives as well as suffixal connectives, the latter of which, to the best of our knowledge, is attempted for the first time in this thesis. This thesis is also the first attempt to disambiguate the sense of all types of discourse connectives in Turkish. In this respect, we expect the thesis to set baselines for future work on Turkish connective disambiguation and to pave the way for future research in Turkish discourse parsing.
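The abstract says that disambiguation results are evaluated against human annotations but does not name the metrics used. As an illustration only, the sketch below assumes that each discourse-usage occurrence of a connective is identified by its character span and that predictions are scored by exact span match with precision, recall and F1. The names Span, Scores and score_discourse_usage are hypothetical and are not taken from the TCD program.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical representation: (start_offset, end_offset) of a connective
# occurrence that was marked as discourse usage.
Span = Tuple[int, int]


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def score_discourse_usage(predicted: Set[Span], gold: Set[Span]) -> Scores:
    """Compare predicted discourse-usage spans with gold (human) spans.

    Assumption: a prediction counts as correct only on an exact span match.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return Scores(precision, recall, f1)


# Toy data, invented for this example: 3 predicted spans, 4 gold spans,
# 2 of which coincide.
predicted_spans = {(0, 2), (10, 14), (30, 35)}
gold_spans = {(0, 2), (10, 14), (50, 55), (60, 62)}
scores = score_discourse_usage(predicted_spans, gold_spans)
```

Exact span matching is the strictest choice; a more lenient evaluation could accept partial overlaps, which matters for suffixal connectives if the annotated span covers only the suffix while the predicted span covers the whole word.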

Suggestions

Usage disambiguation of Turkish discourse connectives
Başıbüyük, Kezban; Zeyrek Bozşahin, Deniz (2023-01-01)
This paper describes a rule-based approach and a machine learning approach to disambiguating the discourse usage of connectives in Turkish, a language that has not only single and phrasal connectives, as most languages do, but also suffixal connectives that largely correspond to subordinating conjunctions in English. Since these connectives have different linguistic characteristics, two sets of linguistic rules are devised to disambiguate their discourse usage. The linguistic rules are used in the rule-based approach and e...
The Discourse structure of Turkish
Demirşahin, Işın; Zeyrek Bozşahin, Deniz; Department of Cognitive Sciences (2015)
This thesis investigates the structure of immediate discourse in Turkish. The first and foremost question is how discourse is built. Are there components of discourse that constitute a predicate-argument structure, or is discourse realized by underlying non-structural ties that are merely made explicit by these components? If there is structure in discourse, what is the nature of this structure, and what is its complexity? For this purpose, we analyze the relations annotated in the Turkish Discourse Bank,...
Linking discourse-level information and the induction of bilingual discourse connective lexicons
Özer, Sibel; Kurfalı, Murathan; Zeyrek Bozşahin, Deniz; Mendes, Amália; Oleškevičiene, Giedre Valunaite (2022-6-20)
The single biggest obstacle in performing comprehensive cross-lingual discourse analysis is the scarcity of multilingual resources. The existing resources are overwhelmingly monolingual, compelling researchers to infer the discourse-level information in the target languages through error-prone automatic means. The current paper aims to provide a more direct insight into the cross-lingual variations in discourse structures by linking the annotated relations of the TED-Multilingual Discourse Bank, which consi...
Pair Annotation as a Novel Annotation Procedure: The Case of Turkish Discourse Bank
Demirşahin, Işın; Zeyrek Bozşahin, Deniz (Springer, 2017-01-01)
In this chapter, we provide an overview of Turkish Discourse Bank, a resource of ∼400,000 words built on a sub-corpus of the 2-million-word METU Turkish Corpus annotated following the principles of Penn Discourse Tree Bank. We first present the annotation framework we adopted, explaining how it differs from the annotation of the original language, English. Then we focus on a novel annotation procedure that we have devised and named pair annotation after pair programming. We discuss the advantages it has ...
Pair Annotation as a Novel Annotation Procedure: The Case of Turkish Discourse Bank
Demirşahin, Işın; Zeyrek Bozşahin, Deniz (2017-6-17)
In this chapter, we provide an overview of Turkish Discourse Bank, a resource of ∼400,000 words built on a sub-corpus of the 2-million-word METU Turkish Corpus annotated following the principles of Penn Discourse Tree Bank. We first present the annotation framework we adopted, explaining how it differs from the annotation of the original language, English. Then we focus on a novel annotation procedure that we have devised and named pair annotation after pair programming. We discuss the advantages it has of...
Citation Formats
K. Başıbüyük, “Automatic disambiguation of Turkish discourse connectives based on a Turkish connective lexicon,” Ph.D. - Doctoral Program, Middle East Technical University, 2021.