Automatic disambiguation of Turkish discourse connectives based on a Turkish connective lexicon

2021-8-31
Başıbüyük, Kezban
In this thesis, we developed methods for disambiguating the discourse usage and the sense of connectives in free Turkish text. For this purpose, we first built a comprehensive Turkish Connective Lexicon (TCL) that includes all types of connectives in Turkish together with their syntactic and semantic features. The lexicon is built automatically from the discourse relation annotations in several discourse-annotated corpora developed for Turkish, and it follows the format of the German connective lexicon DiMLex. As in many other languages, Turkish has lexical connectives (referred to as single and phrasal connectives in this work). It also has suffixal connectives. We developed a rule-based Turkish Connective Disambiguator (TCD) to resolve the usage ambiguity of single, phrasal, and suffixal connectives. We then designed machine learning models to disambiguate both the discourse usage and the sense of connectives. We evaluated the TCD and the machine learning models by comparing their output with the human annotations in the Turkish section of the TED-Multilingual Discourse Bank and in the Turkish Discourse Bank 1.1. Both approaches yield good results, but the machine learning approach outperforms the rule-based approach, which serves as the baseline.

Within the scope of this thesis, we also developed user-friendly interfaces for the TCL and TCD programs. The TCL program lists the discourse connectives in Turkish with their features and provides several filtering and analysis capabilities. The TCD program loads a selected free Turkish text into its interface and marks the discourse and non-discourse occurrences of connectives in the text. If the selected file has a corresponding annotation file, the program also evaluates the disambiguation results automatically.
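The abstract does not specify the TCL schema or the TCD rules. As a rough illustration only, the following Python sketch shows how a lexicon-driven disambiguator could mark single, phrasal, and suffixal connective occurrences as discourse or non-discourse, and how such marks could be scored against gold annotations with precision, recall, and F1. The entry fields, sense labels, the POS-based rules, and the names `ConnectiveEntry`, `COORDINATORS`, `disambiguate`, and `evaluate` are all hypothetical assumptions, not the thesis's actual TCL format or TCD rules.

```python
from dataclasses import dataclass

# HYPOTHETICAL, heavily simplified lexicon entries. The real TCL is built
# automatically from annotated corpora and records far more features.
@dataclass(frozen=True)
class ConnectiveEntry:
    form: str   # surface form; for suffixal entries, the suffix string
    ctype: str  # "single", "phrasal" or "suffixal"
    sense: str  # illustrative PDTB-style sense label (an assumption)

LEXICON = [
    ConnectiveEntry("ve", "single", "Expansion.Conjunction"),        # 'and'
    ConnectiveEntry("ama", "single", "Comparison.Concession"),       # 'but'
    ConnectiveEntry("çünkü", "single", "Contingency.Cause"),         # 'because'
    ConnectiveEntry("ya da", "phrasal", "Expansion.Disjunction"),    # 'or'
    ConnectiveEntry("bu nedenle", "phrasal", "Contingency.Cause"),   # 'for this reason'
    # Vowel-harmony variants of the converb suffix -(y)IncA 'when'.
    ConnectiveEntry("ınca", "suffixal", "Temporal"),
    ConnectiveEntry("ince", "suffixal", "Temporal"),
    ConnectiveEntry("unca", "suffixal", "Temporal"),
    ConnectiveEntry("ünce", "suffixal", "Temporal"),
]

COORDINATORS = {"ve", "ya da"}  # hypothetical: forms that can also join noun phrases

def disambiguate(tagged):
    """Mark connective occurrences in a list of (token, POS) pairs.

    Returns (start, end, form, sense, is_discourse) tuples, where token
    indices start..end-1 cover the connective.
    """
    # Note: str.lower() is not Turkish-aware ("I" becomes "i", not "ı").
    tokens = [tok.lower() for tok, _ in tagged]
    results = []
    i = 0
    while i < len(tagged):
        # Lexical connectives: prefer the longest matching form (phrasal over single).
        match = None
        for entry in LEXICON:
            if entry.ctype == "suffixal":
                continue
            parts = entry.form.split()
            if tokens[i:i + len(parts)] == parts and (
                    match is None or len(parts) > len(match.form.split())):
                match = entry
        if match is not None:
            n = len(match.form.split())
            prev_pos = tagged[i - 1][1] if i > 0 else None
            next_pos = tagged[i + n][1] if i + n < len(tagged) else None
            # Hypothetical rule: a coordinator between two nouns joins noun
            # phrases rather than discourse units (non-discourse usage).
            is_disc = not (match.form in COORDINATORS
                           and prev_pos == "NOUN" and next_pos == "NOUN")
            results.append((i, i + n, match.form, match.sense, is_disc))
            i += n
            continue
        # Suffixal connectives: the suffix must end the word.
        for entry in LEXICON:
            if entry.ctype == "suffixal" and tokens[i].endswith(entry.form):
                # Hypothetical rule: count it as discourse usage only on verbs.
                is_disc = tagged[i][1] == "VERB"
                results.append((i, i + 1, "-" + entry.form, entry.sense, is_disc))
                break
        i += 1
    return results

def evaluate(predicted, gold_spans):
    """Precision, recall and F1 of predicted discourse usages against gold spans."""
    pred = {(s, e) for s, e, _, _, disc in predicted if disc}
    gold = set(gold_spans)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 'Ali bought apples and pears, but when he came home he did not eat them.'
sentence = [("Ali", "PROPN"), ("elma", "NOUN"), ("ve", "CCONJ"), ("armut", "NOUN"),
            ("aldı", "VERB"), ("ama", "CCONJ"), ("eve", "NOUN"),
            ("gelince", "VERB"), ("yemedi", "VERB")]
marks = disambiguate(sentence)
print(evaluate(marks, gold_spans=[(5, 6), (7, 8)]))  # (1.0, 1.0, 1.0)
```

In this toy run, *ve* in *elma ve armut* ('apples and pears') is marked as non-discourse, while *ama* and the suffix on *gelince* ('when [he] came') are marked as discourse usages. The noun *düşünce* ('thought') would match the suffix string `-ünce` but is rejected because it is not tagged as a verb, which shows why suffixal connectives also need disambiguation. A learned classifier, as in the machine learning models described above, would replace these hand-written POS checks with features and a trained model.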
This thesis makes important contributions to Turkish discourse parsing by resolving the usage ambiguity of single and phrasal connectives as well as of suffixal connectives; to the best of our knowledge, this has been attempted for the first time in this thesis. The thesis is also the first attempt to disambiguate the sense of all types of discourse connectives in Turkish. In this respect, we expect that it will set baselines for future work on Turkish connective disambiguation and pave the way for future research in Turkish discourse parsing.

Suggestions

The Discourse structure of Turkish
Demirşahin, Işın; Zeyrek Bozşahin, Deniz; Department of Cognitive Sciences (2015)
This thesis investigates the structure of immediate discourse in Turkish. The first and foremost question is how discourse is built. Are there components of discourse that constitute a predicate-argument structure, or is discourse realized by underlying non-structural ties that are merely made explicit by these components? If there is structure in discourse, what is the nature of this structure, and what is its complexity? For this purpose, we analyze the relations annotated in the Turkish Discourse Bank,...
Pair Annotation as a Novel Annotation Procedure: The Case of Turkish Discourse Bank
Demirşahin, Işın; Zeyrek Bozşahin, Deniz (Springer, 2017-01-01)
In this chapter, we provide an overview of the Turkish Discourse Bank, a resource of ∼400,000 words built on a sub-corpus of the 2-million-word METU Turkish Corpus and annotated following the principles of the Penn Discourse Treebank. We first present the annotation framework we adopted, explaining how it differs from the annotation of the original language, English. Then we focus on a novel annotation procedure that we have devised and named pair annotation, after pair programming. We discuss the advantages it has ...
Automatic sense prediction of implicit discourse relations in Turkish
Kurfalı, Murathan; Zeyrek Bozşahin, Deniz; Department of Cognitive Sciences (2016)
In discourse parsing, predicting the sense of implicit discourse relations poses the most significant challenge. The thesis aims to develop a supervised system to predict the sense of implicit discourse relations in the Turkish Discourse Bank (TDB). To accomplish this goal, the discourse-level annotations obtained from the TDB are used. The TDB follows the PDTB-2 sense hierarchy, and all experiments in the current study consider only CLASS-level senses. As the primary experiment, the classifiers ...
An Experimental study on abstract anaphora resolution in Turkish written discourse
Ergin Somer, Rabiye; Zeyrek Bozşahin, Deniz; Acartürk, Cengiz; Department of Cognitive Sciences (2012)
This thesis provides an experimental approach to abstract anaphora resolution in Turkish written discourse. The core of this work consists of identifying various manifestations of abstract anaphoric expressions in Turkish discourse (bu 'this' as the bare abstract object anaphor vs. bu+label abstract anaphors such as bu durum 'this situation', bu olay 'this event', bu iş 'this matter', and bu gerçek 'this fact') and investigating whether any difference is observed in their processing. To this end, two offline experiments are conducted with human subjects, and the results ...
Citation Formats
K. Başıbüyük, “Automatic disambiguation of Turkish discourse connectives based on a Turkish connective lexicon,” Ph.D. - Doctoral Program, Middle East Technical University, 2021.