Automatic Usage Disambiguation of the Enclitic dA in Turkish

Ersöyleyen, Elif Ebru
Discourse is composed of several constituents that together give it coherence and structure. One interesting aspect of discourse is discourse connectives and their contribution to discourse structure. Discourse connectives are lexico-syntactic elements that signal a semantic relation between two discourse units (clauses and sentences). Clitics are morphemes that are phonologically dependent on the lexical item to which they attach but have separate syntactic forms and carry no meaning by themselves. They can function as discourse connectives in several languages; for example, the clitic pas in Cuzco Quechua and dA in Turkish can signal multiple senses and have features that distinguish them from affixes and other words. dA is essentially a focus-associated enclitic that also has discourse functions in Turkish, conveying senses of contrast, addition, cause, and condition. In other words, just like other linguistic expressions, dA is subject to ambiguity and poses a challenge for automated natural language processing tasks. The aim of this study is two-fold: (a) to analyze the linguistic behavior of dA by annotating its discourse and non-discourse occurrences in corpora of written Turkish, and (b) to develop machine learning models that distinguish its discourse usage from its non-discourse usage, i.e., its discourse connective role vs. its focus enclitic role. The thesis describes the annotation study and the machine learning models, which use linguistic features. The results of our machine learning experiments show that we can disambiguate the discourse usage of dA with an F1-score of 0.83 in free text.
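For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1-score is computed for a binary task such as discourse vs. non-discourse usage. It is not the thesis's evaluation code: the function name f1_score, the label encoding (1 = discourse usage, 0 = non-discourse usage), and the toy labels are all assumptions made for illustration.

```python
from typing import Sequence


def f1_score(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """Return the F1-score for the positive class (label 1).

    Assumed encoding (illustrative only): 1 = discourse usage,
    0 = non-discourse usage.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)

    # With no true positives, precision and recall are zero or undefined;
    # return 0.0 by convention.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)

    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, not data from the thesis.
gold_labels = [1, 1, 1, 0, 0, 1]
predicted_labels = [1, 0, 1, 0, 1, 1]
print(f1_score(gold_labels, predicted_labels))
```

Because F1 is the harmonic mean of precision and recall, a score of 0.83 indicates that both are reasonably high; a model that favored one heavily at the expense of the other would score lower.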


