Identification of Discourse Relations in Turkish Discourse Bank
Date
2023-01-25
Author
Kutlu, Ferhat
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Discourse is the level of language at which linguistic units are organized in a structured and coherent way. One of the major problems in the study of discourse in particular, and in natural language understanding (NLU) in general, is how to build better models that capture how the constitutive units of a discourse fit together to form a coherent whole. A discourse is coherent when there are meaningful connections between its parts. Discourse relations, i.e., semantic or pragmatic relations between discourse units (clauses or sentences), are one of the most important aspects of discourse structure. They can be realized explicitly (i.e., through connectives) or without connectives, in which case they are known as implicit relations. The task of automatically revealing these aspects of a text is known as ‘discourse parsing’, and over the last two decades the central question has become how to make machines better at detecting discourse structure. Most existing studies target the automatic extraction of discourse structure by detecting explicit and implicit relations together with their constitutive parts (i.e., arguments). Focusing on Turkish, a relatively less studied language, this thesis aims to reveal its discourse structure through two sub-tasks of shallow discourse parsing: the identification of discourse relation realization types and the sense classification of explicit and implicit relations. To this end, it searches for a better model that learns discourse structure in a supervised fashion. Such models are much needed to improve tasks that require information above the clause level, such as text summarization, dialogue systems and machine translation. Working on Turkish Discourse Bank 1.2, the thesis develops the most thorough pipeline towards shallow discourse parsing.
The series of experiments starts with a classification model based on linguistic features fed into legacy machine learning algorithms and ends with fine-tuning a pre-trained language model as an encoder and classifying the encoded data with neural network-based classifiers. Expressed in terms of F1-scores, this effort has resulted in: (i) an increase from 0.36 to 0.77 in classifying discourse relation realization types, and (ii) scores of 0.82 in the classification of the Level-1 senses of explicit relations and 0.54 for those of implicit relations. The Level-2 senses of discourse relations are so numerous that a sound classification performance cannot be reached by training on the small number of samples available in the discourse bank. Thus, the study of Level-2 senses is left to future work, potentially supported by a larger discourse bank. We further explore the effect of multilingual data aggregation on the classification of discourse relation realization types through Cross-lingual Transfer Learning experiments that use the BERT multilingual base model (cased) with Turkish, Chinese and English datasets. We believe that the findings are important in providing insights into the performance of modern language models both in Turkish and in the low-resource scenario.
Subject Keywords
Discourse Relation, Classification, Pre-trained Language Model, Encoding, Cross-lingual Transfer Learning
URI
https://hdl.handle.net/11511/102558
Collections
Graduate School of Informatics, Thesis
Suggestions
The use of verbal morphology in Turkish as a third language: The case of Russian-English-Turkish trilinguals
Antonova-Unlu, Elena; Sağın Şimşek, Sultan Çiğdem (SAGE Publications, 2015-06-01)
Aims and Objectives: Several studies suggest that third language acquisition (TLA) is marked with complex patterns of language interaction. However, it is not clear yet to what extent multilinguals activate each of their background languages in TLA, as various factors may trigger the activation of one of the previously learnt languages. This study aims to contribute to the discussion by examining the use of verbal morphology in third language (L3) Turkish of Russian-English-Turkish trilinguals. We investiga...
Usage disambiguation of Turkish discourse connectives
Başıbüyük, Kezban; Zeyrek Bozşahin, Deniz (2023-01-01)
This paper describes a rule-based approach and a machine learning approach to disambiguate the discourse usage of Turkish connectives, which include not only single and phrasal connectives, as in most languages, but also suffixal connectives that largely correspond to subordinating conjunctions in English. Since these connectives have different linguistic characteristics, two sets of linguistic rules are devised to disambiguate their discourse usage. The linguistic rules are used in the rule-based approach and e...
The interpretation of syntactically unconstrained anaphors in Turkish heritage speakers
Gracanın Yüksek, Martına; Şafak, Duygu Fatma; Demir, Orhan; Kırkıcı, Bilal (2019-01-01)
Previous work has shown that heritage grammars are often simplified compared to their monolingual counterparts, especially in domains in which the societally-dominant language makes fewer distinctions than the heritage language. We investigated whether linguistic simplification extended to the anaphoric system of Turkish heritage speakers living in Germany. Whereas the Turkish monolingual grammar features a three-way distinction between reflexives (kendi), pronouns (o), and syntactically-unconstrained anaph...
The Interaction of Contextual and Syntactic Information in the Processing of Turkish Anaphors
Gracanın Yüksek, Martına; Safak, Duygu Fatma; Demir, Orhan; Kırkıcı, Bilal (2017-12-01)
In contrast with languages where anaphors can be classified into pronouns and reflexives, Turkish has a tripartite system that consists of the anaphors o, kendi, and kendisi. The syntactic literature on these anaphors has proposed that whereas o behaves like a pronoun and kendi behaves like a reflexive, kendisi has a more flexible behavior and it can function as both a pronoun and a reflexive. Using acceptability judgments and a self-paced reading task, we examined how Turkish anaphors are processed in isol...
A tune-based account of Turkish information structure
Özge, Umut; Bozşahin, Hüseyin Cem; Zeyrek, Deniz; Department of Cognitive Sciences (2003)
Languages differ in the means they avail themselves of for the structural realization of information structure, where available options are word order, prosody and morphology. Turkish has long been characterized as predominantly using word order and its variation in realizing information structure, where certain positions in a sentence are associated with certain pragmatic functions related to information structure. Prosody has been proposed to play only a secondary role interacting with word order. Contrar...
Citation (IEEE)
F. Kutlu, “Identification of Discourse Relations in Turkish Discourse Bank,” Ph.D. - Doctoral Program, Middle East Technical University, 2023.