Exploring attribution in Turkish discourse: an annotation-based analysis

2024-09-04
Yaman, Aysu Nur
Attribution involves recognizing and crediting sources, a process integral to both written and spoken discourse. This study extends existing frameworks, particularly the Penn Discourse TreeBank (PDTB), which elucidates how sources and statements are attributed in English, to Turkish texts using the Turkish Discourse Bank version 1.2 (TDB 1.2). The aim is to understand the mechanisms of attribution in Turkish and to reduce dependency on manual annotation for text analysis. Drawing on insights from the literature, a tailored annotation scheme was developed. Data annotation achieved strong inter-annotator agreement, with Cohen's kappa coefficients of 0.83 for Arg1, 0.80 for Arg2, and 0.77 for Entire Discourse Relation (Entire Drel), indicating near-perfect to substantial agreement. Analysis of the annotated data revealed that the Other (Ot) category dominated with 296 instances in REL, followed by Arg1 (259 instances) and Arg2 (221 instances). The majority of verbs were communicative, such as de- ('to say', 211 times), söyle- ('to tell', 88 times), and belirt- ('to point out', 56 times), with communicative verbs comprising 75.9% of occurrences in relevant categories. In comparing journalistic and non-journalistic texts, the analysis found that journalistic genres had higher frequencies of attribution. News texts showed the highest number of attributions with 307 instances, followed by articles with 89 instances and interviews with 27 instances. In non-journalistic texts, novels exhibited 296 attributions, followed by memoirs with 146 and research texts with 82. This analysis enriches the TDB and sets a foundation for future automated text analysis.
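For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's kappa can be computed for two annotators who label the same items. It is not the thesis's own procedure or data: the function name cohen_kappa and the toy label sequences are hypothetical, chosen only to illustrate the formula kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set.")

    n = len(labels_a)

    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over all labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, so this sketch treats it as perfect agreement.
        return 1.0

    return (observed - expected) / (1 - expected)


# Toy data (assumed, not from TDB 1.2): attribution source labels for ten
# relations, using "Ot" (Other) and "Wr" (Writer) as example categories.
annotator_1 = ["Ot", "Ot", "Wr", "Wr", "Ot", "Wr", "Ot", "Ot", "Wr", "Ot"]
annotator_2 = ["Ot", "Ot", "Wr", "Ot", "Ot", "Wr", "Ot", "Wr", "Wr", "Ot"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

In this toy case the annotators agree on 8 of 10 items (p_o = 0.80) while chance agreement is 0.52, so kappa is about 0.58. The values reported in the abstract (0.77 to 0.83) therefore reflect agreement well beyond what matching label distributions alone would produce.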
Citation Formats
A. N. Yaman, “Exploring attribution in Turkish discourse: an annotation-based analysis,” M.S. - Master of Science, Middle East Technical University, 2024.