Show/Hide Menu
Hide/Show Apps
Logout
Türkçe
Türkçe
Search
Search
Login
Login
OpenMETU
OpenMETU
About
About
Open Science Policy
Open Science Policy
Open Access Guideline
Open Access Guideline
Postgraduate Thesis Guideline
Postgraduate Thesis Guideline
Communities & Collections
Communities & Collections
Help
Help
Frequently Asked Questions
Frequently Asked Questions
Guides
Guides
Thesis submission
Thesis submission
MS without thesis term project submission
MS without thesis term project submission
Publication submission with DOI
Publication submission with DOI
Publication submission
Publication submission
Supporting Information
Supporting Information
General Information
General Information
Copyright, Embargo and License
Copyright, Embargo and License
Contact us
Contact us
Automatic disambiguation of Turkish discourse connectives based on a Turkish connective lexicon
Download
Thesis Final.pdf
Date
2021-8-31
Author
Başıbüyük, Kezban
Metadata
Show full item record
This work is licensed under a
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
.
Item Usage Stats
334
views
312
downloads
Cite This
In this thesis, we developed methods for disambiguating the discourse usage and sense of connectives in a given free Turkish text. For this purpose, we firstly built a comprehensive Turkish Connective Lexicon (TCL) including all types of connectives in Turkish together with their syntactic and semantic features. This lexicon is built automatically by using the discourse relation annotations in several discourse annotated corpora developed for Turkish and follows the format of the German connective lexicon, DiMLex. As in many other languages, Turkish has lexical connectives (referred to as single and phrasal connectives in this work), and it also includes suffixal connectives. We developed a rule-based Turkish Connective Disambiguator (TCD) in order to solve the usage ambiguity of single, phrasal and suffixal connective types. Then, we designed machine learning models to disambiguate the discourse usage and sense of connectives. We evaluated the TCD and the machine learning models by comparing their results with the human annotations in the Turkish section of the TED-Multilingual Discourse Bank and Turkish Discourse Bank 1.1. We observed that the machine learning approach outperforms the baseline rule-based approach although both approaches yield quite good results. Within the scope of this thesis, we developed user-friendly interfaces for the TCL and TCD programs. The TCL program lists the discourse connectives in Turkish with their features and it presents several filtering and analysis capabilities. The TCD program, on the other hand, loads the selected free Turkish text to its interface and marks the discourse and non-discourse occurrences of connectives in the text. Additionally, if the selected file has a corresponding annotation file, the program automatically evaluates the disambiguation results. This thesis makes important contributions to Turkish discourse parsing by solving the usage ambiguity of the single and phrasal connectives as well as the suffixal connectives, which, to the best of our knowledge, has been attempted for the first time in this thesis. This thesis is also the first attempt to disambiguate the sense of all types of discourse connectives in Turkish. In this respect, it is predicted that the thesis would set baselines for future Turkish connective disambiguation works and pave the road for future researchers in the Turkish discourse parsing field.
Subject Keywords
Discourse processing
,
Discourse relations
,
Discourse connective disambiguation
,
Discourse connectives
,
Discourse connective lexicon
,
Turkish
URI
https://hdl.handle.net/11511/93037
Collections
Graduate School of Natural and Applied Sciences, Thesis
Suggestions
OpenMETU
Core
Usage disambiguation of Turkish discourse connectives
Başıbüyük, Kezban; Zeyrek Bozşahin, Deniz (2023-01-01)
This paper describes a rule-based approach and a machine learning approach to disambiguate the discourse usage of Turkish connectives, which not only has single and phrasal connectives as most languages do, but also suffixal connectives that largely correspond to subordinating conjunctions in English. Since these connectives have different linguistic characteristics, two sets of linguistic rules are devised to disambiguate their discourse usage. The linguistic rules are used in the rule-based approach and e...
The Discourse structure of Turkish
Demirşahin, Işın; Zeyrek Bozşahin, Deniz; Department of Cognitive Sciences (2015)
This thesis investigates the structure of immediate discourse in Turkish. The first and fore- most question is how discourse is built. Are there components of discourse that constitute a predicate-argument structure, or is discourse realized by underlying non-structural ties that are merely made explicit by these components? If there is structure in discourse, what is the nature of this structure, and what is its complexity? For this purpose, we analyze the relations annotated in the Turkish Discourse Bank,...
Linking discourse-level information and the induction of bilingual discourse connective lexicons
Özer, Sibel; Kurfall, Murathan; Zeyrek Bozşahin, Deniz; Mendes, Amália; Oleškevičiene, Giedre Valunaite (2022-6-20)
The single biggest obstacle in performing comprehensive cross-lingual discourse analysis is the scarcity of multilingual resources. The existing resources are overwhelmingly monolingual, compelling researchers to infer the discourse-level information in the target languages through error-prone automatic means. The current paper aims to provide a more direct insight into the cross-lingual variations in discourse structures by linking the annotated relations of the TED-Multilingual Discourse Bank, which consi...
Pair Annotation as a Novel Annotation Procedure: The Case of Turkish Discourse Bank
Demirşahin, Işın; Zeyrek Bozşahin, Deniz (Springer, 2017-01-01)
In this chapter, we provide an overview of Turkish Discourse Bank, a resource of ∼ 400,000 words built on a sub-corpus of the 2-million-word METU Turkish Corpus annotated following the principles of Penn Discourse Tree Bank. We first present the annotation framework we adopted, explaining how it differs from the annotation of the original language, English. Then we focus on a novel annotation procedure that we have devised and named pair annotation after pair programming. We discuss the advantages it has ...
Pair Annotation as a Novel Annotation Procedure: The Case of Turkish Discourse Bank
Demirşahin, Işın; Zeyrek Bozşahin, Deniz (2017-6-17)
In this chapter, we provide an overview of Turkish Discourse Bank, a resource of ∼∼400,000 words built on a sub-corpus of the 2-million-word METU Turkish Corpus annotated following the principles of Penn Discourse Tree Bank. We first present the annotation framework we adopted, explaining how it differs from the annotation of the original language, English. Then we focus on a novel annotation procedure that we have devised and named pair annotation after pair programming. We discuss the advantages it has of...
Citation Formats
IEEE
ACM
APA
CHICAGO
MLA
BibTeX
K. Başıbüyük, “Automatic disambiguation of Turkish discourse connectives based on a Turkish connective lexicon,” Ph.D. - Doctoral Program, Middle East Technical University, 2021.