REFERENCE SELECTION IN TURKISH: A CORPUS-BASED APPROACH
Download
thesis-faruk.pdf
Faruk Büyüktekin_Tez Teslim Belgeleri.pdf
Date
2025-6-18
Author
Büyüktekin, Faruk
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
This thesis investigates reference selection in natural language, focusing on the mechanisms that shape the form of referring expressions. Drawing on both linguistic theory and data-driven computational methods, the study seeks to uncover how grammatical, discourse, and cognitive factors jointly influence referential form. As the target language, Turkish offers a testing ground for exploring referential choices beyond the patterns observed in well-studied languages because of its typologically distinct characteristics, specifically its rich morphology, frequent use of null pronouns, and flexible word order. A central contribution of this work is the creation of a novel coreference corpus based on spontaneous, goal-directed dialog. Unlike prior research, which has typically relied on semi-artificial or isolated sentences or on written texts, this study uses situated task-based interaction, capturing reference in real-time naturalistic speech. To facilitate this, a new annotation scheme was developed to represent the full range of referential forms, including full noun phrases, overt pronouns, and null pronouns, together with their contextual and grammatical properties. The resulting corpus, which is the most comprehensive coreference corpus of Turkish dialogs to date, enables systematic and computationally viable analyses of referential phenomena. Building on this resource, the thesis conducts extensive statistical analyses and employs machine learning to evaluate the effects and interactions of multiple features on referential form. These include speaker role, turn-taking, grammatical role, competition, distance, topicality, and sentential position. Among the findings, competition and distance emerged as the most predictive features in the models, while speaker role and turn-taking showed weaker but interpretable effects. Statistical tests confirmed that many of these factors significantly influence form choice, supporting and extending the predictions of major theories and models of referential form selection. By integrating corpus development, feature engineering, statistical modeling, and explanatory machine learning, this thesis offers a unified framework for analyzing reference in Turkish. It not only contributes to theoretical accounts of referential choice in typologically diverse settings but also provides scalable tools for future research in natural language processing, cognitive modeling, and dialog systems.
Subject Keywords
corpus, dialog, coreference, referring expression, referential form
URI
https://hdl.handle.net/11511/115165
Collections
Graduate School of Informatics, Thesis
Citation Formats
IEEE
F. Büyüktekin, “REFERENCE SELECTION IN TURKISH: A CORPUS-BASED APPROACH,” Ph.D. - Doctoral Program, Middle East Technical University, 2025.