Power of frequencies : n-grams and semi-supervised morphological segmentation in Turkish

2013
Kılıç, Özkan
Turkish is an agglutinating language with a non-rigid word order. In communication, the internal structure of Turkish words must be segmented, because Turkish morphosyntax is intricate and plays a central role in semantic analysis. Distinguishing a sub-word unit amounts to performing a morph segmentation task, which children accomplish with an astonishing success rate. In this study, morph segmentation of Turkish words is demonstrated with a semi-supervised Hidden Markov Model, which emphasizes the power of frequencies and sequences as direct (or indirect negative) evidence for language acquisition. The method achieved precision, recall and F-score of .88, .92 and .90, respectively, after being trained on the METU Corpus and the METU-Sabancı Turkish Treebank. Additionally, statistical approaches are offered for compound word recognition and segmentation. To corroborate the use of frequencies in cognitive studies, experimental studies and corresponding statistical models of Turkish emphatic reduplication and of the acceptability of nonce words are also presented. The study shows that, since the probability mass in child-directed speech is skewed toward possible word forms and away from unlikely morph sequences, this mass can be used by various models to mimic human-level linguistic capabilities. Furthermore, human beings have a statistical learning ability that is not specific to the faculty of language, as nativists claim, but belongs to general cognition. This allows the plausible and valid use of computational and statistical models to analyze language. Such predictive models can allow a deeper understanding of language.
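As a quick consistency check on the reported scores, the sketch below recomputes the F-score from the stated precision and recall. It assumes the abstract's F-score is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is illustrative and not taken from the thesis.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; with beta=1 this is the harmonic mean (F1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    beta_sq = beta * beta
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Values reported in the abstract.
reported_precision = 0.88
reported_recall = 0.92

# Assumption: the reported F-score is the balanced F1.
f1 = f_score(reported_precision, reported_recall)
print(f"F1 = {f1:.4f}")
```

Under that assumption the result rounds to .90, which agrees with the figure given in the abstract.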

Suggestions

Grammar and information : a study of Turkish indefinites
Özge, Umut; Bozşahin, Hüseyin Cem; Department of Cognitive Sciences (2010)
Turkish, along with many other languages, marks its direct objects in two distinct ways: overt accusative marking (Acc) versus no marking (∅). The research on the grammar and interpretation of Turkish indefinite descriptions has focused on the effects of this distinction in case-marking on the interpretation of indefinite noun phrases. The overt accusative marker has been associated with discourse-linking (Nilsson 1985; Enç 1991; Zidani-Eroğlu 1997), specificity (von Heusinger 2002; von Heusinger and Kornfilt...
A note on the contact between Kurmanji Kurdish and Turkish at lexical and morphological level
Çabuk Ballı, Sakine (SAGE Publications, 2019-08-01)
The Turkish-Kurdish social setting, in which the Turkish and Kurdish languages have been in contact for a long time, induces borrowing and change at different levels. This study explores the contact between Kurmanji Kurdish and Turkish that takes place at both the morphological and lexical levels. The data consist of three hours of recordings of family talks on the phone. Corpus analysis of data obtained from audio and video recordings of a family talk on the phone was carried out. Preliminary findings revealed that verbs are borrowed ...
Syntactic priming of relative clause attachment in monolingual Turkish speakers and Turkish learners of English
Başer, Zeynep; Hohenberger, Annette Edeltraud; Zeyrek Bozşahin, Deniz; Department of Cognitive Sciences (2018)
The purpose of this study is to investigate the syntactic priming of relative clause attachment in monolingual Turkish speakers and Turkish learners of English with different levels of proficiency in English. Turkish and English belong to typologically different groups of languages. Within the scope of this study, we investigate syntactic priming of relative clause attachments, which enables us to examine and compare the strategies employed for ambiguity resolution both in Turkish and English. The data was ...
'Face' across historical cultures: A comparative study of Turkish and Chinese
Ruhi, Sukriye; Kadar, Daniel Z. (John Benjamins Publishing Company, 2011-01-01)
This paper investigates the use of the word 'face' in late nineteenth- and early twentieth-century Turkish and Chinese so as to trace the meaning of the concept in the two languages and cultures. The study describes the occurrence of the lexeme in five semantic/pragmatic domains in novels dating from the turn of the twentieth century, a period that corresponds to an acceleration in modernisation movements. Two conclusions are drawn from the comparison of face in Turkish and Chinese, and noteworthy similarit...
Head finalization and morphological analysis in factored phrase-based statistical machine translation from English to Turkish
İmren, Haydar; Çakıcı, Ruket; Department of Computer Engineering (2015)
Machine Translation is a field of study which deals with translating text from one natural language to another automatically. Statistical Machine Translation generates the translations using statistical methods and bilingual text corpora. In this study, an approach for translating from English to Turkish is introduced. Turkish is an agglutinative language with a free constituent order, whereas English is not agglutinative and the constituent order is strict. Besides these differences, there is a lack of par...
Citation Formats
Ö. Kılıç, “Power of frequencies : n-grams and semi-supervised morphological segmentation in Turkish,” Ph.D. - Doctoral Program, Middle East Technical University, 2013.