Prediction of words in Turkish sentences by LSTM-based language modeling

2021-3-29
Algan, Abdullah Can
Language comprehension is affected by predictions because it is an incremental process. Predictability has been an important aspect of studying language processing and acquisition in cognitive science. In parallel, the Natural Language Processing (NLP) field takes advantage of advanced technology to teach computers how to understand natural language. Our study investigates whether human predictability and the predictability results of an artificial language model align. This thesis focuses solely on the Turkish language, so we have built a word-level Turkish language model. Our model is based on Long Short-Term Memory (LSTM), a method that has recently gained popularity in NLP. Alternative models are trained and evaluated by their prediction accuracy on test data. Finally, the best-performing model is compared to human predictability scores gathered in a cloze-test experiment. We show a promising correlation and analyze the cases where the correlation is high or low.
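The comparison described in the abstract, between model predictions and cloze-test predictability, can be illustrated with a minimal sketch. The abstract does not state which correlation measure the thesis uses, so Pearson's r is an assumption here, and the function name `pearson` and the toy values below are hypothetical, not data from the thesis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values for four target words (NOT data from the thesis):
# the share of cloze-test participants who produced the word, and the
# probability a language model assigned to the same word in context.
cloze_scores = [0.90, 0.10, 0.55, 0.30]
model_probs = [0.70, 0.05, 0.40, 0.35]

r = pearson(cloze_scores, model_probs)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would mean that words humans find predictable also receive high model probability. A rank-based measure such as Spearman's rho would be a reasonable alternative if the relationship is monotonic but not linear.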

Suggestions

Probabilistic learning of Turkish morphosemantics by latent syntax
Üstün, Ahmet; Bozşahin, Hüseyin Cem; Department of Cognitive Sciences (2017)
The language processing capability of humans is highly dependent on the transparent interface between syntax and semantics which is formalized as the grammar. Morphology also interferes with this interface, in languages having rich morphology such as Turkish. This thesis aims to discover word semantics in Turkish from the compositional morphosemantics by underlying latent syntax. A computational model has been developed to learn a morpheme lexicon in which each morpheme contains semantic information in logi...
Prediction of Protein Interactions by Structural Matching: Prediction of PPI Networks and the Effects of Mutations on PPIs that Combines Sequence and Structural Information
Tunçbağ, Nurcan; Nussinov, Ruth; Gursoy, Attila (Humana Press Inc., 2017)
Structural details of protein interactions are invaluable to the understanding of cellular processes. However, the identification of interactions at atomic resolution is a continuing challenge in the systems biology era. Although the number of structurally resolved complexes in the Protein Databank increases exponentially, the complexes only cover a small portion of the known structural interactome. In this chapter, we review the PRISM system that is a protein–protein interaction (PPI) prediction tool—its r...
Identification of Discourse Relations in Turkish Discourse Bank
Kutlu, Ferhat; Zeyrek Bozşahin, Deniz; Department of Cognitive Sciences (2023-1-25)
Discourse is the level of language where linguistic units are organized in a structured and coherent way. One of the major problems in the field of discourse in particular, and NLU in general is how to build better models to sense the way constitutive units of discourse stick together to form a coherent whole. The discourse would be coherent if it had meaningful connections between its parts. Discourse relations, i.e., semantic or pragmatic relations between discourse units (clauses or sentences), are one o...
Heideggerian way-making to language
Sezgi, Damla; Karademir, Aret; Department of Philosophy (2016)
The main concern of the present thesis is ‘language’ in Heidegger. Beginning with a discussion of the place of the Heideggerian thought within the context of the history of philosophy, which at that time witnessed a shift which is called ‘linguistic turn’, the question ‘What is language?’ is scrutinized to show the dilemma which arises from the fact that this question itself is in language. After, from the Heideggerian perspective, the interrogation of the whatness of language is shown to be inadequate, req...
Frequency-driven late fusion-based word decomposition approach on the phrase-based statistical machine translation systems
Tatlıcıoğlu, Mehmet; Yazıcı, Adnan; Department of Computer Engineering (2013)
Machine translation is the process of translating texts from a natural language to another by computers based on linguistic motivations, statistical approaches, or the combination of them. In this study, the frequency-driven late fusion-based word decomposition approach is introduced to improve the translation quality of the phrase-based statistical machine translation system from Turkish to English. This late fusion-based approach is compared with the standalone statistical and rule-based word decompositio...
Citation Formats
A. C. Algan, “Prediction of words in Turkish sentences by LSTM-based language modeling,” M.S. - Master of Science, Middle East Technical University, 2021.