Investigating the performance of segmentation methods with deep learning models for sentiment analysis on Turkish informal texts

2018
Kurt, Fatih
This work investigates segmentation approaches for informal short texts in morphologically rich languages in order to classify sentiment effectively. The two building blocks of the work proposed in this thesis are segmentation and deep neural network model building. Segmentation focuses on preprocessing text with different methodologies. These methodologies are grouped under four distinct approaches: morphological, sub-word, tokenization, and hybrid. Most of these four approaches have multiple variants in this work. The second stage focuses on effective model building for text classification. The performance of each method is evaluated by utilizing a model built from a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) model proposed in the literature for text classification.
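
To make the difference between two of the segmentation families concrete, here is a minimal Python sketch that contrasts a plain tokenization pass with a character n-gram sub-word pass. It is an illustration only, not the thesis's implementation: the function names `tokenize`, `char_ngrams`, and `segment_subword`, the n-gram size, and the sample sentence are all assumptions introduced here, and the morphological and hybrid approaches are not shown because they would require a Turkish morphological analyzer.

```python
import re


def tokenize(text):
    """Tokenization sketch: keep runs of word characters, drop punctuation.

    Python 3 regexes are Unicode-aware, so Turkish letters such as
    ç, ğ, ı, ö, ş, ü are matched by \\w. Case is left untouched because
    str.lower() does not follow Turkish dotted/dotless i rules.
    """
    return re.findall(r"\w+", text)


def char_ngrams(word, n=3):
    """Sub-word sketch: overlapping character n-grams of a single word.

    Words no longer than n are returned whole. (n=3 is an arbitrary
    choice for illustration.)
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def segment_subword(text, n=3):
    """Apply char_ngrams to every token and flatten the result."""
    return [gram for word in tokenize(text) for gram in char_ngrams(word, n)]


if __name__ == "__main__":
    sample = "evlerimizden geldik!"  # hypothetical informal input
    print(tokenize(sample))
    print(segment_subword(sample))
```

Either segmentation yields a sequence of units that could then be mapped to embeddings and fed to a classifier; the agglutinative word "evlerimizden" stays a single unit under tokenization but is spread over many overlapping units under the sub-word pass.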

Suggestions

Frequency-driven late fusion-based word decomposition approach on the phrase-based statistical machine translation systems
Tatlıcıoğlu, Mehmet; Yazıcı, Adnan; Department of Computer Engineering (2013)
Machine translation is the process of translating texts from a natural language to another by computers based on linguistic motivations, statistical approaches, or the combination of them. In this study, the frequency-driven late fusion-based word decomposition approach is introduced to improve the translation quality of the phrase-based statistical machine translation system from Turkish to English. This late fusion-based approach is compared with the standalone statistical and rule-based word decompositio...
An investigation of incidental vocabulary acquisition in relation to learner proficiency level and word frequency
Tekmen, E. Anne Ferrell; Daloğlu, Ayşegül (Wiley, 2006-06-01)
This study examined the relationship between learners' incidental vocabulary acquisition and their level of proficiency, and between acquisition and word frequency in a text. Participants were Turkish learners of English at three proficiency levels. One reading text and four vocabulary tests were administered over a two-week period. Analyses of the data revealed that lexical gains from reading were significant for each group (p < .05). The higher proficiency groups were able to acquire more words than lower...
An investigation of interactions with conversational violations: Insights from visual perception and Gricean Maxim violations
Çağıltay, Bengisu; Acartürk, Cengiz; Department of Cognitive Sciences (2020-11)
Linguistic principles at various levels are crucial in maintaining a reliable and transparent communication for dyadic interactions. However, violating these principles might result in unwieldy and problematic communications. Gaze can be a medium of reflecting the cognitive responses when conversational violations occur. An eye-tracking study was conducted to investigate visual patterns in communication in response to social communication errors, specifically Grice’s Maxims violations. This study investigat...
Head finalization and morphological analysis in factored phrase-based statistical machine translation from English to Turkish
İmren, Haydar; Çakıcı, Ruket; Department of Computer Engineering (2015)
Machine Translation is a field of study which deals with translating text from one natural language to another automatically. Statistical Machine Translation generates the translations using statistical methods and bilingual text corpora. In this study, an approach for translating from English to Turkish is introduced. Turkish is an agglutinative language with a free constituent order, whereas English is not agglutinative and the constituent order is strict. Besides these differences, there is a lack of par...
A study on language modeling for Turkish large vocabulary continuous speech recognition
Bayer, Ali Orkan; Turhan Yöndem, Meltem; Department of Computer Engineering (2005)
This study focuses on large vocabulary Turkish continuous speech recognition. Continuous speech recognition for Turkish cannot be performed accurately because of the agglutinative nature of the language. The agglutinative nature decreases the performance of the classical language models that are used in the area. In this thesis firstly, acoustic models using different parameters are constructed and tested. Then, three types of n-gram language models are built. These involve class-based models, stem-based mo...
Citation Formats
F. Kurt, “Investigating the performance of segmentation methods with deep learning models for sentiment analysis on Turkish informal texts,” M.S. - Master of Science, Middle East Technical University, 2018.