Prediction of words in Turkish sentences by LSTM-based language modeling

Algan, Abdullah Can
Language comprehension is an incremental process, so it is shaped by predictions about upcoming words. Predictability has therefore been an important aspect of studying language processing and acquisition in cognitive science. In parallel, the field of Natural Language Processing (NLP) takes advantage of advanced technology to teach computers to understand natural language. This study investigates whether the predictability estimates of an artificial language model align with human predictability. The thesis focuses solely on Turkish, so we built a word-level Turkish language model. The model is based on Long Short-Term Memory (LSTM), a method that has recently gained popularity in NLP. Alternative models are trained and evaluated by their prediction accuracy on test data. Finally, the best-performing model is compared with human predictability scores gathered in a cloze-test experiment. We show a promising correlation and analyze the cases where the correlation is high or low.
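The comparison between human and model predictability can be illustrated with a minimal sketch. It assumes that human predictability is the cloze probability of a target word (the share of participants who produced that word) and that agreement is measured with a Pearson correlation; the abstract does not state which correlation measure the thesis uses. The function names and all numeric values below are hypothetical and serve only as an illustration, not as results from the thesis.

```python
import math


def cloze_probability(responses, target):
    """Share of participants whose cloze response equals the target word."""
    if not responses:
        raise ValueError("responses must not be empty")
    return sum(r == target for r in responses) / len(responses)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, one entry per target word (not thesis data):
# human cloze probabilities and the probability a language model
# assigns to the same word given the preceding context.
human_scores = [0.90, 0.10, 0.55, 0.30]
model_scores = [0.70, 0.05, 0.40, 0.35]

r = pearson(human_scores, model_scores)
print(f"Pearson r = {r:.3f}")
```

A rank-based measure such as Spearman correlation would be a reasonable alternative when the two scores are not expected to be linearly related.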
Citation Formats
A. C. Algan, “Prediction of words in Turkish sentences by LSTM-based language modeling,” M.S. - Master of Science, Middle East Technical University, 2021.