Turkish clickbait detection in social media via machine learning algorithms

2021-8-26
Genç, Şura
Clickbait is a strategy, used mostly in headlines and teaser messages, that aims to attract people's attention and make them click on a link by using intriguing expressions with various text-related features. Clickbait, which has become very common in recent years, especially on social media, is a major problem for the flow of information. Since the information promised in a clickbait headline is generally not included in the main text, clickbait headlines disappoint readers and are problematic for journalistic ethics. In this thesis, we constructed a Turkish dataset, ClickbaitTR, with 48,060 samples consisting of headlines from Turkish news sources extracted from Twitter, and made it publicly available. Various machine learning algorithms, such as Artificial Neural Network (ANN), Logistic Regression (LR), Random Forest (RF), Long Short-Term Memory Network (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), and Ensemble Classifier (EC), were applied to the dataset to detect clickbait headlines. The results show that BiLSTM performs best in detecting clickbait headlines, with 97% accuracy, followed by LSTM, ANN, and the Ensemble Classifier, with 93% accuracy. In addition to successful clickbait detection, this thesis presents linguistic and psychological analyses of clickbait sentences, focusing on psychological mechanisms such as curiosity and interest. This thesis contributes to clickbait detection studies with the largest clickbait dataset and the best clickbait detection performance in Turkish.
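To make the classification setup concrete, the following is a minimal sketch of one of the listed algorithms, Logistic Regression, trained with stochastic gradient descent on bag-of-words features. It is an illustration under stated assumptions, not the thesis's implementation: the four headlines and their labels are invented toy examples (not samples from ClickbaitTR), and the whitespace tokenizer, learning rate, and epoch count are hypothetical choices. The thesis's actual preprocessing and feature extraction are not described in this abstract.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    # Real Turkish preprocessing (dotted/dotless i, suffixes) is out of scope.
    return text.lower().split()


def build_vocab(texts):
    # Map each distinct token to a feature index.
    vocab = {}
    for t in texts:
        for tok in tokenize(t):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def featurize(text, vocab):
    # Sparse bag-of-words counts {feature_index: count}; unknown tokens are dropped.
    return dict(Counter(vocab[tok] for tok in tokenize(text) if tok in vocab))


def score(x, w, b):
    # Predicted probability that the headline is clickbait.
    z = b + sum(w[i] * v for i, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))


def train_lr(X, y, n_features, lr=0.5, epochs=200):
    # Per-sample gradient descent on the logistic (log) loss.
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            err = score(x, w, b) - label  # d(log-loss)/dz
            b -= lr * err
            for i, v in x.items():
                w[i] -= lr * err * v
    return w, b


def predict(x, w, b):
    return 1 if score(x, w, b) >= 0.5 else 0


def accuracy(preds, labels):
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


# Invented toy data (1 = clickbait, 0 = regular news headline); NOT from ClickbaitTR.
headlines = [
    "bunu gören herkes şok oldu",
    "inanamayacaksınız işte o anlar",
    "merkez bankası faiz kararını açıkladı",
    "meclis yeni bütçeyi onayladı",
]
labels = [1, 1, 0, 0]

vocab = build_vocab(headlines)
X = [featurize(h, vocab) for h in headlines]
w, b = train_lr(X, labels, len(vocab))
train_acc = accuracy([predict(x, w, b) for x in X], labels)
print(f"training accuracy: {train_acc:.2f}")
```

Training accuracy on four hand-picked examples says nothing about real performance; the accuracy figures reported above come from the thesis's own experiments on the full dataset. The sequence models it compares (LSTM, BiLSTM) instead read the headline token by token rather than as an unordered bag of words.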
Citation Formats
Ş. Genç, “Turkish clickbait detection in social media via machine learning algorithms,” M.S. - Master of Science, Middle East Technical University, 2021.