A novel pre-processing workflow for popularity prediction in social media

Date

2021-9-10

Author

Yıldırım, Hüseyin Buğra

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Users on Twitter continuously interact with each other through posts and reactions such as likes and retweets. Most tweets receive little reaction, and only a few receive a prominent response. As a result, reaction counts follow a heavily right-skewed distribution. Furthermore, some tweets show unexpected response performance that standard features cannot explain and that often depends on extraordinary situations, such as being the first to report an event or triggering a mass reaction. The heavily skewed distribution of social media datasets and the variation between expected and observed reactions are the two main factors that distort model predictions. This thesis first addresses the concept of outliers and uncertainty in reaction counts in social media datasets. A method for identifying social media outliers is proposed, and the adverse effects of outliers on modeling are presented. Finally, a SMOTE-based data augmentation method is presented, in which the data are discretized and synthetic data are generated predominantly from the clusters with fewer instances. The results show that models built with outlier removal and data augmentation achieve slightly better prediction performance than those built without them. This research has practical implications for studies that aim to predict the popularity of tweets.
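The abstract describes the augmentation step only at a high level. The following is a minimal sketch of how a discretize-then-oversample step of this kind could look, not the thesis's actual implementation. It assumes equal-width bins on a log1p scale of the reaction count, tops up every bin to the size of the largest one, and creates each synthetic row by SMOTE-style interpolation (features and target together) between a row and one of its k nearest neighbours in the same bin. The function names `discretize` and `smote_by_bin` and the parameters `n_bins`, `k`, and `seed` are hypothetical; the outlier-removal step is not shown.

```python
import math
import random
from collections import defaultdict


def discretize(y, n_bins):
    """Map each reaction count to one of n_bins equal-width bins on a log1p scale.

    Assumption: the thesis's binning scheme is not given here; log1p with equal
    widths is used because reaction counts are heavily right-skewed.
    """
    logs = [math.log1p(v) for v in y]
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal targets
    return [min(int((v - lo) / width), n_bins - 1) for v in logs]


def smote_by_bin(X, y, n_bins=5, k=3, seed=0):
    """Hypothetical SMOTE-style oversampling of the sparser target bins.

    Every bin with at least two rows is topped up to the size of the largest
    bin by interpolating between a random row and one of its k nearest
    neighbours (Euclidean distance on the features) inside the same bin.
    Features and target are interpolated with the same random gap.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for i, b in enumerate(discretize(y, n_bins)):
        groups[b].append(i)
    largest = max(len(idx) for idx in groups.values())

    X_out, y_out = [list(row) for row in X], list(y)
    for idx in groups.values():
        if len(idx) < 2:
            continue  # a lone row has no neighbour to interpolate with
        for _ in range(largest - len(idx)):
            i = rng.choice(idx)
            # k nearest neighbours of row i, restricted to the same bin
            neighbours = sorted(
                (j for j in idx if j != i), key=lambda j: math.dist(X[i], X[j])
            )[:k]
            j = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
            y_out.append(y[i] + gap * (y[j] - y[i]))
    return X_out, y_out


if __name__ == "__main__":
    # Toy data: four low-reaction tweets and two high-reaction ones.
    X = [[1.0, 0.0], [1.2, 0.1], [0.9, 0.2], [1.1, 0.0], [5.0, 3.0], [5.5, 3.2]]
    y = [0, 1, 2, 1, 900, 1200]
    X_aug, y_aug = smote_by_bin(X, y, n_bins=3)
    print(len(X_aug), y_aug[len(y):])
```

With the toy data and `n_bins=3`, the four low-reaction tweets share one bin and the two high-reaction tweets fall into the top bin, so only the top bin receives two synthetic rows. Interpolating the target linearly is a simplifying choice; interpolating on the log scale would be an equally plausible variant.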

Subject Keywords

Social media, Popularity prediction, Pre-processing, Outlier detection, Data augmentation

URI

https://hdl.handle.net/11511/93058

Collections

Graduate School of Informatics, Thesis

Suggestions

Developing a Twitter bot that can join a discussion using state-of-the-art architectures Çetinkaya, Yusuf Mucahit; Toroslu, İsmail Hakkı (Springer Science and Business Media LLC, 2020-07-01) Today, microblogging platforms like Twitter have become popular by spreading news and opinions that gather attention. Engaging interactions, such as likes, shares, and replies, between users are the key determinants of these platforms' news feed prioritization algorithms. These interactions attract people to ongoing debates and help inform and shape their opinions. Since being influential and attracting followers in these debates are considered as important, understanding the automation of these processes b...
Qualitative and Quantitative Interface Analysis of the 2011 General Elections on Social Media (Sosyal Medyada 2011 Genel Seçimleri Nitel Nicel Arayüzey İncelemesi) Bayraktutan, Günseli; Binark, Ferruh Mutlu; Çomu, Tuğrul; Doğu, Burak; İslamoğlu, Gözde; Telli Aydemir, Aslı (Selçuk Üniversitesi İletişim Fakültesi, 2012-07-01) This study presents a methodological evaluation of the quantitative and qualitative interface analysis used in the TÜBİTAK-supported research project (November 2011 to November 2012) titled "Examining Social Media Environments in Terms of Political Communication Practices: The Use of Facebook and Twitter by Political Parties and Leaders in Turkey's 2011 General Elections". Starting from the assumption that Web 2.0 contributes to the development of a citizenship culture rooted in democratic participation, the research project socia...
Combining topology-based & content-based analysis for followee recommendation on Twitter Yanar, Aysu; Karagöz, Pınar; Taşkaya Temizel, Tuğba; Department of Information Systems (2015) Twitter has become an important social platform for individuals, where people share a large amount of information about their personal lives, their interests, and viral news during emergencies. As of 2014, Twitter had 240 million active users, and approximately 500 million tweets were shared every day. This information overload on Twitter has become a serious problem due to the growing volume of messages and the increasing number of users. Recommender systems help to overcome this challenge. Finding interesting users and ge...
Developing a twitter bot that can join a discussion using state-of-the-art architectures Çetinkaya, Yusuf Mücahit; Toroslu, İsmail Hakkı; Department of Computer Engineering (2019) Today, Twitter is mostly used for sharing and commenting on news. In this setting, interaction between Twitter users is inevitable, and it sometimes causes people to move daily debates to this social platform. Since being dominant in these debates is crucial, automating this process has become highly popular. In this work, we aim to train a bot that classifies posted tweets according to their semantics and generates logical tweets about a popular discussion, namely the gun debate in the U.S. for t...
Determining user types from twitter account content and structure Gürlek, Mesut; Toroslu, İsmail Hakkı; Department of Computer Engineering (2021-3-05) People are using social media platforms more and more every day; hence, these platforms are becoming suitable for research studies due to their rich content. Twitter is one of the biggest and most widely used social media platforms, and many studies focus on Twitter for social media research. In this thesis, we propose methodologies for determining the user types of Twitter accounts from their metadata, content, and structure. Our first problem is classifying organization vs. individual account types using only metadata. After weg...

Citation Formats

H. B. Yıldırım, “A novel pre-processing workflow for popularity prediction in social media,” M.S. - Master of Science, Middle East Technical University, 2021.