A novel pre-processing workflow for popularity prediction in social media

2021-9-10
Yıldırım, Hüseyin Buğra
Twitter users interact with each other continuously through posts and reactions such as likes and retweets. Most tweets receive little reaction, and only a few receive a prominent response, so reaction counts follow a heavily right-skewed distribution. Furthermore, some tweets show an unexpected response that standard features cannot capture and that often depends on extraordinary circumstances, such as being the first to report an event or triggering a mass reaction. The heavily skewed distribution of social media datasets and the gap between expected and observed reactions are the two main factors that distort model predictions. This thesis first addresses the concept of outliers and uncertainty in reaction numbers in social media datasets. A method for identifying social media outliers is proposed, and the adverse effects of outliers on modeling are presented. Finally, a SMOTE-based data augmentation method is presented, in which the data is discretized and synthetic data is generated predominantly from the clusters with fewer instances. The results show that models built with outlier removal and data augmentation achieve slightly better prediction performance than those built without them. This research has practical implications for studies that aim to predict the popularity of tweets.
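The augmentation idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the equal-width binning on a log scale, the bin count, the neighbour count `k`, and the function names `discretize` and `smote_like` are all assumptions made for illustration. The sketch discretizes reaction counts into bins and then generates SMOTE-style synthetic rows by interpolating between neighbouring rows inside the sparser bins.

```python
import math
import random


def discretize(counts, n_bins=4):
    """Assign each reaction count to an equal-width bin on the log1p scale."""
    logs = [math.log1p(c) for c in counts]
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all counts are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in logs]


def smote_like(rows, bins, k=2, seed=0):
    """Oversample sparse bins by interpolating between same-bin neighbours.

    rows: list of (features, count) pairs, where features is a list of floats.
    Returns only the synthetic (features, count) pairs.
    """
    rng = random.Random(seed)
    by_bin = {}
    for row, b in zip(rows, bins):
        by_bin.setdefault(b, []).append(row)
    target = max(len(members) for members in by_bin.values())
    synthetic = []
    for members in by_bin.values():
        if len(members) < 2:
            continue  # a single point has no neighbour to interpolate with
        for _ in range(target - len(members)):
            base = rng.choice(members)
            others = [m for m in members if m is not base]
            others.sort(key=lambda m: math.dist(m[0], base[0]))
            neighbour = rng.choice(others[:k])
            u = rng.random()  # interpolation weight in [0, 1)
            feats = [a + u * (b - a) for a, b in zip(base[0], neighbour[0])]
            count = base[1] + u * (neighbour[1] - base[1])
            synthetic.append((feats, count))
    return synthetic


# Toy data (hypothetical): four low-reaction tweets and two popular ones.
rows = [([1.0, 0.0], 0), ([1.2, 0.1], 1), ([0.9, 0.2], 2), ([1.1, 0.0], 1),
        ([5.0, 3.0], 900), ([6.0, 2.5], 1500)]
bins = discretize([count for _, count in rows])
new_rows = smote_like(rows, bins)
```

On this toy data the two popular tweets fall into the sparsest bin, so the sketch adds synthetic rows only there, until that bin matches the size of the largest one. Interpolating the reaction count together with the features is one possible design choice for a numeric target; the abstract does not say how the thesis handles it.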

Suggestions

Developing a Twitter bot that can join a discussion using state-of-the-art architectures
Çetinkaya, Yusuf Mucahit; Toroslu, İsmail Hakkı (Springer Science and Business Media LLC, 2020-07-01)
Today, microblogging platforms like Twitter have become popular by spreading news and opinions that attract attention. Engaging interactions between users, such as likes, shares, and replies, are the key determinants of these platforms' news feed prioritization algorithms. These interactions attract people to ongoing debates and help inform and shape their opinions. Since being influential and attracting followers in these debates is considered important, understanding the automation of these processes b...
Sosyal Medyada 2011 Genel Seçimleri Nitel Nicel Arayüzey İncelemesi [The 2011 General Elections in Social Media: A Qualitative and Quantitative Interface Analysis]
Bayraktutan, Günseli; Binark, Ferruh Mutlu; Çomu, Tuğrul; Doğu, Burak; İslamoğlu, Gözde; Telli Aydemir, Aslı (Selçuk Üniversitesi İletişim Fakültesi, 2012-07-01)
This study presents a methodological evaluation of the quantitative and qualitative interface analysis used in the TÜBİTAK-funded research project (November 2011-November 2012) titled “An Examination of Social Media Environments in Terms of Political Communication Practices: The Use of Facebook and Twitter by Political Parties and Leaders in Turkey's 2011 General Elections”. In the research project, starting from the assumption that Web 2.0 contributes to the development of a civic culture grounded in democratic participation, social...
Developing a twitter bot that can join a discussion using state-of-the-art architectures
Çetinkaya, Yusuf Mücahit; Toroslu, İsmail Hakkı; Department of Computer Engineering (2019)
Today, Twitter is mostly used for sharing and commenting on news, so interaction between Twitter users is inevitable. This interaction sometimes leads people to move daily debates to this social platform. Since being dominant in these debates is crucial, automating this process has become highly popular. In this work, we aim to train a bot that classifies posted tweets according to their semantics and generates logical tweets about a popular discussion, namely the gun debate in the U.S., for t...
Determining user types from twitter account content and structure
Gürlek, Mesut; Toroslu, İsmail Hakkı; Department of Computer Engineering (2021-3-05)
People are using social media platforms more and more every day; hence, their rich content is making them increasingly suitable for research studies. Twitter is one of the biggest and most widely used social media platforms, and many studies focus on Twitter for social media research. In this thesis, we propose methodologies for determining user types of Twitter accounts by their metadata, content, and structure. Our first problem is classifying organization vs. individual account types using only metadata. After we g...
Combining topology-based & content-based analysis for followee recommendation on Twitter
Yanar, Aysu; Karagöz, Pınar; Taşkaya Temizel, Tuğba; Department of Information Systems (2015)
Twitter has become an important social platform for individuals, and people share a large amount of information about their personal lives, interests, and viral news during emergencies. As of 2014, Twitter has 240 million active users and approximately 500 million tweets are shared every day. This information overload on Twitter has become a serious problem due to the growing volume of messages and the increasing number of users. Recommender systems help to overcome this challenge. Finding interesting users and ge...
Citation Formats
H. B. Yıldırım, “A novel pre-processing workflow for popularity prediction in social media,” M.S. - Master of Science, Middle East Technical University, 2021.