Semantic Expansion of Hashtags for Enhanced Event Detection in Twitter

Özdikiş, Özer
Karagöz, Pınar
Oğuztüzün, Mehmet Halit Seyfullah
In this work, we present an event detection method for Twitter based on clustering of hashtags, and we introduce an enhancement technique that uses the semantic similarities between hashtags. To this end, we devised two methods for tweet vector generation and evaluated their effect on clustering and event detection performance in comparison to word-based vector generation methods. By analyzing the contexts of hashtags and their co-occurrence statistics with other words, we identify their paradigmatic relationships and similarities. We use this information to apply a lexico-semantic expansion to tweet contents before clustering the tweets based on their similarities. Our aim is to tolerate spelling errors and to capture statements that actually refer to the same concepts. We evaluate our enhancement on a three-day dataset of tweets with Turkish content. In our evaluations, we observe clearer clusters, improvements in accuracy, and earlier event detection times.
VLDB 2012 Workshop on Online Social Systems (WOSS), August 31, 2012
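
The abstract describes deriving hashtag similarities from co-occurrence contexts and using them to expand tweet vectors before clustering. The following is a minimal sketch of that general idea, not the authors' implementation: the function names (`context_vectors`, `cosine`, `expand`), the whitespace tokenization, the cosine measure, the `threshold` value, and the sample tweets are all assumptions made for illustration. The abstract does not specify the paper's actual similarity measure or weighting.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(tweets):
    """Map each hashtag to a Counter of the tokens it co-occurs with."""
    contexts = defaultdict(Counter)
    for tweet in tweets:
        tokens = tweet.lower().split()  # naive tokenization (assumption)
        for tag in (t for t in tokens if t.startswith("#")):
            # Every other token in the same tweet counts as context.
            contexts[tag].update(t for t in tokens if t != tag)
    return dict(contexts)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def expand(tweet, contexts, threshold=0.5):
    """Return a term-weight vector for the tweet, enriched with hashtags
    whose contexts resemble those of the hashtags already in the tweet."""
    vector = Counter(tweet.lower().split())
    for tag in [t for t in vector if t.startswith("#")]:
        if tag not in contexts:
            continue  # hashtag never seen in the corpus, nothing to expand
        for other, ctx in contexts.items():
            sim = cosine(contexts[tag], ctx)
            if other != tag and sim >= threshold:
                # Add the similar hashtag, weighted by its similarity.
                vector[other] = max(vector[other], sim)
    return vector


# Made-up sample data: two hashtags used in identical contexts, one unrelated.
tweets = [
    "#earthquake felt in van tonight",
    "#deprem felt in van tonight",
    "#derby goal in istanbul",
]
contexts = context_vectors(tweets)
expanded = expand("#earthquake damage reported", contexts)
```

In this sketch, a tweet that mentions only `#earthquake` also receives a weight for `#deprem`, because the two hashtags appear in the same contexts. The expanded vectors could then be passed to any clustering step that compares tweets by vector similarity.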


Semantic Expansion of Tweet Contents for Enhanced Event Detection in Twitter
Özdikiş, Özer; Karagöz, Pınar; Oğuztüzün, Mehmet Halit S. (2012-08-29)
This paper aims to enhance event detection methods in a micro-blogging platform, namely Twitter. The enhancement technique we propose is based on lexico-semantic expansion of tweet contents while applying document similarity and clustering algorithms. Considering the length limitations and idiosyncratic spelling in the Twitter environment, it is possible to take advantage of word similarities and to enrich texts with similar words. The semantic expansion technique we implement is based on syntagmatic and paradi...
Word Embedding Based Event Detection on Social Media
Ertugrul, Ali Mert; Velioglu, Burak; Karagöz, Pınar (2017-06-23)
Event detection from social media messages is conventionally based on clustering the message contents. The most basic approach is to represent messages as term vectors constructed through traditional natural language processing (NLP) methods and then to assign weights to terms, generally based on frequency. In this study, we use a neural feature extraction approach and explore the performance of event detection under the use of word embeddings. Using a corpus of a set of tweets, message terms...
Event Boundary Detection Using Audio-Visual Features and Web-casting Texts with Imprecise Time Information
Bayar, Müjdat; Alan, Özgür; Akpınar, Samet; Sabuncu, Orkunt; Çiçekli, Fehime Nihan; Alpaslan, Ferda Nur (2010-07-21)
We propose a method to detect events and event boundaries in soccer videos by using web-casting texts and audio-visual features. The events and their inaccurate time information given in web-casting texts need to be aligned with the visual content of the video. We overcome this issue by utilizing textual, visual and audio features. Existing methods assume that the time at which the event occurs is given precisely (in seconds). However, most web-casting texts presented by popular organizations such as uefa.c...
Utilizing Word Embeddings for Result Diversification in Tweet Search
Onal, Kezban Dilek; Altıngövde, İsmail Sengör; Karagöz, Pınar (2015-12-04)
The performance of result diversification for tweet search suffers from the well-known vocabulary mismatch problem, as tweets are too short and usually informal. As a remedy, we propose to adopt a query and tweet expansion strategy that utilizes automatically-generated word embeddings. Our experiments using state-of-the-art diversification methods on the Tweets2013 corpus reveal encouraging results for expanding queries and/or tweets based on the word embeddings to improve the diversification performance in...
Clustering based personality prediction on Turkish Tweets
Tutaysalgır, Esen; Toroslu, İsmail Hakkı; Department of Computer Engineering (2019)
In this thesis, we present a framework for predicting the personality traits of users using their tweets written in Turkish. The prediction model is constructed with a clustering-based approach. We show how to extract linguistic features from tweet data and to adapt TF-IDF weighting and word embeddings to Turkish tweets. Since the model is based on linguistic features, it is language specific. The prediction model uses features applicable to the Turkish language and related to the writing style of Turkish Twitt...
Citation Formats
Ö. Özdikiş, P. Karagöz, and M. H. S. Oğuztüzün, “Semantic Expansion of Hashtags for Enhanced Event Detection in Twitter,” presented at the VLDB 2012 Workshop on Online Social Systems (WOSS), Istanbul, Turkey, August 31, 2012.