Utilizing Word Embeddings for Result Diversification in Tweet Search

The performance of result diversification for tweet search suffers from the well-known vocabulary mismatch problem, because tweets are short and usually informal. As a remedy, we propose a query and tweet expansion strategy that uses automatically generated word embeddings. Our experiments with state-of-the-art diversification methods on the Tweets2013 corpus yield encouraging results: expanding queries and/or tweets based on the word embeddings can improve diversification performance in tweet search. We further show that expansions based on the word embeddings may be as useful as those based on a manually constructed knowledge base, namely ConceptNet.
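As a rough illustration of the expansion idea described in the abstract, the sketch below adds to each query (or tweet) term its nearest neighbors in an embedding space. It is not the authors' implementation: the toy vectors, the function names `cosine` and `expand_terms`, and the parameters `k` and `min_sim` are all assumptions made for this example, and a real system would load embeddings trained on a large corpus.

```python
import math

# Toy 3-dimensional embeddings; the values are invented for illustration only.
TOY_EMBEDDINGS = {
    "football": [0.9, 0.1, 0.0],
    "soccer": [0.85, 0.15, 0.05],
    "goal": [0.7, 0.3, 0.1],
    "election": [0.0, 0.9, 0.2],
    "vote": [0.05, 0.85, 0.25],
    "coffee": [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_terms(terms, embeddings, k=2, min_sim=0.8):
    """Append up to k nearest neighbors (by cosine) of each known term.

    Terms without an embedding are kept but not expanded.
    k and min_sim are hypothetical tuning knobs, not values from the paper.
    """
    expanded = list(terms)
    seen = set(terms)
    for term in terms:
        if term not in embeddings:
            continue
        scored = [
            (cosine(embeddings[term], vec), other)
            for other, vec in embeddings.items()
            if other != term
        ]
        scored.sort(reverse=True)  # most similar first
        for sim, other in scored[:k]:
            if sim >= min_sim and other not in seen:
                expanded.append(other)
                seen.add(other)
    return expanded


if __name__ == "__main__":
    print(expand_terms(["football"], TOY_EMBEDDINGS))
```

The same function can be applied to the tokens of a tweet instead of a query, which mirrors the "queries and/or tweets" wording above; the expanded term lists would then feed whatever similarity measure the diversification method uses.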


Semantic Expansion of Hashtags for Enhanced Event Detection in Twitter
Özdikiş, Özer; Karagöz, Pınar; Oğuztüzün, Mehmet Halit Seyfullah (2012-09-09)
In this work, we present an event detection method in Twitter based on clustering of hashtags and introduce an enhancement technique by using the semantic similarities between the hashtags. To this aim, we devised two methods for tweet vector generation and evaluated their effect on clustering and event detection performance in comparison to word-based vector generation methods. By analyzing the contexts of hashtags and their co-occurrence statistics with other words, we identify their paradigmatic relation...
Twitter Sentiment Analysis Experiments Using Word Embeddings on Datasets of Various Scales
Arslan, Yusuf; Kucuk, Dilek; Birtürk, Ayşe Nur (2018-06-15)
Sentiment analysis is a popular research topic in social media analysis and natural language processing. In this paper, we present the details and evaluation results of our Twitter sentiment analysis experiments, which are based on word embedding vectors such as word2vec and doc2vec, using an ANN classifier. In these experiments, we utilized two publicly available sentiment analysis datasets and four smaller datasets derived from these datasets, in addition to a publicly available trained vector model over ...
User Interest Modeling in Twitter with Named Entity Recognition
Karatay, Deniz; Karagöz, Pınar (2015-05-18)
Given the wide use of Twitter as a source of information, finding an interesting tweet for a user among a large number of tweets is challenging. In this work, we propose Named Entity Recognition (NER) based user profile modeling for Twitter users and employ this model to generate personalized tweet recommendations. The effectiveness of the proposed method is shown through a set of experiments. Copyright © 2015 held by author(s).
Search result diversification for selective search
Küçükoglu, Emre Can; Altıngövde, İsmail Sengör; Department of Computer Engineering (2019)
Our work explores the performance of result diversification methods in the selective search scenario, where the underlying document collection is topically partitioned across several nodes and the search is conducted only at a subset of these nodes. In particular, we investigate whether diversification at each node is superior to previous approaches in the literature, i.e., diversification at the broker node applied before the resource selection or after the result merging stages. We also compare performanc...
Result Diversification for Tweet Search
Ozsoy, Makbule Gulcin; Onal, Kezban Dilek; Altıngövde, İsmail Sengör (2014-10-14)
Being one of the most popular microblogging platforms, Twitter handles more than two billion queries per day. Given the users' desire for fresh and novel content but their reluctance to submit long and descriptive queries, there is an inevitable need for generating diversified search results to cover different aspects of a query topic. In this paper, we address diversification of results in tweet search by adopting several methods from the text summarization and web search domains. We provide an exhaustive ...
Citation Formats
K. D. Onal, İ. S. Altıngövde, and P. Karagöz, “Utilizing Word Embeddings for Result Diversification in Tweet Search,” 2015, vol. 9460, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/38301.