Sentiment-Focused Web Crawling

2014-10-01
Vural, A. Güral
Cambazoglu, B. Barla
Karagöz, Pınar
Sentiments and opinions expressed in Web pages towards objects, entities, and products constitute an important portion of the textual content available on the Web. In the last decade, the analysis of such content has gained importance due to its high potential for monetization. Despite the vast interest in sentiment analysis, somewhat surprisingly, the discovery of sentimental or opinionated Web content has been mostly ignored. This work aims to fill this gap and addresses the problem of quickly discovering and fetching the sentimental content present on the Web. To this end, we design a sentiment-focused Web crawling framework. In particular, we propose different sentiment-focused Web crawling strategies that prioritize discovered URLs based on their predicted sentiment scores. Through simulations, these strategies are shown to achieve considerable performance improvement over general-purpose Web crawling strategies in the discovery of sentimental Web content.
ACM TRANSACTIONS ON THE WEB
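
The abstract's core idea, prioritizing discovered URLs by a predicted sentiment score, can be illustrated with a small priority-queue frontier. This is only a minimal sketch under stated assumptions, not the authors' method: the lexicon `SENTIMENT_WORDS`, the anchor-text scorer `predict_sentiment`, and the in-memory `link_graph` (standing in for page fetching, as in a simulation) are all hypothetical.

```python
import heapq
from itertools import count

# Hypothetical toy lexicon; the record does not describe the paper's predictors.
SENTIMENT_WORDS = {"love", "hate", "great", "terrible", "review", "awful"}


def predict_sentiment(anchor_text):
    """Assumed scorer: fraction of anchor-text words found in the toy lexicon."""
    words = anchor_text.lower().split()
    if not words:
        return 0.0
    return sum(w in SENTIMENT_WORDS for w in words) / len(words)


def crawl(seeds, link_graph, budget):
    """Visit up to `budget` URLs, always taking the highest-scored URL next.

    link_graph maps url -> list of (target_url, anchor_text) pairs and
    replaces real network fetches with a lookup.
    """
    tie = count()  # insertion counter breaks ties between equal scores
    frontier = [(0.0, next(tie), url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    order = []
    while frontier and len(order) < budget:
        _, _, url = heapq.heappop(frontier)
        order.append(url)
        for target, anchor in link_graph.get(url, []):
            if target not in seen:
                seen.add(target)
                # heapq is a min-heap, so the score is negated for max-first order
                score = predict_sentiment(anchor)
                heapq.heappush(frontier, (-score, next(tie), target))
    return order
```

With a fixed download budget, links whose anchor text looks opinionated are fetched first, while links with neutral anchor text wait in the frontier and may never be fetched. A general-purpose breadth-first crawler would instead visit them in discovery order.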

Suggestions

Sentiment-focused web crawling
Vural, Avni Güral; Karagöz, Pınar; Cambazoğlu, Berkant Barla; Department of Computer Engineering (2013)
The advent of Web 2.0 has led to an increase in the amount of sentimental content available in the Web. Such content is often found in social media web sites in the form of product reviews, user comments, testimonials, messages in discussion forums, status updates, and personal blogs as well as in other forms, including opinions in personal pages, news articles, and product descriptions. The analysis of sentimental content has a number of important applications, most important being web search, contextual a...
WaPUPS: Web access pattern extraction under user-defined pattern scoring
Alkan, Oznur Kirmemis; Karagöz, Pınar (2016-04-01)
Extracting patterns from web usage data helps to facilitate better web personalization and web structure readjustment. Classical frequency-based sequence mining techniques consider only the binary occurrence of web pages in sessions, which results in the extraction of many patterns that are not informative for users. To handle this problem, utility-based mining has emerged, which assigns non-binary values, called utilities, to web pages and calculates pattern utilities accordingly. However, the ...
Analyzing Implicit Aspects and Aspect Dependent Sentiment Polarity for Aspect-based Sentiment Analysis on Informal Turkish Texts
Kama, Batuhan; Öztürk, Murat; Karagöz, Pınar; Toroslu, İsmail Hakkı; Kalender, Murat (2017-11-09)
The web provides a suitable medium for users to post comments on different topics. In most such content, authors express different opinions on different features or aspects of the topic. Aspect-based sentiment analysis determines which opinion is expressed about which aspect. Once aspects are available, the next important step is to match aspects with the correct sentiments. In this work, we investigate enhancements for two cases in the matching step: extracting implicit aspects, and sentiment words w...
Detecting User Emotions in Twitter through Collective Classification
İleri, İbrahim; Karagöz, Pınar (2016-11-11)
The explosion in the use of social networks has generated a large amount of data, including user opinions about varying subjects. For classifying the sentiment of user postings, many text-based techniques have been proposed in the literature. As a continuation of sentiment analysis, there are also studies on emotion analysis. Because many different emotions must be dealt with at this point, the problem gets more complicated as the number of emotions to be detected increases. In this s...
A semantic backend for content management systems
Laleci Ertürkmen, Gökçe Banu; Aluc, G.; Dogac, A.; Sınacı, Ali Anıl; Kılıç, Özgün Ozan; Tuncer, F. (2010-12-01)
The users of a content repository express the semantics they have in mind while defining the content items and their properties, and forming them into a particular hierarchy. However, this valuable semantics is not formally expressed, and hence cannot be used to discover meaningful relationships among the content items in an automated way. Although the need is apparent, there are several challenges in explicating this semantics in a fully automated way: first, it is difficult to distinguish between data and...
Citation Formats
A. G. Vural, B. B. Cambazoglu, and P. Karagöz, “Sentiment-Focused Web Crawling,” ACM Transactions on the Web, 2014, Accessed: 2020. [Online]. Available: https://hdl.handle.net/11511/33286.