Sentiment-focused web crawling

Vural, Avni Güral
The advent of Web 2.0 has led to an increase in the amount of sentimental content available on the Web. Such content is often found on social media websites in the form of product reviews, user comments, testimonials, messages in discussion forums, status updates, and personal blogs. It also appears in other forms, including opinions on personal pages, news articles, and product descriptions. The analysis of sentimental content has a number of important applications, the most important being web search, contextual advertising, and recommendation. The timely discovery of sentimental content matters because most sentiments quickly lose their value if they are not discovered immediately. So far, all focused crawlers work in a topic-specific manner and fall short when the goal is to discover sentimental pages. In addition, most research on sentiment analysis to date has focused on the English language. In this thesis, we present a new perspective on focused web crawling. First, we propose a sentiment-focused web crawling framework to facilitate the quick discovery of sentimental content and evaluate it via simulations over the publicly available ClueWeb09-B web page collection. Second, we propose a framework for unsupervised sentiment analysis in Turkish and perform experiments with data from popular Turkish social media sites. Finally, we consolidate our frameworks and present a customized version of the sentiment-focused web crawling framework for Turkish.
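The abstract does not say how the crawler decides which URL to fetch next. A common way to build any focused crawler is a best-first frontier: each discovered URL gets a priority, and the highest-priority URL is fetched first. The Python sketch below is only an illustration under stated assumptions, not the thesis's method. An outgoing link inherits the sentiment score of the page it was found on. That score comes from a tiny hypothetical polarity lexicon (`LEXICON`, `sentiment_score`). The in-memory dictionary `web` stands in for a real page collection such as ClueWeb09-B. All of these names are hypothetical.

```python
import heapq
import re
from typing import Dict, List, Set, Tuple

# Hypothetical polarity lexicon (illustrative assumption, not from the thesis).
LEXICON: Dict[str, int] = {
    "great": 1, "love": 1, "excellent": 1,
    "bad": -1, "terrible": -1, "hate": -1,
}


def sentiment_score(text: str) -> float:
    """Fraction of tokens found in the lexicon (either polarity): a crude sentimentality proxy."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in LEXICON) / len(tokens)


def crawl(seeds: List[str], pages: Dict[str, Tuple[str, List[str]]], budget: int) -> List[str]:
    """Best-first crawl over a toy web graph mapping url -> (page text, outlinks).

    Returns URLs in the order they were fetched, up to `budget` pages.
    """
    # heapq is a min-heap, so priorities are stored negated; the counter breaks ties
    # in discovery order and keeps the heap from ever comparing URL strings.
    frontier: List[Tuple[float, int, str]] = []
    seen: Set[str] = set()
    counter = 0
    for url in seeds:
        heapq.heappush(frontier, (0.0, counter, url))
        counter += 1
        seen.add(url)

    order: List[str] = []
    while frontier and len(order) < budget:
        _, _, url = heapq.heappop(frontier)
        if url not in pages:  # simulate a failed fetch
            continue
        text, outlinks = pages[url]
        order.append(url)
        # A link's target is unknown until fetched, so it inherits the parent's score.
        priority = -sentiment_score(text)
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (priority, counter, link))
                counter += 1
    return order


# Toy graph: "a" is linked from a neutral page, "b" from an opinionated review.
web = {
    "seed": ("home page", ["news", "review"]),
    "news": ("the weather today", ["a"]),
    "review": ("I love this phone, excellent camera", ["b"]),
    "a": ("plain text", []),
    "b": ("terrible battery", []),
}
print(crawl(["seed"], web, budget=5))  # → ['seed', 'news', 'review', 'b', 'a']
```

Page `b` is fetched before page `a`, even though `a` was discovered first, because `b` was linked from a page with a higher lexicon score. A breadth-first crawler would fetch them in the opposite order. A real sentiment-focused crawler would probably replace the inherited-score heuristic with a learned predictor of a target page's sentimentality.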


Similar Items

Sentiment-Focused Web Crawling
Vural, A. Gural; Cambazoglu, B. Barla; Karagöz, Pınar (2014-10-01)
Sentiments and opinions expressed in Web pages towards objects, entities, and products constitute an important portion of the textual content available in the Web. In the last decade, the analysis of such content has gained importance due to its high potential for monetization. Despite the vast interest in sentiment analysis, somewhat surprisingly, the discovery of sentimental or opinionated Web content is mostly ignored. This work aims to fill this gap and addresses the problem of quickly discovering and f...
Automatic navigation model extraction for web load testing
Kara, İsmihan Refika; Betin Can, Aysu; Department of Information Systems (2011)
Web pages serve a huge number of internet users in nearly every area. Adequate testing is needed to address the problems of web domains and to provide more efficient and accurate services. We present an automated tool that tests web applications against execution errors and against the errors that occur when many users connect to the same server concurrently. Our tool, called NaMoX, extracts the clickable elements of web pages and creates a model using a depth-first search algorithm. NaMoX simulates a number of users, parses the developed...
Web application testing : a systematic literature review
Doğan, Serdar; Betin Can, Aysu; Garousi, Vahid; Department of Information Systems (2013)
Context: The Web has had a significant impact on all aspects of our society. As our society relies more and more on the Web, the dependability of web applications has become increasingly important. To make these applications more dependable, researchers have proposed various techniques for testing web-based software applications over the past decade. Our literature search for related studies retrieved 193 papers in the area of web application testing, published between 2000 and 2013. Objective: As ...
Web market analysis : static, dynamic, and content evaluation
Erdal, Feride; Arifoğlu, Ali; Department of Information Systems (2012)
The importance of web services increases as technology improves and as the need for challenging e-commerce strategies grows. This thesis focuses on the web market analysis of websites, evaluating them from static, dynamic, and content perspectives. First, website evaluation methods and web analytics tools are introduced. Then the evaluation methodology is described from these three perspectives. Finally, the results obtained from the evaluation of 113 websites are presented, along with their correlations.
An Approach for automated verification of web applications using model checking and replaying the scenarios of counterexamples
Paçin, Yudum; Betin Can, Aysu; Department of Information Systems (2015)
The increasing use of web applications in various domains has raised the importance of methodologies for verifying web applications. We propose a framework for the verification of web applications with respect to access control, link consistency, and reachability properties using model checking. In this approach, users define the properties with the explanatory guidance of a user interface. The execution traces that lead to a property violation are translated into a script that automates the replaying of th...
Citation Formats
A. G. Vural, “Sentiment-focused web crawling,” Ph.D. - Doctoral Program, Middle East Technical University, 2013.