Spam detection by using network and text embedding approaches

2019
Yılmaz, Cennet Merve
The authenticity and reliability of information spread over cyberspace are becoming increasingly important, especially in e-commerce, because potential customers check online reviews and customer feedback before making a purchasing decision. Although this information is easily accessible through related websites, the lack of verification of the authenticity of these reviews raises concerns about their reliability. Moreover, fraudulent users disseminate disinformation to deceive people into acting against their own interests. Detecting fake and unreliable reviews is therefore a crucial problem that must be addressed. In this study, we analyze and compare three spam review detection approaches: DocRep, which uses review text only; NodeRep, which uses network information only; and SPR2EP, the framework proposed in this study, which combines knowledge extracted from the textual content of the reviews with information obtained by exploiting the underlying reviewer-product network structure. An important contribution of this study is that the proposed framework, SPR2EP, benefits from both review text and network information. In the SPR2EP approach, feature vectors are first learned for each review, reviewer, and product using state-of-the-art algorithms developed for learning document and node embeddings; these vectors are then fed into a classifier to identify opinion spam. This minimizes the feature engineering effort. The effectiveness of our approaches that utilize network embeddings, compared with existing techniques for detecting spam reviews, is demonstrated on three different data sets containing online reviews.
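The final step described in the abstract, feeding learned review, reviewer, and product vectors into a classifier, might be sketched as follows. This is a minimal illustration and not the thesis implementation. The embedding values are hand-made placeholders for learned document and node embeddings. Combining the three vectors by concatenation and classifying with logistic regression are assumptions, since the abstract names neither. All identifiers are hypothetical.

```python
import math

# Hypothetical, hand-made vectors standing in for learned embeddings.
# In practice these would come from document- and node-embedding algorithms.
review_vecs = {"r1": [0.9, 0.1], "r2": [0.8, 0.2], "r3": [0.1, 0.9], "r4": [0.2, 0.7]}
reviewer_vecs = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
product_vecs = {"p1": [0.5, 0.5], "p2": [0.4, 0.6]}

# (review id, reviewer id, product id, label); 1 = spam, 0 = genuine (toy labels).
reviews = [
    ("r1", "u1", "p1", 1),
    ("r2", "u1", "p2", 1),
    ("r3", "u2", "p1", 0),
    ("r4", "u2", "p2", 0),
]


def build_features(review_id, reviewer_id, product_id):
    """Concatenate the three embeddings (an assumed way of combining them)."""
    return review_vecs[review_id] + reviewer_vecs[reviewer_id] + product_vecs[product_id]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_spam_probability(weights, bias, features):
    """Return the model's estimated probability that a review is spam."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            error = predict_spam_probability(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


X = [build_features(r, u, p) for r, u, p, _ in reviews]
y = [label for *_, label in reviews]
weights, bias = train_logistic_regression(X, y)

for (review_id, *_), features in zip(reviews, X):
    print(review_id, round(predict_spam_probability(weights, bias, features), 3))
```

Because the classifier receives only the embedding vectors, no hand-crafted features are needed, which is the sense in which the abstract says feature engineering effort is minimized.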

Suggestions

SPR2EP: A Semi-Supervised Spam Review Detection Framework
Yılmaz, Cengiz (2018-08-31)
The authenticity and reliability of information spread over cyberspace are becoming increasingly important. This is especially true in e-commerce, since potential customers check online reviews and customer feedback before making a purchasing decision. Although this information is easily accessible through related websites, the lack of verification of the authenticity of these reviews raises concerns about their reliability. Moreover, fraudulent users disseminate misinformation to deceive people into act...
Exploiting word and sentence embeddings for diversification in crawling and ranking
Ünaldı, Can Duran; Altıngövde, İsmail Sengör; Department of Computer Engineering (2022-9)
The growth in the volume of the Web and microblogging sites has produced copious amounts of duplicate or near-duplicate content, which gave rise to the diversification paradigm. A typical search system has three main components: a crawler, an indexer, and a query processor. While most diversification approaches target the query processing stage of the search system, in this work we aim to apply the diversification paradigm to both the crawling and query processing stages. First, we introduce a diversifi...
Web market analysis : static, dynamic, and content evaluation
Erdal, Feride; Arifoğlu, Ali; Department of Information Systems (2012)
The importance of web services increases as technology improves and the need for challenging e-commerce strategies grows. This thesis focuses on web market analysis of web sites, evaluating them from static, dynamic, and content perspectives. First, web site evaluation methods and web analytic tools are introduced. Then the evaluation methodology is described from the three perspectives. Finally, the results obtained from evaluating 113 web sites are presented, along with their correlations.
Internet users' attitudes toward business-to-consumer online shopping: A survey
Huseynov, Farid; Özkan Yıldırım, Sevgi (2016-06-01)
The rapid growth of online shopping activities in recent years has required careful identification of the key factors influencing consumers' behaviors and attitudes toward online shopping. Identifying the critical factors influencing online consumer behavior is crucial for effective customer relationship management. It is very important that online sellers clearly understand the critical factors influencing online customers' shopping intention and take the necessary actions accordingly. If identified and managed properly...
Multilingual dynamic linking of web resources
Dönmez, Uğur; Coşar, Ahmet; Yeşilada, Yeliz; Department of Computer Engineering (2014)
The World Wide Web is successful at locating, browsing, and publishing information thanks to its scalable architecture. However, the Web suffers from some limitations. For example, links on the Web are embedded in documents. Links are only unidirectional, ownership is required to place an anchor in documents, and authoring links is an expensive process. The embedded link structure of the Web can be improved by the Semantic Web. By using Semantic Web components, existing Web resources can be enriched with additional ex...
Citation Formats
C. M. Yılmaz, “Spam detection by using network and text embedding approaches,” Thesis (M.S.) -- Graduate School of Natural and Applied Sciences. Industrial Engineering., Middle East Technical University, 2019.