A Context aware notification framework based on distributed focused crawling

Akyol, Mehmet Ali
The amount of data generated by sources on the Web increases daily, so following all of these sources to reach the latest information is time-consuming. People usually access information on the Web in a pull-based way: they query the Web repeatedly to find the most recent blog posts, websites, news, and even weather reports. Accessing the right information at the right time is crucial for both people and businesses to be productive and efficient, and information that cannot be accessed at the right time may lose its value. Traditional pull-based methods may cause important information to be overlooked or noticed too late. Technologies like RSS let people receive information from websites through push-based notifications, but they do not solve the problem of people being exposed to too much information at inappropriate times. Accordingly, a push-based, context-aware solution is needed for people who need to access the right information at the right time. Although some promising studies in the literature have tried to solve this problem, they appear insufficient to meet today's big data requirements. In this study, we propose a context-aware notification framework based on distributed focused crawling, which notifies people of relevant information at the right time and location by leveraging the latest advancements in distributed computing and big data analytics technologies.


A Context Aware Notification Architecture Based on Distributed Focused Crawling in the Big Data Era
Akyol, Mehmet Ali; Gökalp, Mert Onuralp; Kayabay, Kerem; Eren, Pekin Erhan; Koçyiğit, Altan (2017-09-08)
The amount of data created by various sources on the Web is increasing tremendously. Keeping track of relevant sources is an increasingly time-consuming task. The traditional way of accessing information over the Web is pull-based. Users need to query data sources at certain time intervals, so an important piece of information may be recognized late or even missed completely. Technologies including RSS help users to get push-based notifications from websites. Discovering the relevant informatio...
Automatic navigation model extraction for web load testing
Kara, İsmihan Refika; Betin Can, Aysu; Department of Information Systems (2011)
Web pages serve a huge number of internet users in nearly every area. Adequate testing is needed to address the problems of web domains for more efficient and accurate services. We present an automated tool to test web applications against execution errors and the errors that occur when many users connect to the same server concurrently. Our tool, called NaMoX, collects the clickables of the web pages and creates a model using a depth-first search algorithm. NaMoX simulates a number of users, parses the developed...
A five-level static cache architecture for web search engines
Ozcan, Rifat; Altıngövde, İsmail Sengör; Barla Cambazoglu, B.; Junqueira, Flavio P.; Ulusoy, Ozgur (2012-09-01)
Caching is a crucial performance component of large-scale web search engines, as it greatly helps reduce average query response times and query processing workloads on backend search clusters. In this paper, we describe a multi-level static cache architecture that stores five different item types: query results, precomputed scores, posting lists, precomputed intersections of posting lists, and documents. Moreover, we propose a greedy heuristic to prioritize items for caching, based on gains computed by us...
A Content-Boosted Collaborative Filtering Approach for Movie Recommendation Based on Local and Global Similarity and Missing Data Prediction
Özbal, Gozde; Karaman, Hilal; Alpaslan, Ferda Nur (Oxford University Press (OUP), 2011-09-01)
Most traditional recommender systems lack accuracy in the case where data used in the recommendation process is sparse. This study addresses the sparsity problem and aims to get rid of it by means of a content-boosted collaborative filtering approach applied to a web-based movie recommendation system. The main motivation is to investigate whether further success can be obtained by combining 'local and global user similarity' and 'effective missing data prediction' approaches, which were previously introduce...
A Similarity Based Oversampling Method for Multi-Label Imbalanced Text Data
Karaman, İsmail Hakkı; Köksal, Gülser; Erişkin, Levent; Department of Industrial Engineering (2022-09-01)
In the real world, although the amount of data is increasing, it is not easy to find labeled data for Machine Learning projects because of the considerable cost and effort that labeling requires. Also, most Machine Learning projects, especially multi-label classification problems, struggle with the data imbalance problem. In these problems, some classes do not even have enough data to train a classifier. In this study, an oversampling method for multi-label text classification problems is developed and s...
Citation Formats
M. A. Akyol, “A Context aware notification framework based on distributed focused crawling,” M.S. - Master of Science, Middle East Technical University, 2017.