A Context Aware Notification Architecture Based on Distributed Focused Crawling in the Big Data Era

The amount of data created by various sources on the Web is increasing tremendously, and keeping track of relevant sources is an increasingly time-consuming task. The traditional way of accessing information on the Web is pull-based. Users need to query data sources at certain time intervals, so an important piece of information may be recognized late or missed completely. Technologies such as RSS help users receive push-based notifications from websites. However, discovering relevant information without notification overload is still not possible with existing technologies. Despite some promising efforts in push-based architectures to solve this problem, they fall short of meeting the requirements of the big data era. In this study, by leveraging the latest advancements in distributed computing and big data analytics technologies, we use a focused crawling approach to propose a context aware notification architecture that helps people find desired information when it is most valuable.
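The abstract names focused crawling as the core of the architecture but does not describe the crawler itself. The following is a minimal, hypothetical Python sketch of a generic best-first focused crawler, not the authors' implementation. The in-memory `WEB` dictionary stands in for HTTP fetching, the keyword-overlap `relevance` score stands in for whatever relevance model the architecture actually uses, and `threshold` and `max_pages` are assumed parameters. Pages scoring at or above the threshold are collected as notification candidates, which illustrates how a focused crawler can push only relevant items instead of every update.

```python
import heapq
import re

# Hypothetical in-memory "web": url -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP; this stub keeps the sketch offline.
WEB = {
    "a": ("big data streaming and distributed crawling", ["b", "c"]),
    "b": ("cooking recipes and travel tips", ["d"]),
    "c": ("focused crawling for big data notification", ["d", "e"]),
    "d": ("weather report for tomorrow", []),
    "e": ("distributed big data analytics news", []),
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def relevance(text, topic):
    """Fraction of topic terms present in the page (a stand-in for a trained classifier)."""
    if not topic:
        return 0.0
    words = set(tokenize(text))
    return sum(term in words for term in topic) / len(topic)


def focused_crawl(seeds, topic, threshold=0.5, max_pages=10):
    """Best-first focused crawl; returns relevant URLs in visit order (notification candidates)."""
    # heapq is a min-heap, so priorities are negated to pop the most promising URL first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    visited, notify = set(), []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in WEB:
            continue
        visited.add(url)
        text, links = WEB[url]
        score = relevance(text, topic)
        if score >= threshold:
            notify.append(url)
            # Only links of relevant pages are expanded; this is what keeps the crawl focused.
            # Each child inherits its parent's score as its crawl priority.
            for link in links:
                if link not in visited:
                    heapq.heappush(frontier, (-score, link))
    return notify
```

For example, `focused_crawl(["a"], ["big", "data", "crawling"])` returns `['a', 'c', 'e']`: page `b` scores 0, so its links are never expanded, and `d` is reached through `c` but is not reported. In a distributed setting the frontier would typically be partitioned across workers (for instance by URL hash); that part is omitted from this sketch.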

Similar Items
A Context aware notification framework based on distributed focused crawling
Akyol, Mehmet Ali; Eren, Pekin Erhan; Department of Information Systems (2017)
The amount of data generated from sources on the Web has been increasing on a daily basis. Hence, it is time-consuming to follow all these sources to reach the latest information. The way people access information on the Web is usually pull-based, meaning that they query the Web over time to find the most recent blog posts, websites, news, and even weather reports. Accessing the right information at the right time is crucial for both people and businesses to be more productive and efficient. Moreover, the i...
A visual programming framework for distributed Internet of Things centric complex event processing
Gökalp, Mert Onuralp; Koçyiğit, Altan; Eren, Pekin Erhan (2019-03-01)
Complex Event Processing (CEP) is a promising approach for real-time processing of big data streams originating from Internet of Things (IoT) devices. Even though scalability and flexibility are key issues for IoT applications, current studies are mostly based on centralized solutions and restrictive query languages. Moreover, the development, deployment and operation of big data applications require a significant amount of technical expertise. Hence, a framework that provides a higher abstraction level programmi...
A Similarity Based Oversampling Method for Multi-Label Imbalanced Text Data
Karaman, İsmail Hakkı; Köksal, Gülser; Erişkin, Levent; Department of Industrial Engineering (2022-09-01)
In the real world, although the amount of data keeps increasing, it is not easy to find labeled data for Machine Learning projects because of the considerable cost and effort required to label data. Also, most Machine Learning projects, especially multi-label classification problems, struggle with data imbalance. In these problems, some classes do not even have enough data to train a classifier. In this study, an oversampling method for multi-label text classification problems is developed and s...
A content boosted collaborative filtering approach for recommender systems based on multi level and bidirectional trust data
Şahinkaya, Ferhat; Alpaslan, Ferda Nur; Department of Computer Engineering (2010)
As the Internet became widespread all over the world, people started to share a great amount of data on the Web, and almost everyone joined various data networks to gain quick access to the data shared among people and to cope with information overload on the Web. Recommender systems were created to provide users with more personalized information services and to make data available to people without extra effort. Most of these systems aim to get or learn user preferences, explicitly or impli...
New Techniques in Profiling Big Datasets for Machine Learning with a Concise Review of Android Mobile Malware Datasets
Canbek, Gurol; Sağıroğlu, Şeref; Taşkaya Temizel, Tuğba (2018-12-04)
As the volume, variety, and velocity of big data increase, other aspects such as veracity, value, variability, and venue cannot be easily interpreted by data owners or researchers. These aspects are also unclear when the data is to be used in machine learning studies such as classification or clustering. This study proposes four techniques with fourteen criteria to systematically profile datasets collected from different sources, to distinguish them from one another and see their strong and wea...
Citation Formats
M. A. Akyol, M. O. Gökalp, K. Kayabay, P. E. Eren, and A. Koçyiğit, “A Context Aware Notification Architecture Based on Distributed Focused Crawling in the Big Data Era,” 2017, vol. 299, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/32408.