Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotations

2013-07-02
Alper, Pinar
Belhajjame, Khalid
Goble, Carole
Karagöz, Pınar
Scientific workflows have become the workhorse of Big Data analytics for scientists. As well as being repeatable and optimizable pipelines that bring together datasets and analysis tools, workflows make up an important part of the provenance of data generated from their execution. By faithfully capturing all stages in the analysis, workflows play a critical part in building up the audit trail (a.k.a. provenance) metadata for derived datasets and contribute to the veracity of results. Provenance is essential for reporting results, reporting the method followed, and adapting to changes in the datasets or tools. These functions, however, are hampered by the complexity of workflows and, consequently, the complexity of the data trails generated from their instrumented execution. In this paper we propose the generation of workflow description summaries in order to tackle workflow complexity. We elaborate reduction primitives for summarizing workflows, and show how these primitives, as building blocks, can be used in conjunction with semantic workflow annotations to encode different summarization strategies. We report on the effectiveness of the method through experimental evaluation using real-world workflows from the Taverna system.

Citation Formats
P. Alper, K. Belhajjame, C. Goble, and P. Karagöz, “Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotations,” 2013, Accessed: 2020. [Online]. Available: https://hdl.handle.net/11511/35235.