Enhancing and abstracting scientific workflow provenance for data publishing

2013-03-22
Alper, Pınar
Belhajjame, Khalid
Goble, Carole A
Karagöz, Pınar
Many scientists use workflows to systematically design and run computational experiments. Once a workflow has been executed, the scientist may want to publish the resulting dataset so that, for example, other scientists can reuse it as input to their own experiments. To do so, the scientist needs to curate the dataset by specifying metadata that describes it, e.g. its derivation history, origins and ownership. To assist the scientist in this task, we explore in this paper the use of provenance traces collected by workflow management systems when enacting workflows. Specifically, we identify the shortcomings of such raw provenance traces in supporting the data publishing task, and propose an approach for deriving distilled, yet more informative, provenance traces that are fit for that task.

Suggestions

Modelling and predicting binding affinity of PCP-like compounds using machine learning methods
Erdaş, Özlem; Alpaslan, Ferda Nur; Department of Computer Engineering (2007)
Machine learning methods have been promising tools in science and engineering fields. The use of these methods in chemistry and drug design has advanced since the 1990s. In this study, molecular electrostatic potential (MEP) surfaces of PCP-like compounds are modelled and visualized in order to extract features that will be used in predicting binding affinity. In modelling, Cartesian coordinates of MEP surface points are mapped onto a spherical self-organizing map. Resulting maps are visualized by using values...
Representing temporal knowledge in connectionist expert systems
Alpaslan, Ferda Nur (1996-09-27)
This paper introduces a new temporal neural network model that can be used in connectionist expert systems. A variation of the backpropagation algorithm, called the temporal feedforward backpropagation algorithm, is also introduced as a method for training the neural network. The algorithm was tested using training examples extracted from a medical expert system. A series of experiments was carried out using the temporal model and the temporal backpropagation algorithm. The experiments indicated that the alg...
Acceleration of molecular dynamics simulation for TERSOFF2 potential through reconfigurable hardware
Vargün, Bilgin; Erkoç, Şakir; Eminoğlu, Selim; Department of Micro and Nanotechnology (2012)
In nanotechnology, carbon nanotube systems are studied with molecular dynamics simulation software that investigates the properties of molecular structures. The computational load of such software is very heavy: in three-body simulations especially, a run can take a couple of weeks even for a small number of atoms. Researchers use supercomputers to study more complex systems. In recent years, with the development of sophisticated Field Programmable Gate Array (FPGA) technology, researchers design s...
Automated biological data acquisition and integration using machine learning techniques
Çarkacıoğlu, Levent; Atalay, Mehmet Volkan; Department of Computer Engineering (2009)
Since the initial genome sequencing projects, and with recent advances in technology, molecular biology and large-scale transcriptome analysis have produced data on a large scale. These data are provided on different platforms and come from different laboratories; therefore, there is a need for compilation and comprehensive analysis. In this thesis, we addressed the automation of biological data acquisition and integration from these non-uniform data using machine learning techniques....
Enabling Grids for E-sciencE III (EGEE-III)
Şener, Cevat (2010-04-30)
A globally distributed computing Grid now plays an essential role for large-scale, data intensive science in many fields of research. The concept has been proven viable through the Enabling Grids for E-sciencE project (EGEE and EGEE-II, 2004-2008) and its related projects. EGEE-II is consolidating the operations and middleware of this Grid for use by a wide range of scientific communities, such as astrophysics, computational chemistry, earth and life sciences, fusion and particle physics. Strong quality ass...
Citation Formats
P. Alper, K. Belhajjame, C. A. Goble, and P. Karagöz, “Enhancing and abstracting scientific workflow provenance for data publishing,” 2013, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/35728.