Employing Machine Learning Techniques for Data Enrichment: Increasing the number of samples for effective gene expression data analysis

2011-11-15
Erdogdu, Utku
TAN, MEHMET
Alhajj, Reda
Polat, Faruk
Demetrick, Douglas
Rokne, Jon
For certain domains, e.g., bioinformatics, producing more real samples is costly, error-prone, and time-consuming. Therefore, there is a need for an intelligent automated process capable of substituting real samples with artificial samples that carry the same characteristics and hence can be used for comprehensive testing of new methodologies. Motivated by this need, we describe a novel approach that integrates Probabilistic Boolean Network and genetic algorithm based techniques into a framework that takes existing real samples as input and successfully produces new samples as output. The new samples inherit the characteristics of the existing samples without duplicating them. This leads to diversity in the samples and hence a richer set of samples for testing. The developed framework incorporates two models (perspectives) for sample generation. We illustrate its applicability by producing new gene expression data samples, a high-demand area that has not received attention. The two perspectives employed in the process are based on models that are not closely related; this independence eliminates the bias of an approach that covers only certain characteristics of the domain and produces samples skewed in one direction. The results are very promising, showing the effectiveness, usefulness, and applicability of the proposed multi-model framework.

Citation Formats
U. Erdogdu, M. TAN, R. Alhajj, F. Polat, D. Demetrick, and J. Rokne, “Employing Machine Learning Techniques for Data Enrichment: Increasing the number of samples for effective gene expression data analysis,” 2011, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/42233.