Data mining for rule discovery in relational databases

2004
Toprak, Serkan
Most data today is stored in relational databases. However, most data mining algorithms cannot work directly on data stored in relational databases; they require a preprocessing step that transforms the relational data into an algorithm-specific form. Moreover, several data mining algorithms provide solutions for single relations only, so valuable hidden knowledge involving multiple relations remains undiscovered. In this thesis, an implementation is developed for discovering multi-relational association rules in relational databases. The implementation is based on a framework that provides a representation of patterns in relational databases, refinement methods for patterns, and primitives for obtaining the record counts needed from the database to calculate measures for patterns. The framework exploits the meta-data of relational databases to prune the search space of patterns. The implementation extends the framework by employing the Apriori algorithm to further prune the search space and discovering relational recursive patterns. The Apriori algorithm is used to find the large itemsets of tables, which are then used to refine patterns; it is modified by changing the support calculation method for itemsets. A method for determining recursive relations is described, and a solution is provided for handling recursive patterns using aliases. Additionally, continuous attributes of tables are discretized using equal-depth partitioning. The implementation is tested on the gene localization prediction task of KDD Cup 2001, and the results are compared with those of the winning approach.
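As a rough illustration of two techniques named in the abstract, the sketch below shows equal-depth partitioning of a continuous attribute and a textbook Apriori search for large (frequent) itemsets. It is not the thesis implementation: the function names `equal_depth_bins` and `apriori` and the sample data are assumptions made for this example, support is counted in the standard way over in-memory transactions (the thesis modifies the support calculation, which is not reproduced here), and the multi-relational framework is omitted.

```python
from itertools import combinations


def equal_depth_bins(values, n_bins):
    """Split values into n_bins groups holding (nearly) the same number of values.

    Hypothetical helper: ties are not handled specially, so equal values
    may fall on either side of a bin boundary.
    """
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins take one additional value each.
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset with count >= min_support.

    Standard Apriori with absolute support counts (an assumption of this
    sketch, not the modified support calculation described in the thesis).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent itemsets into candidates of size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


sample_bins = equal_depth_bins([5, 1, 9, 3, 7, 2], 3)
sample_itemsets = apriori(
    [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}],
    min_support=3,
)
```

In this sample, every single item and every pair reaches the support threshold, while the triple `{a, b, c}` appears in only two transactions and is dropped. The prune step is what lets Apriori skip counting candidates that have an infrequent subset.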

Suggestions

Design and implementation of a secure and searchable audit logging system
İncebacak, Davut; Çetin, Yasemin; Department of Information Systems (2007)
Logs are append-only, time-stamped records that represent events in computers or network devices. Today, in many real-world networking applications, logging is a central service; however, it is a big challenge to satisfy the conflicting requirements when the security of log records is of concern. On one hand, being kept on mostly untrusted hosts, the logs should be preserved against unauthorized modifications and privacy breaches. On the other, serving as the primary evidence for digital crimes, logs are often n...
Security on mobile phones with lightweight cryptographic message syntax
Kubilay, Murat Yasin; Özgit, Attila; Department of Computer Engineering (2007)
Cryptographic Message Syntax (CMS) is a standard for protecting messages cryptographically. Using CMS, messages can be protected in different content types such as signed-data, enveloped-data, digested-data and authenticated-data. CMS is architected around certificate based key management and symmetric cryptography. In this thesis, a lightweight CMS envelope is proposed for the mobile phones which have limited memory and processing power, in order to provide the privacy of the data either stored on them or ...
Data sharing and access with a corba data distribution service implementation
Dursun, Mustafa; Bilgen, Semih; Department of Electrical and Electronics Engineering (2006)
The Data Distribution Service (DDS) specification defines an API for the Data-Centric Publish-Subscribe (DCPS) model to achieve efficient data distribution in distributed computing environments. The lack of a defined interoperability architecture in the DDS specification obstructs data distribution between different, heterogeneous DDS implementations. In this thesis, DDS is implemented as a CORBA service to achieve interoperability, and a QoS policy is proposed for faster data distribution with CORBA features.
Semantic service discovery with heuristic relevance calculation
Özyönüm, Müge; Doğru, Ali Hikmet; Department of Computer Engineering (2010)
In this thesis, a semantically aided search mechanism for web services and RESTful services is presented that makes use of an ontology. The mechanism relates method names and input and output parameters for ontology-guided matches, and offers results with varying relevance corresponding to the degree of matching. The mechanism is demonstrated using an experimental domain, tourism and travel. An ontology is created to support a set of web services that exist in this domain.
Data mining in deductive databases using query flocks
Toroslu, İsmail Hakkı (Elsevier BV, 2005-04-01)
Data mining can be defined as a process for finding trends and patterns in large data. An important technique for extracting useful information, such as regularities, from usually historical data is called association rule mining. Most research on data mining is concentrated on the traditional relational data model. On the other hand, the query flocks technique, which extends the concept of association rule mining with a 'generate-and-test' model for different kinds of patterns, can also be applied to deduct...
Citation Formats
S. Toprak, “Data mining for rule discovery in relational databases,” M.S. - Master of Science, Middle East Technical University, 2004.