Data mining for rule discovery in relational databases

2004
Toprak, Serkan
Today, most data is stored in relational databases. However, most data mining algorithms cannot work directly on data stored in relational databases; they require a preprocessing step that transforms the relational data into an algorithm-specific form. Moreover, several data mining algorithms provide solutions for single relations only, so valuable hidden knowledge involving multiple relations remains undiscovered. In this thesis, an implementation is developed for discovering multi-relational association rules in relational databases. The implementation is based on a framework that provides a representation of patterns in relational databases, refinement methods for patterns, and primitives for obtaining the record counts needed from the database to calculate measures for patterns. The framework exploits the meta-data of relational databases to prune the search space of patterns. The implementation extends the framework by employing the Apriori algorithm for further pruning of the search space and discovering relational recursive patterns. The Apriori algorithm is used to find the large itemsets of tables, which are then used to refine patterns; it is modified by changing the support calculation method for itemsets. A method for determining recursive relations is described, and a solution is provided for handling recursive patterns using aliases. Additionally, continuous attributes of tables are discretized using equal-depth partitioning. The implementation is tested on the gene localization prediction task of KDD Cup 2001, and the results are compared with those of the winning approach.
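As a rough illustration of two building blocks named in the abstract, the sketch below shows equal-depth partitioning of a continuous attribute and a textbook Apriori pass over a single table. It is not the thesis implementation: the function names (equal_depth_bins, assign_bin, apriori) and the sample data are hypothetical, and support is counted in the classic way (number of rows containing the itemset), not with the modified support calculation the thesis describes.

```python
from itertools import combinations


def equal_depth_bins(values, n_bins):
    """Return n_bins - 1 cut points so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    # The i-th cut point is the value at position i * n / n_bins in sorted order.
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def assign_bin(value, cuts):
    """Return the index of the bin that value falls into (bins are [cut_i-1, cut_i))."""
    for idx, cut in enumerate(cuts):
        if value < cut:
            return idx
    return len(cuts)


def apriori(transactions, min_support):
    """Textbook Apriori (assumed variant): return {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count step: keep candidates that meet the support threshold.
        current = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count >= min_support:
                current[cand] = count
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical sample data, for illustration only.
ages = [23, 25, 31, 35, 40, 52, 58, 61, 70]
cuts = equal_depth_bins(ages, 3)
rows = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
large_itemsets = apriori(rows, min_support=3)
```

In a setting like the one the abstract describes, each row's continuous attributes would first be mapped to bin labels with something like assign_bin, and the resulting attribute-value pairs would serve as the items passed to apriori.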

Suggestions

Security on mobile phones with lightweight cryptographic message syntax
Kubilay, Murat Yasin; Özgit, Attila; Department of Computer Engineering (2007)
Cryptographic Message Syntax (CMS) is a standard for protecting messages cryptographically. Using CMS, messages can be protected in different content types such as signed-data, enveloped-data, digested-data and authenticated-data. CMS is architected around certificate-based key management and symmetric cryptography. In this thesis, a lightweight CMS envelope is proposed for mobile phones, which have limited memory and processing power, in order to provide the privacy of the data either stored on them or ...
Data sharing and access with a corba data distribution service implementation
Dursun, Mustafa; Bilgen, Semih; Department of Electrical and Electronics Engineering (2006)
The Data Distribution Service (DDS) specification defines an API for the Data-Centric Publish-Subscribe (DCPS) model to achieve efficient data distribution in distributed computing environments. The lack of a defined interoperability architecture in the DDS specification obstructs data distribution between different and heterogeneous DDS implementations. In this thesis, DDS is implemented as a CORBA service to achieve interoperability, and a QoS policy is proposed for faster data distribution with CORBA features.
Implementation of concurrent constraint transaction logic and its user interface
Altunyuva, Fethi; Karagöz, Pınar; Department of Computer Engineering (2006)
This thesis implements a logical formalism framework called Concurrent Constraint Transaction Logic (abbr. CCTR), which was defined for modeling and scheduling workflows under resource allocation and cost constraints, and develops an extensible and flexible graphical user interface for the framework. CCTR extends Concurrent Transaction Logic and integrates with Constraint Logic Programming to find the correct scheduling of tasks that involve resource and cost constraints. The developed system, which integ...
Bayesian learning under nonnormality
Yılmaz, Yıldız Elif; Alpaslan, Ferda Nur; Department of Computer Engineering (2004)
The naive Bayes classifier and maximum likelihood hypotheses in Bayesian learning are considered when the errors have a non-normal distribution. For location and scale parameters, efficient and robust estimators obtained by using the modified maximum likelihood estimation (MML) technique are used. In the naive Bayes classifier, the error distributions from class to class and from feature to feature are assumed to be non-identical and Generalized Secant Hyperbolic (GSH) and Generalized Logistic (GL) distribut...
Data mining in deductive databases using query flocks
Toroslu, İsmail Hakkı (Elsevier BV, 2005-04-01)
Data mining can be defined as a process for finding trends and patterns in large data. An important technique for extracting useful information, such as regularities, from usually historical data is called association rule mining. Most research on data mining is concentrated on the traditional relational data model. On the other hand, the query flocks technique, which extends the concept of association rule mining with a 'generate-and-test' model for different kinds of patterns, can also be applied to deduct...
Citation Formats
S. Toprak, “Data mining for rule discovery in relational databases,” M.S. - Master of Science, Middle East Technical University, 2004.