Concept discovery on relational databases: New techniques for search space pruning and rule quality improvement

Multi-relational data mining has become popular due to the limitations of propositional problem definition in structured domains and the tendency to store data in relational databases. Several relational knowledge discovery systems have been developed, employing various search strategies, heuristics, language pattern limitations and hypothesis evaluation criteria in order to cope with intractably large search spaces and to generate high-quality patterns. In this work, we introduce an ILP-based concept discovery framework named Concept Rule Induction System (CRIS), which includes new approaches for search space pruning and new features for rule quality improvement, such as defining aggregate predicates and handling numeric attributes. In CRIS, all target instances are considered together, which leads to the construction of more descriptive rules for the concept. This property also makes it possible to use aggregate predicates more accurately in concept rule construction. Moreover, it facilitates the construction of transitive rules. A set of experiments is conducted to evaluate the performance of the proposed method in terms of accuracy and coverage. © 2010 Elsevier B.V. All rights reserved.