New heuristics for performance improvement of ilp-based concept discovery systems

2015
Doğan, Abdullah
A large amount of valuable everyday data is stored in relational databases. The accumulation of so much information motivates the need to extract valuable patterns from relational databases. ILP-based concept discovery systems use background knowledge and a set of target examples, stored in multiple tables, to produce hypotheses. Because these tables carry many arguments, hypothesis construction faces large search spaces, which give rise to computational efficiency problems. In this thesis, we focus on concept discovery systems that use an Apriori-based specialization operator and work directly on relational tables. The running time of these ILP systems is directly proportional to the number of queries run on the DBMS. These are mostly support and confidence calculation queries for the candidate concept rules generated in the search space. We aim to increase time efficiency by reducing the number of queries run on these systems. In particular, we work on the Concept Rule Induction System (CRIS), which uses Apriori-based specialization in hypothesis construction. The methods we propose generate the same solutions as CRIS; therefore, we improve efficiency without negatively affecting accuracy. In the first method, we prune the concept descriptors using support coverage sets. These sets are already stored to support memoization in CRIS, and our method reuses them to prune the search space as well. In the second pruning method, we build a cosine similarity matrix of the attributes of each predicate in a pre-processing step. During the specialization of concept descriptors, we prune the search space by utilizing this similarity matrix. Finally, we examine the applicability of the NoSQL system MongoDB and the NewSQL system VoltDB as storage for the ILP system CRIS.
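The role of memoized coverage sets in avoiding repeated queries can be illustrated with a minimal Python sketch. It is not the CRIS implementation: the rule encoding, the `CoverageCache` class, the toy `FACTS` table, and the support and confidence definitions used here (covered targets over all targets, and covered targets over all covered instances) are assumptions made for illustration. The skip test relies only on the standard Apriori property that adding a literal to a rule body cannot enlarge its coverage.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

# Hypothetical encoding: a rule body is a set of literal names.
Body = FrozenSet[str]


class CoverageCache:
    """Memoizes coverage sets so each rule body is queried at most once."""

    def __init__(self, run_query: Callable[[Body], Set[str]]) -> None:
        self._run_query = run_query          # stands in for a DBMS query
        self._cache: Dict[Body, FrozenSet[str]] = {}
        self.query_count = 0                 # number of real queries issued

    def covered(self, body: Body) -> FrozenSet[str]:
        if body not in self._cache:
            self.query_count += 1
            self._cache[body] = frozenset(self._run_query(body))
        return self._cache[body]


def support_and_confidence(
    cache: CoverageCache, body: Body, targets: Set[str]
) -> Tuple[float, float]:
    """Assumed definitions: support = covered targets / all targets,
    confidence = covered targets / all covered instances."""
    covered = cache.covered(body)
    hits = covered & targets
    support = len(hits) / len(targets) if targets else 0.0
    confidence = len(hits) / len(covered) if covered else 0.0
    return support, confidence


def can_skip_specializations(
    cache: CoverageCache, parent: Body, targets: Set[str], min_support: float
) -> bool:
    """A specialization covers a subset of what its parent covers, so it
    cannot reach min_support if the parent already falls below it."""
    support, _ = support_and_confidence(cache, parent, targets)
    return support < min_support


# Toy in-memory stand-in for relational tables (illustrative data only).
FACTS: Dict[str, Set[str]] = {
    "parent": {"ann", "bob", "cem"},
    "female": {"ann", "dua"},
}


def toy_query(body: Body) -> Set[str]:
    """Returns the instances that satisfy every literal in the body."""
    sets = [FACTS[literal] for literal in body]
    return set.intersection(*sets) if sets else set()
```

In this sketch, evaluating the same body twice issues a single call to `toy_query`, and a parent body whose support is already below the threshold lets all of its specializations be skipped without any further query.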

Suggestions

An intelligent fuzzy object-oriented database framework for video database applications
Ozgur, Nezihe Burcu; Koyuncu, Murat; Yazıcı, Adnan (Elsevier BV, 2009-08-01)
Video database applications call for flexible and powerful modeling and querying facilities, which require an integration or interaction between database and knowledge-based technologies. It is also necessary for many real life video database applications to incorporate uncertainty, which naturally occurs due to the complex and subjective semantic content of video data. In this study, firstly, we introduce a fuzzy conceptual data model to represent the semantic content of video data. For that purpose, UML (...
New transitive closure algorithm for recursive query processing in deductive databases
Toroslu, İsmail Hakkı (1992-01-01)
© 1992 IEEE. The development of efficient algorithms to process the different forms of the transitive-closure (TC) queries within the context of large database systems has recently attracted a large amount of research effort. In this paper, we present a new algorithm suitable for the full transitive closure problem, which is used to solve uninstantiated recursive queries in deductive databases. In this new algorithm there are two phases. In the first phase a general graph is condensed into an acyclic graph a...
Data mining for rule discovery in relational databases
Toprak, Serkan; Alpaslan, Ferda Nur; Department of Computer Engineering (2004)
Data is mostly stored in relational databases today. However, most data mining algorithms are not capable of working on data stored in relational databases directly. Instead they require a preprocessing step for transforming relational data into algorithm specified form. Moreover, several data mining algorithms provide solutions for single relations only. Therefore, valuable hidden knowledge involving multiple relations remains undiscovered. In this thesis, an implementation is developed for discovering mul...
An ilp-based concept discovery system for multi-relational data mining
Kavurucu, Yusuf; Karagöz, Pınar; Department of Computer Engineering (2009)
Multi Relational Data Mining has become popular due to the limitations of propositional problem definition in structured domains and the tendency of storing data in relational databases. However, as patterns involve multiple relations, the search space of possible hypothesis becomes intractably complex. In order to cope with this problem, several relational knowledge discovery systems have been developed employing various search strategies, heuristics and language pattern limitations. In this thesis, Induct...
A binomial noised model for cluster validation
Toledano-Kitai, Dvora; Avros, Renata; Volkovich, Zeev; Weber, Gerhard Wilhelm; Yahalom, Orly (IOS Press, 2013-01-01)
Cluster validation is the task of estimating the quality of a given partition of a data set into clusters of similar objects. Normally, a clustering algorithm requires a desired number of clusters as a parameter. We consider the cluster validation problem of determining the optimal ("true") number of clusters. We adopt the stability testing approach, according to which, repeated applications of a given clustering algorithm provide similar results when the specified number of clusters is correct. To implemen...
Citation Formats
A. Doğan, “New heuristics for performance improvement of ilp-based concept discovery systems,” M.S. - Master of Science, Middle East Technical University, 2015.