New heuristics for performance improvement of ilp-based concept discovery systems
Date
2015
Author
Doğan, Abdullah
Item Usage Stats
107 views, 76 downloads
A large amount of the valuable data in daily life is stored in relational databases. The accumulation of so much information motivates the need to extract valuable patterns from relational databases. ILP-based concept discovery systems use background knowledge and a set of target examples, stored in multiple tables, to produce hypotheses. Multiple arguments across these tables lead to large search spaces during hypothesis construction, which raises computational efficiency problems. In this thesis, we focus on concept discovery systems that use an Apriori-based specialization operator and work directly on relational tables. The running time of these ILP systems is directly proportional to the number of queries executed on the DBMS. These queries are mostly support and confidence calculations for candidate concept rules generated in the search space. We aim to improve time efficiency by reducing the number of queries these systems run. In particular, we worked on the Concept Rule Induction System (CRIS), which uses Apriori-based specialization in hypothesis construction. The methods we propose generate the same solutions as CRIS; therefore, we improve efficiency without reducing accuracy. In the first method, we prune the concept descriptors using support coverage sets. These sets are already stored to support memoization in CRIS, and our method reuses them to prune the search space. In the second pruning method, we build a cosine similarity matrix of the attributes of each predicate in a pre-processing step. During the specialization of concept descriptors, we prune the search space using this similarity matrix. Finally, we examine the applicability of the NoSQL system MongoDB and the NewSQL system VoltDB as storage for the ILP system CRIS.
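The abstract does not give implementation details, so the following Python sketch only illustrates the two general ideas behind the pruning methods and is not the thesis's actual code. It assumes that a coverage set is the set of instance identifiers a rule covers, that support is the fraction of target instances covered, and that an attribute can be represented by the frequency vector of its values. The function names (`support`, `can_prune`, `cosine_similarity`) are hypothetical.

```python
from collections import Counter
from math import sqrt


def support(coverage, targets):
    """Fraction of target instances covered by a rule (assumed definition)."""
    return len(coverage & targets) / len(targets)


def can_prune(parent_coverage, targets, min_support):
    """Decide from a cached parent coverage set whether a specialization can be skipped.

    A specialized rule covers a subset of what its parent covers, so its
    support cannot exceed the parent's. If the parent is already below the
    threshold, no database query is needed for the child.
    """
    return support(parent_coverage, targets) < min_support


def cosine_similarity(values_a, values_b):
    """Cosine similarity of two attributes, each given as a list of its values.

    Assumption: attributes are compared through value-frequency vectors.
    """
    counts_a, counts_b = Counter(values_a), Counter(values_b)
    dot = sum(counts_a[v] * counts_b[v] for v in counts_a.keys() & counts_b.keys())
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under these assumptions, a parent rule whose cached coverage set already falls below the minimum support lets every specialization of it be skipped without a query, and attribute pairs with low cosine similarity could be excluded as candidates during specialization.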
Subject Keywords
Concepts, Induction (Logic), Artificial intelligence, Logic programming, Computer programming
URI
http://etd.lib.metu.edu.tr/upload/12619033/index.pdf
https://hdl.handle.net/11511/25027
Collections
Graduate School of Natural and Applied Sciences, Thesis
Suggestions
An intelligent fuzzy object-oriented database framework for video database applications
Ozgur, Nezihe Burcu; KOYUNCU, Murat; Yazıcı, Adnan (Elsevier BV, 2009-08-01)
Video database applications call for flexible and powerful modeling and querying facilities, which require an integration or interaction between database and knowledge-based technologies. It is also necessary for many real life video database applications to incorporate uncertainty, which naturally occurs due to the complex and subjective semantic content of video data. In this study, firstly, we introduce a fuzzy conceptual data model to represent the semantic content of video data. For that purpose, UML (...
New transitive closure algorithm for recursive query processing in deductive databases
Toroslu, İsmail Hakkı (1992-01-01)
© 1992 IEEE. The development of efficient algorithms to process the different forms of the transitive-closure (TC) queries within the context of large database systems has recently attracted a large amount of research efforts. In this paper, we present a new algorithm suitable for full transitive closure problem, which is used to solve uninstantiated recursive queries in deductive databases. In this new algorithm there are two phases. In the first phase a general graph is condensed into an acyclic graph a...
Data mining for rule discovery in relational databases
Toprak, Serkan; Alpaslan, Ferda Nur; Department of Computer Engineering (2004)
Data is mostly stored in relational databases today. However, most data mining algorithms are not capable of working on data stored in relational databases directly. Instead they require a preprocessing step for transforming relational data into algorithm specified form. Moreover, several data mining algorithms provide solutions for single relations only. Therefore, valuable hidden knowledge involving multiple relations remains undiscovered. In this thesis, an implementation is developed for discovering mul...
An ilp-based concept discovery system for multi-relational data mining
Kavurucu, Yusuf; Karagöz, Pınar; Department of Computer Engineering (2009)
Multi Relational Data Mining has become popular due to the limitations of propositional problem definition in structured domains and the tendency of storing data in relational databases. However, as patterns involve multiple relations, the search space of possible hypotheses becomes intractably complex. In order to cope with this problem, several relational knowledge discovery systems have been developed employing various search strategies, heuristics and language pattern limitations. In this thesis, Induct...
A binomial noised model for cluster validation
Toledano-Kitai, Dvora; Avros, Renata; Volkovich, Zeev; Weber, Gerhard Wilhelm; Yahalom, Orly (IOS Press, 2013-01-01)
Cluster validation is the task of estimating the quality of a given partition of a data set into clusters of similar objects. Normally, a clustering algorithm requires a desired number of clusters as a parameter. We consider the cluster validation problem of determining the optimal ("true") number of clusters. We adopt the stability testing approach, according to which, repeated applications of a given clustering algorithm provide similar results when the specified number of clusters is correct. To implemen...
Citation Formats
IEEE
A. Doğan, “New heuristics for performance improvement of ilp-based concept discovery systems,” M.S. - Master of Science, Middle East Technical University, 2015.