A Framework to Detect Disguised Missing Data

2011-01-01
Many manually populated, very large databases suffer from data quality problems such as missing or inaccurate data and duplicate entries. A recently recognized data quality problem is disguised missing data, which arises when an explicit code for missing data, such as NA (Not Available), is not provided and a legitimate data value is used instead. The presence of these values can severely affect the outcome of data mining tasks: association mining algorithms may produce biased or inaccurate association rules, and clustering techniques may produce invalid clusters. Detecting and eliminating these values is necessary but burdensome to carry out manually. This chapter first explains methods for detecting disguised missing values by visual inspection. The authors then describe methods for detecting these values automatically. Finally, a framework to detect disguised missing data is proposed and demonstrated on spatial and categorical data sets.
KNOWLEDGE DISCOVERY PRACTICES AND EMERGING APPLICATIONS OF DATA MINING: TRENDS AND NEW DOMAINS
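
As a rough illustration of the problem the abstract describes, the sketch below flags values in a single column that occur far more often than is typical for that column, which is one common symptom of a default value standing in for a missing entry (for example, a birth year of 1900). This is not the chapter's framework. The function name `suspect_disguised_values`, the `ratio` threshold, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def suspect_disguised_values(values, ratio=3.0):
    """Return values whose frequency is at least `ratio` times the median
    frequency of the distinct values in the column.

    Hypothetical heuristic for illustration only; a flagged value is a
    candidate for manual review, not proof of disguised missing data.
    """
    counts = Counter(values)
    # With fewer than three distinct values there is no meaningful baseline.
    if len(counts) < 3:
        return []
    threshold = ratio * median(counts.values())
    flagged = [v for v, c in counts.items() if c >= threshold]
    # Most frequent suspects first.
    return sorted(flagged, key=lambda v: -counts[v])


# Assumed sample: birth years where 1900 was entered as a default.
birth_years = [1900] * 6 + [1975, 1975, 1980, 1982, 1990, 1991]
print(suspect_disguised_values(birth_years))
```

A frequency check like this only surfaces candidates. A value can be legitimately common, so any flagged value would still need inspection against domain knowledge before being treated as missing.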

Citation Formats
R. Belen and T. Taşkaya Temizel, “A Framework to Detect Disguised Missing Data,” KNOWLEDGE DISCOVERY PRACTICES AND EMERGING APPLICATIONS OF DATA MINING: TRENDS AND NEW DOMAINS, pp. 1–22, 2011, Accessed: 00, 2021. [Online]. Available: https://hdl.handle.net/11511/70046.