Detecting disguised missing data
Date
2009
Author
Belen, Rahime
In some applications, explicit codes such as NA (not available) are provided for missing data. Many applications, however, provide no such codes, and valid or invalid data values are recorded in place of missing entries as if they were legitimate. Such missing values are known as disguised missing data. Disguised missing data can degrade the quality of data analysis; for example, the association rules discovered in the KDD-Cup-98 data sets clearly showed the need to apply data quality management prior to analysis. In this thesis, to tackle the problem of disguised missing data, we analyze the embedded unbiased sample heuristic (EUSH), demonstrate the method's drawbacks, and propose a new methodology based on the chi-square two-sample test. The proposed method does not require any domain background knowledge and compares favorably with EUSH.
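As a rough illustration of the statistic named in the abstract, the sketch below computes a chi-square two-sample statistic for two categorical samples. It is not the procedure from the thesis: the function names, the record layout, and the idea of comparing rows that hold a candidate value against the remaining rows are assumptions made only for this example.

```python
from collections import Counter
from math import sqrt


def chi_square_two_sample(sample_a, sample_b):
    """Chi-square two-sample statistic for two categorical samples.

    Hypothetical helper for illustration; a value near 0 means the two
    samples have very similar category frequencies.
    """
    counts_a = Counter(sample_a)
    counts_b = Counter(sample_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both samples must be non-empty")

    # Scaling constants that compensate for unequal sample sizes.
    k_a = sqrt(total_b / total_a)
    k_b = sqrt(total_a / total_b)

    statistic = 0.0
    for category in set(counts_a) | set(counts_b):
        r = counts_a[category]
        s = counts_b[category]
        statistic += (k_a * r - k_b * s) ** 2 / (r + s)
    return statistic


def score_candidate(rows, attribute, value, other_attribute):
    """Compare `other_attribute` between rows where `attribute == value`
    and all remaining rows (assumed layout: each row is a dict)."""
    inside = [row[other_attribute] for row in rows if row[attribute] == value]
    outside = [row[other_attribute] for row in rows if row[attribute] != value]
    return chi_square_two_sample(inside, outside)
```

Under this assumed setup, a small score for a candidate value would mean that the rows holding it look like the rest of the data on the other attribute. How such scores are turned into a decision, and which attributes are compared, is left open here.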
Subject Keywords
Information systems, Data analysis
URI
http://etd.lib.metu.edu.tr/upload/12610411/index.pdf
https://hdl.handle.net/11511/18492
Collections
Graduate School of Informatics, Thesis
Suggestions
Classification of remotely sensed data by using 2D local discriminant bases
Tekinay, Çağrı; Çetin, Yasemin; Department of Information Systems (2009)
In this thesis, 2D Local Discriminant Bases (LDB) algorithm is used to 2D search structure to classify remotely sensed data. 2D Linear Discriminant Analysis (LDA) method is converted into an M-ary classifier by combining majority voting principle and linear distance parameters. The feature extraction algorithm extracts the relevant features by removing the irrelevant ones and/or combining the ones which do not represent supplemental information on their own. The algorithm is implemented on a remotely sensed...
Constructing linear unequal error protection codes from algebraic curves
Özbudak, Ferruh (Institute of Electrical and Electronics Engineers (IEEE), 2003-06-01)
We show that the concept of "generalized algebraic geometry codes" which was recently introduced by Xing, Niederreiter, and Lam gives a natural framework for constructing linear unequal error protection codes.
Spatially Coupled Codes Optimized for Magnetic Recording Applications
Esfahanizadeh, Homa; Hareedy, Ahmed; Dolecek, Lara (2017-02-01)
Spatially coupled (SC) codes are a class of sparse graph-based codes known to have capacity-approaching performance. SC codes are constructed based on an underlying low-density parity-check (LDPC) code, by first partitioning the underlying block code and then putting replicas of the components together. Significant recent research efforts have been devoted to the asymptotic, ensemble-averaged study of SC codes, as these coupled variants of the existing LDPC codes offer excellent properties....
Developing a parcel-based information system by object-oriented approach
Tufan, Emrah; Akyürek, Sevda Zuhal; Küçükpehlivan, Tuncay; Department of Geodetic and Geographical Information Technologies (2003)
The cadastre contains parcel related data which must be up-to-date. The cadastral data in any country constitute a very big dataset. Therefore parcel related data should be carefully managed. Today, using a database is an effective way of data management. The relational database management system can be a good one for parcel related data. However when the information system concept is considered, just relational database management system is not enough. Some tools are needed in order to manipulate the data ...
A genetic-based intelligent intrusion detection system
Özbey, Halil; Şen, Tayyar; Department of Industrial Engineering (2005)
In this study we address the problem of detecting new types of intrusions to computer systems which cannot be handled by widely implemented knowledge-based mechanisms. The solutions offered by behavior-based prototypes either suffer from low accuracy and low completeness or require the use of data explaining abnormal behavior, which is not actually available. Our aim is to develop an algorithm which can produce a satisfactory model of the target system's behavior in the absence of negative data. First, we design and deve...
Citation Formats
IEEE
R. Belen, “Detecting disguised missing data,” M.S. - Master of Science, Middle East Technical University, 2009.