Detecting disguised missing data

2009
Belen, Rahime
In some applications, explicit codes such as NA (not available) are provided for missing data. Many applications, however, provide no such codes, and valid or invalid data codes are recorded as legitimate data values. Such missing values are known as disguised missing data. Disguised missing data can degrade the quality of data analysis; for example, the association rules discovered in the KDD-Cup-98 data sets clearly showed the need for data quality management prior to analysis. In this thesis, to tackle the problem of disguised missing data, we analyzed the embedded unbiased sample heuristic (EUSH), demonstrated the method's drawbacks, and proposed a new methodology based on the chi-square two-sample test. The proposed method does not require any domain background knowledge and compares favorably with EUSH.
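
The abstract does not spell out the procedure, so the following is only a rough sketch of how a chi-square two-sample statistic might be used to flag a suspicious value. It assumes the intuition that rows carrying a disguised missing value on one attribute look like an unbiased sample of the whole data set on the other attributes, so a low statistic marks a candidate. The function names `chi2_two_sample` and `rank_suspects`, the toy rows, and the attribute names are all hypothetical and are not taken from the thesis.

```python
from collections import Counter
from math import sqrt


def chi2_two_sample(sample_a, sample_b):
    """Chi-square two-sample statistic for two lists of categorical values.

    Uses the binned-data form
        sum_i (sqrt(S/R) * R_i - sqrt(R/S) * S_i)^2 / (R_i + S_i),
    where R_i, S_i are per-category counts and R, S are the sample sizes.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both samples must be non-empty")
    counts_a, counts_b = Counter(sample_a), Counter(sample_b)
    k_a = sqrt(n_b / n_a)  # scales counts of sample_a
    k_b = sqrt(n_a / n_b)  # scales counts of sample_b
    stat = 0.0
    for category in counts_a.keys() | counts_b.keys():
        r, s = counts_a[category], counts_b[category]
        stat += (k_a * r - k_b * s) ** 2 / (r + s)
    return stat


def rank_suspects(rows, attr, other):
    """Rank values of `attr` by how closely the rows holding each value
    match the remaining rows on `other` (lowest statistic first).

    Illustrative assumption: a value whose rows look like an unbiased
    sample of the rest is a candidate disguised missing value.
    """
    scores = []
    for value in {row[attr] for row in rows}:
        subset = [row[other] for row in rows if row[attr] == value]
        rest = [row[other] for row in rows if row[attr] != value]
        if rest:  # skip if every row shares the same value
            scores.append((value, chi2_two_sample(subset, rest)))
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical toy data: "0" is a default code entered regardless of segment,
# while "A" and "B" are genuine values tied to one segment each.
rows = (
    [{"code": "A", "segment": "x"}] * 18 + [{"code": "A", "segment": "y"}] * 2
    + [{"code": "B", "segment": "x"}] * 2 + [{"code": "B", "segment": "y"}] * 18
    + [{"code": "0", "segment": "x"}] * 10 + [{"code": "0", "segment": "y"}] * 10
)
ranking = rank_suspects(rows, attr="code", other="segment")
print(ranking)
```

In this toy data the default code "0" ranks first because its rows mirror the rest of the data on `segment`. The raw statistic grows with sample size, so values with very few rows would also score low; a practical version would need to account for subset size, for example through a significance level, and would compare against more than one other attribute.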

Suggestions

Classification of remotely sensed data by using 2D local discriminant bases
Tekinay, Çağrı; Çetin, Yasemin; Department of Information Systems (2009)
In this thesis, the 2D Local Discriminant Bases (LDB) algorithm is used with a 2D search structure to classify remotely sensed data. The 2D Linear Discriminant Analysis (LDA) method is converted into an M-ary classifier by combining the majority voting principle and linear distance parameters. The feature extraction algorithm extracts the relevant features by removing the irrelevant ones and/or combining the ones which do not represent supplemental information on their own. The algorithm is implemented on a remotely sensed...
Constructing linear unequal error protection codes from algebraic curves
Özbudak, Ferruh (Institute of Electrical and Electronics Engineers (IEEE), 2003-06-01)
We show that the concept of "generalized algebraic geometry codes" which was recently introduced by Xing, Niederreiter, and Lam gives a natural framework for constructing linear unequal error protection codes.
Using Pad-Stripped Acausally Filtered Strong-Motion Data
Boore, David M.; Sisi, Aida Azari; Akkar, Dede Sinan (2012-04-01)
Most strong-motion data processing involves acausal low-cut filtering, which requires the addition of sometimes lengthy zero pads to the data. These padded sections are commonly removed by organizations supplying data, but this can lead to incompatibilities in measures of ground motion derived in the usual way from the padded and the pad-stripped data. One way around this is to use the correct initial conditions in the pad-stripped time series when computing displacements, velocities, and linear oscillator ...
Detection of shortened OOC codewords in optical CDMA systems with double hard-limiters
Argon, C; Ergul, R (2000-04-15)
The effect of double hard-limiters in direct-detection optical asynchronous code division multiple access (CDMA) systems using shortened optical orthogonal codes (OOC) is investigated in this work. The performance improvement in the case of shortened OOCs is shown by simulating the detector operation for the single and double hard-limiter cases, and the two cases are compared to each other. In addition to previously stated results, it is shown that the double hard-limiters improve the receiver performance not only...
A Prediction method on the post-failure properties of rock and its application to tunnels
Öge, İbrahim Ferid; Karpuz, Celal; Department of Mining Engineering (2013)
Due to special testing system requirements, data related to the post-peak region of the intact rock laboratory parameters are not as commonly available as pre-peak and peak-state parameters of stress-strain behavior. For geotechnical problems involving rock mass in a failed state around rock structures, proper choice of plastic constitutive laws and post-failure input parameters is important for realistic modeling and simulation of the failed state of the rock mass. A total of seventy-three post-failur...
Citation Formats
R. Belen, “Detecting disguised missing data,” M.S. - Master of Science, Middle East Technical University, 2009.