Multi-class classification methods utilizing Mahalanobis Taguchi System and a re-sampling approach for imbalanced data sets

2009
Ayhan, Dilber
Classification approaches are used in many areas to identify or estimate the classes to which different observations belong. In this thesis, the Mahalanobis Taguchi System (MTS) classification approach is analyzed and improved for multi-class classification problems. MTS seeks to identify significant variables and classifies a new observation based on its Mahalanobis distance (MD). First, sample size problems, which arise mostly in small data sets, and multicollinearity problems, which are among the limitations of MTS, are analyzed, and a re-sampling approach is explored as a solution. Our re-sampling approach, which only works for data sets with two classes, combines over-sampling and under-sampling. Over-sampling is based on SMOTE, which generates synthetic observations between an observation in the minority class and its nearest neighbors. MTS models are also used to evaluate several re-sampling parameters, whose most appropriate values are sought for each case. In the second part, multi-class classification methods with MTS are developed. The first algorithm, Feature Weighted Multi-class MTS-I (FWMMTS-I), is inspired by the descent feature weighted MD. It relaxes the assumption that variables contribute equally when their MDs are added up. As a result, noisy variables can receive weights close to zero, so that they do not mask the other variables. The second algorithm, Multi-class MTS (MMTS), extends the original MTS method to multi-class problems. In addition, an approach comparable to that of Su and Hsiao (2009), which also considers variable weights, is studied with a modification in the MD calculation; it is named Feature Weighted Multi-class MTS-II (FWMMTS-II). The methods are compared on eight different multi-class data sets using 5-fold stratified cross-validation.
The results show that FWMMTS-I is as accurate as MMTS, and that both perform better than FWMMTS-II. Interestingly, the Mahalanobis Distance Classifier (MDC), which uses all the variables directly in the classification model, performed equally well on the studied data sets.
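
As a rough illustration of two ideas named in the abstract, the sketch below shows a plain Mahalanobis Distance Classifier and a simplified SMOTE-style interpolation step. It is not the thesis's MMTS, FWMMTS-I, or FWMMTS-II algorithm, and it omits the variable selection that MTS performs. The function names (`fit_mdc`, `predict_mdc`, `smote_like`), the use of a pseudo-inverse for the covariance matrix, and the toy data are all assumptions made for this example.

```python
import numpy as np


def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance of each row of x from `mean`."""
    diff = np.atleast_2d(x) - mean
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)


def fit_mdc(X, y):
    """Estimate a mean vector and an inverse covariance matrix per class.

    Assumes X has at least two columns and each class has several rows.
    """
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        cov = np.cov(Xc, rowvar=False)
        # Assumption: a pseudo-inverse is used so that a near-singular
        # covariance matrix (multicollinearity) does not raise an error.
        params[label] = (Xc.mean(axis=0), np.linalg.pinv(cov))
    return params


def predict_mdc(params, X):
    """Assign each row of X to the class with the smallest distance."""
    labels = list(params)
    d = np.column_stack([mahalanobis_sq(X, *params[k]) for k in labels])
    return np.array(labels)[d.argmin(axis=1)]


def smote_like(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority point and one of its k nearest neighbours.

    Requires k < len(X_min).
    """
    rng = np.random.default_rng(seed)
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the connecting segment
    return X_min[base] + gap * (X_min[pick] - X_min[base])


# Toy usage: two well-separated classes, the second one under-represented.
rng = np.random.default_rng(1)
X_major = rng.normal(0.0, 1.0, size=(40, 2))
X_minor = rng.normal(6.0, 1.0, size=(8, 2))
X_minor_aug = np.vstack([X_minor, smote_like(X_minor, n_new=32)])
X = np.vstack([X_major, X_minor_aug])
y = np.array([0] * len(X_major) + [1] * len(X_minor_aug))
model = fit_mdc(X, y)
predictions = predict_mdc(model, np.array([[0.0, 0.0], [6.0, 6.0]]))
```

Because each synthetic point lies on a segment between two existing minority observations, over-sampling of this kind adds no information outside the region the minority class already occupies. This is one reason the abstract describes combining it with under-sampling and tuning the re-sampling parameters for each case.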

Citation Formats
D. Ayhan, “Multi-class classification methods utilizing Mahalanobis Taguchi System and a re-sampling approach for imbalanced data sets,” M.S. - Master of Science, Middle East Technical University, 2009.