A classification algorithm using Mahalanobis distance clustering of data with applications on biomedical data sets

Date

2011

Author

Durak, Bahadır

Classification has been studied by the scientific community for hundreds of years, and many different methods and algorithms have been developed and used over that time. Although the classification algorithms in the literature today use different methods, they rest on a similar basis: assigning data to classes by using defined properties or, put differently, trying to establish a relationship between known features and an unknown result. This study was intended to bring a different perspective to this common basis. It uses not only the basic features of the data but also the class of the data as a parameter, so that the information carried by a known value is also available to the algorithm. In other words, the class to which each data point belongs is treated as an input, and the data set is transferred to a higher-dimensional space, which becomes the new working environment. In this new environment the problem is no longer a classification problem but a clustering problem. Although this logic is similar to that of kernel methods, the two approaches differ in how they transform the working space. The clusters found in the projected space, based on calculations performed with the Mahalanobis distance, are evaluated in the original space with two different heuristics: a center-based algorithm and a KNN-based algorithm. Both heuristics achieve higher classification success rates with this methodology. For the center-based algorithm, which is more sensitive to the new input parameter, an improvement of up to 8% is observed.
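
The abstract does not give the algorithmic details, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes the class label is appended as one extra coordinate, that clustering in the augmented space uses a single global covariance matrix and starts from the per-class means, and that the center-based heuristic assigns a new point to the majority class of the nearest cluster center in the original space. All function names, the `weight` and `reg` parameters, and the toy data are hypothetical.

```python
import numpy as np


def augment_with_class(X, y, weight=1.0):
    """Append the class label as an extra coordinate (assumed scaling `weight`)."""
    return np.column_stack([X, weight * np.asarray(y, dtype=float)])


def mahalanobis_sq(points, center, cov_inv):
    """Squared Mahalanobis distance of each row of `points` to `center`."""
    diff = points - center
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)


def cluster_augmented(Z, y, n_iter=10, reg=1e-6):
    """k-means-style clustering in the augmented space.

    Assumptions: one global covariance matrix, one initial center per class.
    """
    y = np.asarray(y)
    cov = np.cov(Z, rowvar=False) + reg * np.eye(Z.shape[1])
    cov_inv = np.linalg.inv(cov)
    centers = np.array([Z[y == c].mean(axis=0) for c in np.unique(y)])
    for _ in range(n_iter):
        dists = np.stack([mahalanobis_sq(Z, c, cov_inv) for c in centers], axis=1)
        labels = dists.argmin(axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def fit_center_based(X, y, labels, reg=1e-6):
    """Describe each cluster in the original space by its center and majority class."""
    y = np.asarray(y)
    cov = np.cov(X, rowvar=False) + reg * np.eye(X.shape[1])
    cov_inv = np.linalg.inv(cov)
    centers, center_classes = [], []
    for j in np.unique(labels):
        members = labels == j
        centers.append(X[members].mean(axis=0))
        values, counts = np.unique(y[members], return_counts=True)
        center_classes.append(values[counts.argmax()])
    return np.array(centers), np.array(center_classes), cov_inv


def predict_center_based(X_new, centers, center_classes, cov_inv):
    """Assign each new point the class of its nearest center (Mahalanobis)."""
    dists = np.stack([mahalanobis_sq(X_new, c, cov_inv) for c in centers], axis=1)
    return center_classes[dists.argmin(axis=1)]


# Toy data (hypothetical): two well-separated groups in two dimensions.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
              [10, 10], [11, 10], [10, 11], [11, 11]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clusters = cluster_augmented(augment_with_class(X, y), y)
centers, center_classes, cov_inv = fit_center_based(X, y, clusters)
pred = predict_center_based(np.array([[0.2, 0.3], [10.4, 10.9]]),
                            centers, center_classes, cov_inv)
print(pred)
```

The class label is only available for training data, which is why the sketch clusters in the augmented space but predicts with distances computed in the original feature space. A KNN-based variant would replace the nearest-center rule with a vote among the nearest training points.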

Subject Keywords

Industrial engineering, Data mining

URI

http://etd.lib.metu.edu.tr/upload/12612852/index.pdf
https://hdl.handle.net/11511/20348

Collections

Graduate School of Natural and Applied Sciences, Thesis

Suggestions

Fuzzy classification models based on Tanaka’s fuzzy linear regression approach and nonparametric improved fuzzy classifier functions Özer, Gizem; Köksal, Gülser; Department of Industrial Engineering (2009) In some classification problems where human judgments and qualitative and imprecise data exist, uncertainty comes from fuzziness rather than randomness. Only a limited number of fuzzy classification approaches are available for these classification problems to capture the effect of the fuzzy uncertainty embedded in the data. The scope of this study mainly comprises two parts: new fuzzy classification approaches based on Tanaka’s Fuzzy Linear Regression (FLR) approach, and an improvement of an existing one, Improved Fu...
Optimization of gene-environment networks in the presence of errors and uncertainty with Chebychev approximation Weber, Gerhard Wilhelm; Taylan, P.; Alparslan-Gok, S. Z.; Oezoeguer-Akyuz, S.; Akteke-Ozturk, B. (Springer Science and Business Media LLC, 2008-12-01) This mathematical contribution is addressed towards the wide interface of life and human sciences that exists between biological and environmental information. Like very few other disciplines only, the modeling and prediction of genetical data is requesting mathematics nowadays to deeply understand its foundations. This need is even forced by the rapid changes in a world of globalization. Such a study has to include aspects of stability and tractability; the still existing limitations of modern technology i...
A new outlier detection method based on convex optimization: application to diagnosis of Parkinson's disease TAYLAN, PAKİZE; Yerlikaya-Ozkurt, Fatma; Bilgic Ucak, Burcu; Weber, Gerhard Wilhelm (Informa UK Limited, 2020-12-01) Neuroscience is a combination of different scientific disciplines which investigate the nervous system for understanding of the biological basis. Recently, applications to the diagnosis of neurodegenerative diseases like Parkinson's disease have become very promising by considering different statistical regression models. However, well-known statistical regression models may give misleading results for the diagnosis of the neurodegenerative diseases when experimental data contain outlier observations that l...
A comparison of orthogonal cutting data from experiments with three different finite element models Bil, H; Kilic, SE; Tekkaya, AE (Elsevier BV, 2004-07-01) The aim of this study is to compare various simulation models of orthogonal cutting process with each other as well as with the results of various experiments. Commercial implicit finite element codes MSC.Marc, Deform2D and the explicit code Thirdwave AdvantEdge have been used. In simulations, a rigid tool is advanced incrementally into the deformable workpiece which is remeshed whenever needed. In simulations with MSC.Marc and Thirdwave AdvantEdge, there is no separation criterion defined since chip format...
A novel approach to chemical resemblance of alternant hydrocarbons Türker, Burhan Lemi (2002-02-14) Within the constraints of the Huckel molecular orbital theory, a topological approach has been developed for the resemblance of alternant hydrocarbons. Four topological variables are considered which categorize alternant hydrocarbons having some resemblance of certain degree between them. Depending on variations of these topological variables, various groups of the compounds, including the identity case, various isomers and nonresembling systems have been investigated.

Citation Formats

B. Durak, “A classification algorithm using Mahalanobis distance clustering of data with applications on biomedical data sets,” M.S. - Master of Science, Middle East Technical University, 2011.