Resampling approach for cluster model selection

Date

2011-10-01

Author

Volkovich, Z.
Barzily, Z.
Weber, Gerhard Wilhelm
Toledano-Kitai, D.
Avros, R.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

In cluster analysis, selecting the number of clusters is an "ill-posed" problem of crucial importance. In this paper we propose a re-sampling method for assessing cluster stability. Our model suggests that samples' occurrences in clusters can be considered as realizations of the same random variable in the case of the "true" number of clusters. Thus, similarity between different cluster solutions is measured by means of compound and simple probability metrics. Compound criteria result in validation rules employing the stability content of clusters. Simple probability metrics, in particular those based on kernels, provide more flexible geometrical criteria. We analyze several applications of probability metrics combined with methods intended to simulate cluster occurrences. Numerical experiments are provided to demonstrate and compare the different metrics and simulation approaches.
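The idea of comparing cluster solutions obtained from repeated samples with a kernel-based probability metric can be illustrated with a small sketch. This is not the authors' procedure: it assumes plain k-means as the clustering algorithm, a Gaussian-kernel squared distance (a biased MMD estimate) as the simple probability metric, and a nearest-cluster matching rule. The names `kmeans`, `mmd2`, `instability`, and `make_blobs`, and all parameter values, are hypothetical and chosen only for illustration.

```python
import math
import random


def kmeans(points, k, rng, iters=25):
    """Plain Lloyd's k-means (an assumed clusterer); returns non-empty clusters."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        centers = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
    return [members for members in clusters if members]


def mmd2(a, b, sigma=1.0):
    """Biased estimate of the squared kernel distance between two point sets."""
    def mean_kernel(xs, ys):
        total = sum(
            math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2)) for x in xs for y in ys
        )
        return total / (len(xs) * len(ys))

    return mean_kernel(a, a) + mean_kernel(b, b) - 2 * mean_kernel(a, b)


def instability(data, k, rng, sample_size=40, repeats=5, sigma=1.0):
    """Average kernel distance between matched clusters of two disjoint samples."""
    scores = []
    for _ in range(repeats):
        drawn = rng.sample(data, 2 * sample_size)  # two disjoint samples
        first = kmeans(drawn[:sample_size], k, rng)
        second = kmeans(drawn[sample_size:], k, rng)
        # Match each cluster of the first solution to its closest cluster in the second,
        # weighting by cluster size.
        total = sum(len(c) * min(mmd2(c, d, sigma) for d in second) for c in first)
        scores.append(total / sample_size)
    return sum(scores) / len(scores)


def make_blobs(rng, centers=((0, 0), (6, 0), (3, 5)), per_blob=60):
    """Synthetic data for the demo: three Gaussian blobs in the plane."""
    return [
        (rng.gauss(cx, 0.7), rng.gauss(cy, 0.7))
        for cx, cy in centers
        for _ in range(per_blob)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    data = make_blobs(rng)
    for k in range(2, 6):
        print(k, round(instability(data, k, rng), 4))
```

Lower scores indicate that the clusters found in the two samples look like draws from the same distributions, which is the behaviour expected at a suitable number of clusters. Note that k = 1 is trivially stable under this kind of score, so the sketch starts the comparison at k = 2.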

Subject Keywords

Software, Artificial Intelligence

URI

https://hdl.handle.net/11511/57879

Journal

Machine Learning

DOI

https://doi.org/10.1007/s10994-011-5236-9

Collections

Graduate School of Applied Mathematics, Article

Citation Formats

Z. Volkovich, Z. Barzily, G. W. Weber, D. Toledano-Kitai, and R. Avros, “Resampling approach for cluster model selection,” Machine Learning, pp. 209–248, 2011. [Online]. Available: https://hdl.handle.net/11511/57879.