An application of the minimal spanning tree approach to the cluster stability problem

2012-03-01
Volkovich, Z.
Barzily, Z.
Weber, Gerhard Wilhelm
Toledano-Kitai, D.
Avros, R.
Among the areas of data and text mining employed today in OR, science, economics and technology, clustering theory serves as a preprocessing step in data analysis. An important component of clustering theory is the determination of the true number of clusters. This problem has not been satisfactorily solved. In our paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters, we estimate the stability of the partitions obtained from clustering of samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured by the total number of edges in the clusters' minimal spanning trees that connect points from different samples. Specifically, we use the Friedman and Rafsky two-sample test statistic. Under the homogeneity hypothesis that the samples are well mingled within the clusters, this statistic is asymptotically normally distributed. Resting upon this fact, the standard score of this edge count is computed for each cluster, and the partition quality is represented by the worst cluster, that is, the one with the minimal standard score. It is natural to expect that the true number of clusters can be characterized by the empirical distribution of this quantity having the shortest left tail. The proposed methodology sequentially constructs the described distribution and estimates its left-asymmetry. Several numerical experiments demonstrate the ability of the approach to detect the true number of clusters.
CENTRAL EUROPEAN JOURNAL OF OPERATIONS RESEARCH
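
The edge-count statistic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes two samples pooled inside one cluster, builds the Euclidean minimal spanning tree with Prim's algorithm, and counts the edges joining points from different samples. As a stated substitution, the null mean and standard deviation are estimated by permuting the sample labels rather than taken from the asymptotic normal approximation used in the paper. All function names (mst_edges, cross_edge_count, standard_score, partition_quality) and the parameters n_perm and seed are hypothetical.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (i, j) edges."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link from the tree to each vertex
    best_from = [-1] * n         # tree endpoint of that cheapest link
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def cross_edge_count(edges, labels):
    """Number of tree edges whose endpoints come from different samples."""
    return sum(1 for i, j in edges if labels[i] != labels[j])


def standard_score(points, labels, n_perm=500, seed=0):
    """Standard score of the cross-edge count for one cluster.

    Assumption: null moments are estimated by label permutation on the fixed
    tree, not by the closed-form asymptotic moments.
    """
    edges = mst_edges(points)
    observed = cross_edge_count(edges, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(cross_edge_count(edges, shuffled))
    mean = sum(null) / n_perm
    var = sum((x - mean) ** 2 for x in null) / (n_perm - 1)
    return (observed - mean) / math.sqrt(var) if var > 0 else 0.0


def partition_quality(clusters, **kwargs):
    """Worst (minimal) standard score over (points, labels) pairs, one per cluster."""
    return min(standard_score(p, l, **kwargs) for p, l in clusters)
```

A cluster whose two samples are well mingled yields many cross-sample edges and a high score, while a cluster that separates the samples yields few such edges and a strongly negative score. Taking the minimum over clusters mirrors the abstract's choice of the worst cluster as the measure of partition quality.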

Citation Formats
Z. Volkovich, Z. Barzily, G. W. Weber, D. Toledano-Kitai, and R. Avros, “An application of the minimal spanning tree approach to the cluster stability problem,” CENTRAL EUROPEAN JOURNAL OF OPERATIONS RESEARCH, pp. 119–139, 2012, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/49779.