A binomial noised model for cluster validation

2013-01-01
Toledano-Kitai, Dvora
Avros, Renata
Volkovich, Zeev
Weber, Gerhard Wilhelm
Yahalom, Orly
Cluster validation is the task of estimating the quality of a given partition of a data set into clusters of similar objects. A clustering algorithm normally requires the desired number of clusters as a parameter. We consider the cluster validation problem of determining the optimal ("true") number of clusters. We adopt the stability testing approach, according to which repeated applications of a given clustering algorithm produce similar results when the specified number of clusters is correct. To implement this idea, we draw pairs of independent, equal-sized samples, where one sample in each pair is drawn from the data source and the other from a noised version of it. We then run the same clustering method on both samples in each pair and test the similarity between the resulting partitions using a general k-Nearest Neighbor Binomial model. These similarity measurements enable us to estimate the correct number of clusters. A series of numerical experiments on both synthetic and real-world data shows that the proposed method performs well compared to other methods. In particular, using a noised data set is shown to produce significantly better results than using two independent samples that are both drawn from the data source.
JOURNAL OF INTELLIGENT & FUZZY SYSTEMS
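
The stability idea in the abstract can be illustrated with a rough sketch. The abstract does not give the exact form of the k-Nearest Neighbor Binomial statistic, so everything below is an assumption made for illustration and not the authors' procedure: the clustering method is a plain k-means, the noise is Gaussian jitter with a hypothetical scale `sigma`, and the similarity score is a simple neighbor-agreement rate, read as the success probability of a binomial count over each point's nearest neighbors. All function names and parameter values are hypothetical.

```python
import math
import random

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))

def kmeans(points, k, rng, iters=25):
    """Plain Lloyd's k-means (an assumed stand-in for 'a given clustering algorithm')."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

def add_noise(points, sigma, rng):
    """Noised copy of the data: independent Gaussian jitter per coordinate (assumed noise model)."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]

def agreement_rate(data, k, rng, m=60, n_neighbors=5, sigma=0.3):
    """Estimate p when each point's count of agreeing neighbors is modelled as Binomial(n_neighbors, p)."""
    clean = rng.sample(data, m)                           # sample from the data source
    noised = add_noise(rng.sample(data, m), sigma, rng)   # independent sample from the noised source
    centers_a = kmeans(clean, k, rng)
    centers_b = kmeans(noised, k, rng)
    pooled = clean + noised
    # Extend both partitions to the pooled points by nearest-centroid assignment.
    lab_a = [nearest(p, centers_a) for p in pooled]
    lab_b = [nearest(p, centers_b) for p in pooled]
    hits = 0
    for i, p in enumerate(pooled):
        others = [j for j in range(len(pooled)) if j != i]
        others.sort(key=lambda j: math.dist(p, pooled[j]))
        for j in others[:n_neighbors]:
            # A "success": both partitions agree on whether i and j share a cluster.
            hits += (lab_a[i] == lab_a[j]) == (lab_b[i] == lab_b[j])
    return hits / (len(pooled) * n_neighbors)

def estimate_k(data, k_values, rng, repeats=5):
    """Average the agreement rate over several sample pairs and pick the most stable k."""
    scores = {
        k: sum(agreement_rate(data, k, rng) for _ in range(repeats)) / repeats
        for k in k_values
    }
    return max(scores, key=scores.get), scores

# Synthetic demo: three Gaussian blobs in the plane (hypothetical data).
rng = random.Random(0)
blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in blobs for _ in range(50)]
best_k, scores = estimate_k(data, range(2, 6), rng)
print(best_k, scores)
```

The candidate range starts at 2 because a single cluster is trivially stable under this score. A higher agreement rate only suggests a more stable partition; the sketch does not reproduce the paper's test or its reported results.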

Citation Formats
D. Toledano-Kitai, R. Avros, Z. Volkovich, G. W. Weber, and O. Yahalom, “A binomial noised model for cluster validation,” JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, pp. 417–427, 2013, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/57925.