The effect of data set characteristics on the choice of clustering validity index type

2007-11-09
Taşkaya Temizel, Tuğba
Inkaya, Tulin
Yucebas, Sait Can
Clustering techniques are widely used to give insight into the similarities and dissimilarities between data set items. Most algorithms require the user to tune parameters such as the number of clusters or the cut-off threshold in a dendrogram, and these parameters also affect clustering quality. In a good-quality clustering, intra-cluster similarity should be high, whereas inter-cluster similarity should be low. Several cluster validity methods have been proposed to determine the optimal number of clusters. However, there is no guideline on which clustering validity methods can be used in conjunction with which clustering algorithms. In this paper, the Dunn and SD validity indices were applied to Kohonen self-organizing maps, k-means and agglomerative clustering algorithms, and their limitations were shown empirically.
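To make the two indices concrete, the sketch below implements their commonly cited textbook definitions with NumPy. It is an illustration under stated assumptions (Euclidean distance, cluster centers taken as cluster means, the SD weighting factor `alpha` supplied by the caller), not the authors' experimental code. The Dunn index divides the smallest distance between points of different clusters by the largest cluster diameter, so higher is better. The SD index combines average scattering, `Scat`, with total separation, `Dis`, as `SD = alpha * Scat + Dis`, so lower is better; in its usual form `alpha` is set to `Dis` evaluated at the largest number of clusters considered. The helper names (`pairwise`, `dunn_index`, `scat`, `dis`, `sd_index`) are hypothetical.

```python
import numpy as np


def pairwise(a, b):
    # Euclidean distance between every row of a and every row of b
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def dunn_index(X, labels):
    """Smallest inter-cluster point distance / largest cluster diameter (higher is better).
    Assumes at least two clusters and at least one cluster with two distinct points."""
    clusters = [X[labels == c] for c in np.unique(labels)]
    # diameter = largest distance between two points of the same cluster
    max_diameter = max(pairwise(c, c).max() for c in clusters)
    # separation = smallest distance between points of two different clusters
    min_separation = min(
        pairwise(clusters[i], clusters[j]).min()
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return min_separation / max_diameter


def scat(X, labels):
    # mean norm of per-cluster variance vectors, relative to the norm of the data variance
    total = np.linalg.norm(X.var(axis=0))
    per_cluster = [np.linalg.norm(X[labels == c].var(axis=0)) for c in np.unique(labels)]
    return np.mean(per_cluster) / total


def dis(centers):
    # total separation between cluster centers, scaled by max/min center distance
    d = pairwise(centers, centers)
    off_diagonal = d[~np.eye(len(centers), dtype=bool)]
    ratio = off_diagonal.max() / off_diagonal.min()
    return ratio * np.sum(1.0 / d.sum(axis=1))


def sd_index(X, labels, alpha):
    """SD = alpha * Scat + Dis (lower is better); alpha is an assumed caller-supplied weight."""
    centers = np.array([X[labels == c].mean(axis=0) for c in np.unique(labels)])
    return alpha * scat(X, labels) + dis(centers)


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
    good = np.array([0, 0, 1, 1])  # groups the two well-separated pairs
    bad = np.array([0, 1, 0, 1])   # splits each pair across clusters
    print(dunn_index(X, good), dunn_index(X, bad))
```

In practice one would run a clustering algorithm (for example k-means) for a range of candidate cluster numbers, score each partition, and pick the number of clusters with the highest Dunn value or the lowest SD value. The paper's point is that how reliable this selection is depends on which algorithm produced the partitions.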
2007 22nd International Symposium on Computer and Information Sciences

Suggestions

The effect of software design patterns on object-oriented software quality and maintainability
Türk, Tuna; Bilgen, Semih; Department of Electrical and Electronics Engineering (2009)
This study investigates the connection between design patterns, object-oriented (OO) quality metrics and software maintainability. The literature on OO metrics, design patterns and software maintainability is reviewed, and the relation between OO metrics and software maintainability is investigated. Then, using the resulting maintainability indicator metrics, the change in an application's maintainability due to the use of design patterns is observed.
The general lot sizing and scheduling problem with sequence dependent changeovers
Koçlar, Ayşe; Süral, Haldun; Department of Industrial Engineering (2005)
In this study, we consider the General Lot Sizing and Scheduling Problem in single-level capacitated environments with sequence-dependent item changeovers. Process industries may be regarded as suitable application areas of the problem. The focus on capacity utilization and on highly time-consuming changeovers necessitates the integration of lot sizing and sequencing decisions in the production plan. We present a mathematical model which captures the essence of cases in the most generic and realistic setti...
The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory
Sahin, Alper; ANIL, DUYGU (2017-02-01)
This study investigates the effects of sample size and test length on item-parameter estimation in test development using three unidimensional dichotomous models of item response theory (IRT). For this purpose, a real language test consisting of 50 items was administered to 6,288 students. Data from this test were used to obtain data sets of three test lengths (10, 20, and 30 items) and nine different sample sizes (150, 250, 350, 500, 750, 1,000, 2,000, 3,000 and 5,000 examinees). These data sets were the...
A memetic algorithm for clustering with cluster based feature selection
Şener, İlyas Alper; İyigün, Cem; Department of Operational Research (2022-8)
Clustering is a well-known unsupervised learning method which aims to group similar data points and separate dissimilar ones. Data sets that are subject to clustering are mostly high-dimensional, and these dimensions include both relevant and redundant features. Therefore, selecting the related features is a significant problem in obtaining successful clusters. In this study, it is considered that the relevant features can vary from cluster to cluster, as each cluster in a data set is grouped by a different set of fe...
The Usage of Two Level Random Intercept Model Specifications in the Analysis of Achievement in Mathematics
Gökalp Yavuz, Fulya (2013-12-01)
Hierarchical models are highly useful tools for clustered and multilevel data, and coefficients can vary by cluster in these models. In this study, several types of two-level random intercept model specifications are used to compare the mathematics scores of 8th grade students from schools at three different levels of safety and orderliness, after taking into account variation both between classes and between students within the same class. The data obtained from Trends in International Mathematics and Sc...
Citation Formats
T. Taşkaya Temizel, T. Inkaya, and S. C. Yucebas, “The effect of data set characteristics on the choice of clustering validity index type,” presented at the 2007 22nd international symposium on computer and information sciences, Ankara, Türkiye, 2007, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/31546.