The effect of data set characteristics on the choice of clustering validity index type
Date
2007-11-09
Author
Taşkaya Temizel, Tuğba
Inkaya, Tulin
Yucebas, Sait Can
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Clustering techniques are widely used to give insight into the similarities and dissimilarities between data set items. Most algorithms require the user to tune parameters such as the number of clusters or the cut-off threshold in a dendrogram, and these parameters affect the clustering quality. In a good-quality clustering, the intra-cluster similarity should be high, whereas the inter-cluster similarity should be low. To determine the optimal number of clusters, several cluster validity methods have been proposed. However, there is no guideline on which clustering validity methods can be used in conjunction with which clustering algorithms. In this paper, the Dunn and SD validity indices were applied to Kohonen self-organizing maps, k-means, and agglomerative clustering algorithms, and their limitations were shown empirically.
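As an illustration of what a validity index measures, the sketch below computes one common textbook form of the Dunn index: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The abstract does not state which variant the authors used, so this definition, the function name `dunn_index`, and the toy data are assumptions made for illustration only. Higher values indicate compact, well-separated clusters.

```python
import numpy as np


def dunn_index(points, labels):
    """Dunn index: min inter-cluster distance / max intra-cluster diameter.

    Assumed variant: single-linkage separation and complete diameter,
    both measured with Euclidean distance.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)

    # Full matrix of pairwise Euclidean distances.
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))

    # same[i, j] is True when points i and j share a cluster label.
    same = labels[:, None] == labels[None, :]
    if same.all():
        raise ValueError("at least two clusters are required")

    max_diameter = dist[same].max()      # largest within-cluster distance
    min_separation = dist[~same].min()   # smallest between-cluster distance

    if max_diameter == 0:
        return float("inf")              # every cluster is a single point
    return min_separation / max_diameter


# Hypothetical toy data: two tight groups that are far apart.
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]    # mixes the two groups

good_score = dunn_index(toy_points, good_labels)
bad_score = dunn_index(toy_points, bad_labels)
print(good_score > bad_score)
```

In practice the index would be evaluated for several candidate numbers of clusters and the partition with the highest value preferred. Because it depends on a single minimum and a single maximum distance, it is sensitive to outliers, which is one reason its suitability can vary with the data set and the clustering algorithm.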
Subject Keywords
Educational institutions, Clustering algorithms, Cities and towns, Informatics, Self organizing feature maps, Frequency, Partitioning algorithms, Employment, Cleaning, Industrial engineering
URI
https://hdl.handle.net/11511/31546
DOI
https://doi.org/10.1109/iscis.2007.4456856
Conference Name
2007 22nd international symposium on computer and information sciences
Collections
Graduate School of Informatics, Conference / Seminar
Citation (IEEE)
T. Taşkaya Temizel, T. Inkaya, and S. C. Yucebas, “The effect of data set characteristics on the choice of clustering validity index type,” presented at the 2007 22nd international symposium on computer and information sciences, Ankara, Türkiye, 2007, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/31546.