A Survey of Constrained Clustering

2016-04-01
Dinler, Derya
Tural, Mustafa Kemal
Traditional data mining methods for clustering take only unlabeled data objects as input. The aim of such methods is to find a partition of these unlabeled data objects in order to discover the underlying structure of the data. In some cases, prior knowledge about the data may be available in the form of a small number of labels or constraints. Applying traditional clustering methods while ignoring this prior knowledge may yield information that is irrelevant to the user. Constrained clustering, also known as clustering with side information or semi-supervised clustering, addresses this problem by incorporating prior knowledge into the clustering process to discover relevant information from the data. This chapter presents a survey of advances in the area of constrained clustering. It reviews the different types of prior knowledge considered in the literature and the clustering approaches that make use of this prior knowledge.

Suggestions

A Study of the Classification of Low-Dimensional Data with Supervised Manifold Learning
Vural, Elif (2018-01-01)
Supervised manifold learning methods learn data representations by preserving the geometric structure of data while enhancing the separation between data samples from different classes. In this work, we propose a theoretical study of supervised manifold learning for classification. We consider nonlinear dimensionality reduction algorithms that yield linearly separable embeddings of training data and present generalization bounds for this type of algorithm. A necessary condition for satisfactory generalizat...
A Methodology to develop process ontology from organizational guidelines written in natural language
Gürbüz, Özge; Demirörs, Onur; Department of Information Systems (2017)
Integrating ontologies with process modeling improves data representations and makes it easier to query, store and reuse processes at the semantics level. Therefore, in recent years, this topic has become increasingly popular. The studies in the literature have proposed methods for the integration process either to relate domain ontologies to process models or to transform process models to process ontologies. Another way to establish the integration between ontologies and process models is to develop proce...
A Multi-objective approach to cluster ensemble selection problem
Aktaş, Dilay; Lokman, Banu; Department of Operational Research (2019)
Clustering is an unsupervised learning method that partitions a data set into groups. The aim is to assign similar points to the same cluster and dissimilar points to different clusters with respect to some notion of similarity. It is applicable to a wide range of areas such as recommender systems, anomaly detection, market research, and customer segmentation. With advances in computational power, a diverse set of clustering solutions can be obtained from a dataset using different clustering algorit...
A survey on multidimensional persistence theory
Karagüler, Dilan; Pamuk, Semra; Department of Mathematics (2021-8)
Persistent homology is one of the commonly used theoretical methods in topological data analysis to extract information from given data using algebraic topology. By converting data to a filtered object and analyzing the topological features of each space in the filtration, we obtain a way of representing these features called the shape of data. This gives us invariants like barcodes or persistence diagrams for the data. These invariants are stable under small perturbations. In most applications, we n...
A clustering method for web data with multi-type interrelated components
Bolelli, Levent; Ertekin Bolelli, Şeyda; Zhou, Ding; Giles, C Lee (2007-05-08)
Traditional clustering algorithms work on "flat" data, making the assumption that the data instances can only be represented by a set of homogeneous and uniform features. Much real-world data, however, is heterogeneous in nature, comprising multiple types of interrelated components. We present a clustering algorithm, K-SVMeans, that integrates the well-known K-Means clustering with the highly popular Support Vector Machines (SVM) in order to utilize the richness of data. Our experimental results on author...
Citation Formats
D. Dinler and M. K. Tural, A Survey of Constrained Clustering. 2016, p. 235.