K-SVMeans: A hybrid clustering algorithm for multi-type interrelated datasets

2007-01-01
Bolelli, Levent
Ertekin Bolelli, Şeyda
Zhou, Ding
Giles, C. Lee
Identification of distinct clusters of documents in text collections has traditionally been addressed by assuming that the data instances can only be represented by homogeneous and uniform features. Many real-world datasets, on the other hand, comprise multiple types of heterogeneous, interrelated components, such as web pages and hyperlinks, or online scientific publications with their authors and publication venues, to name a few. In this paper, we present K-SVMeans, a clustering algorithm for multi-type interrelated datasets that integrates the well-known K-Means clustering with the highly popular Support Vector Machines. The experimental results on authorship analysis of two real-world web-based datasets show that K-SVMeans can successfully discover topical clusters of documents and achieve better clustering solutions than homogeneous data clustering.

Suggestions

K-way partitioning of signed bipartite graphs
Ömeroğlu, Nurettin Burak; Toroslu, İsmail Hakkı; Department of Computer Engineering (2012)
Clustering is the process in which data is differentiated and classified according to some criteria. As a result of the partitioning process, data is grouped into clusters for a specific purpose. In a social network, clustering of people is one of the most popular problems. Therefore, we mainly concentrated on finding an efficient algorithm for this problem. In our study, data is made up of two types of entities (e.g., people, groups vs. political issues, religious beliefs) and distinct from most previous works, sig...
An access structure for similarity-based fuzzy databases
Yazıcı, Adnan (Elsevier BV, 1999-04-01)
A significant effort has been made in representing imprecise information in database models by using fuzzy set theory. However, the research directed toward access structures to handle fuzzy querying effectively is still at an immature stage. Fuzzy querying involves more complex processing than ordinary querying does. Additionally, a larger number of tuples are possibly selected by fuzzy conditions in comparison to crisp ones. It is obvious that the need for fast response time becomes very important...
Cluster based model diagnostic for logistic regression
Tanju, Özge; Kalaylıoğlu Akyıldız, Zeynep Işıl; Department of Statistics (2016)
Model selection methods are commonly used to identify the best approximation that explains the data. Existing methods are generally based on information theory, such as the Akaike Information Criterion (AIC), corrected Akaike Information Criterion (AICc), Consistent Akaike Information Criterion (CAIC), and Bayesian Information Criterion (BIC). These criteria do not depend on any modeling purpose. In this thesis, we propose a new method for logistic regression model selection where the modeling purpose is c...
Multisource region attention network for fine-grained object recognition in remote sensing imagery
Sümbül, Gencer; Cinbiş, Ramazan Gökberk; Aksoy, Selim (Institute of Electrical and Electronics Engineers (IEEE), 2019-07)
Fine-grained object recognition concerns the identification of the type of an object among a large number of closely related subcategories. Multisource data analysis that aims to leverage the complementary spectral, spatial, and structural information embedded in different sources is a promising direction toward solving the fine-grained recognition problem that involves low between-class variance, small training set sizes for rare classes, and class imbalance. However, the common assumption of coregistered ...
A clustering method for web data with multi-type interrelated components
Bolelli, Levent; Ertekin Bolelli, Şeyda; Zhou, Ding; Giles, C Lee (2007-05-08)
Traditional clustering algorithms work on "flat" data, making the assumption that the data instances can only be represented by a set of homogeneous and uniform features. Many real-world datasets, however, are heterogeneous in nature, comprising multiple types of interrelated components. We present a clustering algorithm, K-SVMeans, that integrates the well-known K-Means clustering with the highly popular Support Vector Machines (SVM) in order to utilize the richness of data. Our experimental results on author...
Citation Formats
L. Bolelli, Ş. Ertekin Bolelli, D. Zhou, and C. L. Giles, “K-SVMeans: A hybrid clustering algorithm for multi-type interrelated datasets,” 2007, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/45595.