A clustering method for web data with multi-type interrelated components

2007-05-08
Bolelli, Levent
Ertekin Bolelli, Şeyda
Zhou, Ding
Giles, C Lee
Traditional clustering algorithms work on "flat" data, assuming that data instances can be represented only by a set of homogeneous and uniform features. Much real-world data, however, is heterogeneous in nature, comprising multiple types of interrelated components. We present a clustering algorithm, K-SVMeans, that integrates the well-known K-Means clustering with the highly popular Support Vector Machines (SVM) in order to exploit the richness of such data. Our experimental results on authorship analysis of scientific publications show that K-SVMeans achieves better clustering performance than homogeneous data clustering.
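The abstract does not spell out how K-Means and SVMs are combined, so the following is only a minimal sketch of one plausible coupling, not the authors' published procedure. It assumes each item has a primary feature vector (for example, document content) and a secondary one (for example, authorship features). It runs K-Means on the primary view, trains one linear SVM per cluster on the secondary view, and lets the SVM score adjust the assignment cost. The function names, the weight `lam`, the subgradient SVM trainer, and the toy data are all hypothetical.

```python
# Illustrative sketch only: the coupling rule below is an assumption,
# not the K-SVMeans algorithm as published.

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def train_linear_svm(points, targets, epochs=10, lr=0.1, reg=0.01):
    """Linear SVM fitted by plain subgradient descent on the hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, targets):
            violated = y * (dot(w, x) + b) < 1.0
            w = [wi * (1.0 - lr * reg) for wi in w]  # L2 shrinkage
            if violated:  # margin violated: step toward the example
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def k_svmeans_sketch(primary, secondary, k, lam=1.0, max_iter=20):
    """Cluster on the primary view, guided by per-cluster SVMs on the secondary view."""
    n = len(primary)
    # Deterministic initialisation: evenly spaced items become centroids.
    centroids = [primary[i * n // k] for i in range(k)]
    labels = [min(range(k), key=lambda j: sq_dist(x, centroids[j]))
              for x in primary]
    for _ in range(max_iter):
        # Standard K-Means centroid update on the primary view.
        for j in range(k):
            members = [x for x, lab in zip(primary, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
        # One-vs-rest SVM per cluster, trained on the secondary view.
        svms = [train_linear_svm(
                    secondary,
                    [1.0 if lab == j else -1.0 for lab in labels])
                for j in range(k)]

        def cost(i, j):
            # Assumed rule: distance minus a weighted SVM score.
            w, b = svms[j]
            return sq_dist(primary[i], centroids[j]) - lam * (dot(w, secondary[i]) + b)

        new_labels = [min(range(k), key=lambda j, i=i: cost(i, j))
                      for i in range(n)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Toy data: two well-separated groups in both views.
primary = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
secondary = [(1.0, 0.0), (0.9, 0.1), (1.0, 0.2), (0.0, 1.0), (0.1, 0.9), (0.2, 1.0)]
labels = k_svmeans_sketch(primary, secondary, k=2)
```

Setting `lam=0` reduces the sketch to ordinary K-Means on the primary view, which corresponds to the homogeneous baseline the abstract compares against.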

Suggestions

A Similarity Based Oversampling Method for Multi-Label Imbalanced Text Data
Karaman, İsmail Hakkı; Köksal, Gülser; Erişkin, Levent; Department of Industrial Engineering (2022-09-01)
In the real world, while the amount of data increases, it is not easy to find labeled data for Machine Learning projects because of the considerable cost and effort that labeling requires. Also, most Machine Learning projects, especially multi-label classification problems, struggle with data imbalance. In these problems, some classes do not even have enough data to train a classifier. In this study, an oversampling method for multi-label text classification problems is developed and s...
A methodology of swarm intelligence application in clustering based on neighborhood construction
İnkaya, Tülin; Kayalıgil, Sinan; Özdemirel, Nur Evin; Department of Industrial Engineering (2011)
In this dissertation, we consider the clustering problem in data sets with an unknown number of clusters having arbitrary shapes and intracluster and intercluster density variations. We introduce a clustering methodology composed of three methods that ensure extraction of local density and connectivity properties, data set reduction, and clustering. The first method constructs a unique neighborhood for each data point using the connectivity and density relations among the points based upon the graph the...
A new anisotropic perfectly matched layer medium for mesh truncation in finite difference time domain analysis
Tong, MS; Chen, YC; Kuzuoğlu, Mustafa; Mittra, R (1999-09-01)
In this paper, an unsplit anisotropic perfectly matched layer (PML) medium, previously utilized in the context of finite element analysis, is implemented in the finite difference time domain (FDTD) algorithm. The FDTD anisotropic PML is easy to implement in existing FDTD codes and is well suited for truncating inhomogeneous and layered media without the special treatment required in the conventional PML approach. A further advantage of the present approach is improved performance at lower frequencies. The a...
A Probabilistic approach to sparse multi scale phase based stereo
Ulusoy Parnas, İlkay; Halıcı, Uğur; Hancock, Edwin (2004-04-30)
In this study, a multi-scale phase based sparse disparity algorithm and a probabilistic model for matching are proposed. The disparity algorithm and the probabilistic approach are verified on various stereo image pairs.
A Proposed Methodology for Evaluating HDR False Color Maps
Akyüz, Ahmet Oğuz (Association for Computing Machinery (ACM), 2016-08-01)
Color mapping, which involves assigning colors to the individual elements of an underlying data distribution, is a commonly used method for data visualization. Although color maps are used in many disciplines and for a variety of tasks, in this study we focus on their use for visualizing luminance maps. Specifically, we ask how best to visualize a luminance distribution encoded in a high-dynamic-range (HDR) image using false colors such that the resulting visualization is the most ...
Citation Formats
L. Bolelli, Ş. Ertekin Bolelli, D. Zhou, and C. L. Giles, “A clustering method for web data with multi-type interrelated components,” 2007, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/69643.