Exploiting Index Pruning Methods for Clustering XML Collections

2010-01-01
In this paper, we first employ the well-known Cover-Coefficient Based Clustering Methodology (C3M) for clustering XML documents. Next, we apply index pruning techniques from the literature to reduce the size of the document vectors. Our experiments show that, in certain cases, it is possible to prune up to 70% of the collection (more specifically, of the underlying document vectors) and still generate a clustering structure whose quality matches that obtained from the original collection in terms of a set of evaluation metrics.
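
To make the idea of pruning document vectors concrete, here is a minimal sketch in Python. It assumes each document is a dictionary mapping terms to weights and that pruning keeps only the highest-weighted terms of each document; the abstract does not state which pruning techniques were used, so this strategy, the `prune_vector` and `prune_collection` names, and the `prune_ratio` parameter are illustrative assumptions rather than the paper's method.

```python
from typing import Dict, List

# Hypothetical representation: one document vector maps term -> weight.
DocVector = Dict[str, float]


def prune_vector(vec: DocVector, prune_ratio: float) -> DocVector:
    """Drop roughly `prune_ratio` of the lowest-weighted terms of one vector.

    Assumption: a per-document, weight-based strategy; the paper may use
    other pruning techniques from the literature.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    if not vec:
        return {}
    # Keep at least one term so no document vector becomes empty.
    keep = max(1, round(len(vec) * (1.0 - prune_ratio)))
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:keep])


def prune_collection(docs: List[DocVector], prune_ratio: float) -> List[DocVector]:
    """Apply the same pruning ratio to every document vector."""
    return [prune_vector(d, prune_ratio) for d in docs]
```

The pruned vectors could then be passed to any clustering method in place of the originals, and the resulting clusters compared with those built from the unpruned collection.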

Suggestions

Cluster searching strategies for collaborative recommendation systems
Altıngövde, İsmail Sengör; Ulusoy, Ozgur (2013-05-01)
In-memory nearest neighbor computation is a typical collaborative filtering approach for high recommendation accuracy. However, this approach is not scalable given the huge number of customers and items in typical commercial applications. Cluster-based collaborative filtering techniques can be a remedy for the efficiency problem, but they usually provide relatively lower accuracy figures, since they may become over-generalized and produce less-personalized recommendations. Our research explores an individua...
Similarity matrix framework for data from union of subspaces
Aldroubi, Akram; Sekmen, Ali; Koku, Ahmet Buğra; Cakmak, Ahmet Faruk (2018-09-01)
This paper presents a framework for finding similarity matrices for the segmentation of data $W = [w_1 \cdots w_N] \subset \mathbb{R}^D$ drawn from a union $U = \bigcup_{i=1}^{M} S_i$ of independent subspaces $\{S_i\}_{i=1}^{M}$ of dimensions $\{d_i\}_{i=1}^{M}$. It is shown that any factorization of $W = BP$, where columns of $B$ form a basis for data $W$ and they also come from $U$, can be used to produce a similarity matrix $\Xi_W$. In other words, $\Xi_W(i, j) \neq 0$ when the columns $w_i$ and $w_j$ of $W$ come from the same subspace, ...
Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm
Kaya, Semih; Vural, Elif (2021-01-01)
While many approaches exist in the literature to learn low-dimensional representations for data collections in multiple modalities, the generalizability of multi-modal nonlinear embeddings to previously unseen data is a rather overlooked subject. In this work, we first present a theoretical analysis of learning multi-modal nonlinear embeddings in a supervised setting. Our performance bounds indicate that for successful generalization in multi-modal classification and retrieval problems, the regularity of th...
Optimization of Mesa Structured InGaAs Based Photodiode Arrays
Dolas, M. Halit; Çırçır, Kübra; Kocaman, Serdar (2017-04-13)
We design lattice matched InP/In0.53Ga0.47As mesa structured heterojunction p-n photodiodes with a novel passivation methodology based on a fully depleted thin p-InP layer. Mesa-structured detectors are targeted due to their competitive advantages for applications such as multicolor/hyperspectral imaging. Test detector pixels with different perimeter/area ratios are fabricated with and without etching thin InP passivation layer between pixels in order to comparatively examine passivating behavior. I-V chara...
TEMPORAL CLUSTERING OF MULTIVARIATE TIME SERIES
Aslan, Sipan; Yozgatlıgil, Ceylan; İyigün, Cem; Department of Statistics (2022-02-07)
Clustering of real-valued time series is a prevalent problem that frequently emerges in various fields and applications. While clustering of univariate time series is very much examined, clustering of multivariate time series has not been extensively addressed. This dissertation considers the clustering of real-valued multivariate time series data. When the data analyzed in the clustering task are time series, the time dependencies of the time series and the clusters to be formed should be considered togeth...
Citation Formats
İ. S. Altıngövde and O. Ulusoy, “Exploiting Index Pruning Methods for Clustering XML Collections,” 2010, vol. 6203, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/35247.