Exploiting Index Pruning Methods for Clustering XML Collections
Date
2010-01-01
Author
Altıngövde, İsmail Sengör
Ulusoy, Ozgur
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
In this paper, we first employ the well-known Cover-Coefficient Based Clustering Methodology (C3M) for clustering XML documents. Next, we apply index pruning techniques from the literature to reduce the size of the document vectors. Our experiments show that, in certain cases, it is possible to prune up to 70% of the collection (more specifically, of the underlying document vectors) and still generate a clustering structure of the same quality as that of the original collection, in terms of a set of evaluation metrics.
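The abstract does not say which pruning techniques or parameters were used, so the following is only a minimal sketch under stated assumptions. It represents each document as a term-to-weight mapping, prunes each vector by keeping its highest-weighted terms (a simple document-centric strategy chosen here for illustration, not necessarily one evaluated in the paper), and computes the cluster-count estimate that C3M is commonly described as using: the sum of the decoupling coefficients δ_i = α_i · Σ_k d_ik² · β_k, where α_i and β_k are the reciprocals of the row and column sums. The function names `prune_document_vectors` and `estimate_cluster_count` are hypothetical.

```python
import math


def prune_document_vectors(doc_vectors, keep_fraction):
    """Keep only the top `keep_fraction` of terms (by weight) in each vector.

    Assumption: a simple document-centric pruning rule used for illustration;
    the paper's actual pruning methods are not given in the abstract.
    """
    pruned = []
    for vec in doc_vectors:
        keep = max(1, math.ceil(len(vec) * keep_fraction))
        # Sort by descending weight, breaking ties by term for determinism.
        top = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))[:keep]
        pruned.append(dict(top))
    return pruned


def estimate_cluster_count(doc_vectors):
    """Sum of decoupling coefficients: n_c = sum_i delta_i."""
    # beta_k is the reciprocal of the k-th column (term) sum.
    col_sums = {}
    for vec in doc_vectors:
        for term, weight in vec.items():
            col_sums[term] = col_sums.get(term, 0.0) + weight

    n_c = 0.0
    for vec in doc_vectors:
        row_sum = sum(vec.values())
        if row_sum == 0:
            continue  # an empty document contributes nothing
        # delta_i = alpha_i * sum_k d_ik^2 * beta_k
        n_c += sum(w * w / col_sums[t] for t, w in vec.items()) / row_sum
    return n_c


if __name__ == "__main__":
    # Toy collection: the terms could be XML tags, paths, or text tokens.
    docs = [
        {"title": 3.0, "author": 1.0, "year": 2.0, "note": 0.5},
        {"title": 2.0, "author": 2.0, "year": 1.0},
        {"price": 4.0, "sku": 1.0},
    ]
    pruned = prune_document_vectors(docs, keep_fraction=0.5)
    print("estimate before pruning:", estimate_cluster_count(docs))
    print("estimate after pruning: ", estimate_cluster_count(pruned))
```

Comparing the estimate before and after pruning gives only a rough indication of how much the clustering structure shifts. The paper itself compares clustering quality with a set of evaluation metrics, which this sketch does not attempt to reproduce.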
Subject Keywords
Cover-coefficient based clustering, Index pruning, XML
URI
https://hdl.handle.net/11511/35247
DOI
https://doi.org/10.1007/978-3-642-14556-8_37
Collections
Department of Computer Engineering, Conference / Seminar
Citation (IEEE)
İ. S. Altıngövde and O. Ulusoy, “Exploiting Index Pruning Methods for Clustering XML Collections,” 2010, vol. 6203, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/35247.