Clustering scientific literature using sparse citation graph analysis

2006-01-01
Bolelli, Levent
Ertekin Bolelli, Şeyda
Giles, C. Lee
It is well known that connectivity analysis of linked documents provides significant information about the structure of the document space for unsupervised learning tasks. However, the ability to identify distinct clusters of documents based on link graph analysis is proportional to the density of the graph and depends on the availability of the linking and/or linked documents in the collection. In this paper, we present an information-theoretic approach to measuring the significance of individual words based on the underlying link structure of the document collection. This enables us to generate a non-uniform weight distribution over the feature space, which is used to augment the original corpus-based document similarities. Experimental results on a collection of scientific literature show that our method achieves better separation of distinct groups of documents, yielding improved clustering solutions.
KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2006, PROCEEDINGS
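
The abstract does not say which information-theoretic measure is used, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes, purely for illustration, that a word is "significant" when two documents sharing it are linked more often than an arbitrary pair of documents, and it scores that difference with the KL divergence between two Bernoulli link rates. The function names (`link_based_word_weights`, `weighted_cosine`), the `1 + weight` scaling, and the toy documents and links are all hypothetical.

```python
import math
from itertools import combinations


def bernoulli_kl(p, q):
    """KL divergence D(Bern(p) || Bern(q)) in bits; q must lie strictly in (0, 1)."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            total += a * math.log2(a / b)
    return total


def link_based_word_weights(docs, links):
    """Assumed scoring: how much more often are documents sharing a word linked?

    docs  : dict mapping doc id -> set of words
    links : set of frozenset({id_a, id_b}) citation links, treated as undirected
    """
    pairs = [frozenset(p) for p in combinations(sorted(docs), 2)]
    if not pairs:
        return {}
    base_rate = sum(p in links for p in pairs) / len(pairs)
    vocab = set().union(*docs.values())
    if not 0.0 < base_rate < 1.0:
        # No links at all, or every pair linked: the links carry no signal.
        return {w: 0.0 for w in vocab}
    weights = {}
    for w in vocab:
        sharing = [p for p in pairs if all(w in docs[d] for d in p)]
        if not sharing:
            weights[w] = 0.0
            continue
        rate = sum(p in links for p in sharing) / len(sharing)
        # Reward only words whose co-occurrence makes a link more likely.
        weights[w] = bernoulli_kl(rate, base_rate) if rate > base_rate else 0.0
    return weights


def weighted_cosine(a, b, weights):
    """Cosine similarity of binary word vectors, each word scaled by 1 + weight."""
    def scale(w):
        return 1.0 + weights.get(w, 0.0)

    dot = sum(scale(w) ** 2 for w in a & b)
    norm_a = math.sqrt(sum(scale(w) ** 2 for w in a))
    norm_b = math.sqrt(sum(scale(w) ** 2 for w in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus (invented): two topical groups, one citation link inside each.
docs = {
    "d1": {"svm", "kernel", "paper"},
    "d2": {"svm", "margin", "paper"},
    "d3": {"cluster", "graph", "paper"},
    "d4": {"cluster", "kmeans", "paper"},
}
links = {frozenset({"d1", "d2"}), frozenset({"d3", "d4"})}

weights = link_based_word_weights(docs, links)
same_group = weighted_cosine(docs["d1"], docs["d2"], weights)
cross_group = weighted_cosine(docs["d1"], docs["d3"], weights)
plain_same = weighted_cosine(docs["d1"], docs["d2"], {})
plain_cross = weighted_cosine(docs["d1"], docs["d3"], {})
```

In this toy setting the ubiquitous word "paper" receives zero extra weight, while "svm" and "cluster" are up-weighted, so the same-group similarity rises and the cross-group similarity falls relative to the unweighted cosine. Scaling by `1 + weight` is one simple way to keep the corpus-based similarity as the baseline and let the link-derived weights only augment it.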

Suggestions

Comparison of multidimensional data access methods for feature-based image retrieval
Arslan, Serdar; Saçan, Ahmet; Açar, Esra; Toroslu, İsmail Hakkı; Yazıcı, Adnan (2010-11-18)
Within the scope of information retrieval, efficient similarity search in large document or multimedia collections is a critical task. In this paper, we present a rigorous comparison of three different approaches to the image retrieval problem, including cluster-based indexing, distance-based indexing, and multidimensional scaling methods. The time and accuracy tradeoffs for each of these methods are demonstrated on a large Corel image database. Similarity of images is obtained via a feature-based similarity...
Clustering of manifold-modeled data based on tangent space variations
Gökdoğan, Gökhan; Vural, Elif; Department of Electrical and Electronics Engineering (2017)
An important research topic in recent years has been understanding and analyzing data collections for clustering and classification applications. In many data analysis problems, the data sets at hand have an intrinsically low-dimensional structure and admit a manifold model. Most state-of-the-art clustering methods developed for data of non-linear and low-dimensional structure are based on local linearity assumptions. However, clustering algorithms based on locally linear representations can tolerate diff...
On Fuzzy Extensions to Energy Ontologies for Text Processing Applications
Kucuk, Dilek; Kucuk, Dogan; Yazıcı, Adnan (2014-10-28)
Ubiquitous application areas of domain ontologies include text processing applications like categorizing related documents of the domain, extraction of information from these documents, and semantic search. In this paper, we focus on the utilization of two energy ontologies, one for electrical power quality and the second for wind energy, within such applications. For this purpose, we present fuzzy extensions to these domain ontologies as fuzziness is an essential feature of the ultimate forms of the ontolo...
Indexing Fuzzy Spatiotemporal Data for Efficient Querying: A Meteorological Application
Sozer, Aziz; Yazıcı, Adnan; Oğuztüzün, Mehmet Halit S. (2015-10-01)
Spatiotemporal data, in particular fuzzy and complex spatial objects representing geographic entities and relations, is a topic of great importance in geographic information systems and environmental data management systems. For database researchers, modeling and designing a database of fuzzy spatiotemporal data and querying such a database efficiently have been challenging issues due to complex spatial features and uncertainty involved. This paper presents an integrated approach to modeling, indexing, and ...
Flexible Content Extraction and Querying for Videos
Demir, Utku; Koyuncu, Murat; Yazıcı, Adnan; Yilmaz, Turgay; Sert, Mustafa (2011-10-28)
In this study, a multimedia database system which includes a semantic content extractor, a high-dimensional index structure and an intelligent fuzzy object-oriented database component is proposed. The proposed system is realized by following a component-oriented approach. It supports different flexible query capabilities for the requirements of video users, which is the main focus of this paper. The query performance of the system (including automatic semantic content extraction) is tested and analyzed in t...
Citation Formats
L. Bolelli, Ş. Ertekin Bolelli, and C. L. Giles, “Clustering scientific literature using sparse citation graph analysis,” KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2006, PROCEEDINGS, pp. 30–41, 2006, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/54906.