A new clustering scheme and its use in an information retrieval system incorporating the support of a database machine

Can, Fazlı
The need for immediate and accurate access to the current literature on the one hand, and the information explosion on the other, have led to the development of information retrieval systems. In this work, the information retrieval problem is studied, and new concepts and methodologies are proposed for its solution. The new proposals are the cover coefficient and cluster seed power concepts, together with methodologies for estimating the number of clusters within a collection and the number of members within a cluster. These concepts and methodologies are used in a new single-pass clustering algorithm. A multi-pass clustering algorithm is introduced to show the validity of the cover coefficient concept for clustering purposes. The thesis also presents the complexity analysis of the algorithms and a new centroid generation policy based on the cover coefficient concept. An algorithm for maintaining the clusters in expanding document collections, together with its complexity analysis, is also presented. The similarity and stability concepts for clustering algorithms are introduced, and the clustering algorithms are then analyzed with respect to these concepts through a set of experiments. For these experiments, a document collection of 167 articles from ACM-TODS publications was constructed. The characteristics of the collection, the experimental findings, and some observed basic relationships are described in detail. The thesis also proposes an information system model that integrates information retrieval and database management systems. Unlike previous studies with this aim, which more or less reduce one system to the other, the proposed model achieves this integration by synthesizing the techniques and methodologies of both systems. For this purpose, a database machine, the Relational Associative Processor (RAP), is enhanced with new text retrieval instructions.
Context-sensitive free-text retrieval operations are implemented using the new instructions. In the model, a clustering subsystem and a conceptual data model are used for information retrieval. The thesis presents the performance of the database machine in text retrieval operations and a comparative performance evaluation of the single-pass and multi-pass clustering algorithms in information retrieval. Additional concepts and methodologies that use the cover coefficient concept are also introduced.
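The abstract names the cover coefficient and cluster seed power concepts but does not give their formulas. The following is a minimal sketch, assuming the definitions commonly used in later published descriptions of the cover-coefficient-based clustering methodology (C3M), and it should not be read as the thesis's own implementation. For a document-term matrix D with m documents, the assumed cover coefficient is c_ij = α_i Σ_k d_ik β_k d_jk, where α_i is the reciprocal of row sum i and β_k the reciprocal of column sum k. The decoupling coefficient is δ_i = c_ii, the coupling coefficient is ψ_i = 1 − δ_i, the estimated number of clusters is n_c = Σ δ_i, and the seed power is taken here as p_i = δ_i ψ_i Σ_k d_ik (a binary-matrix form, also an assumption). The function names `cover_coefficients` and `cluster_c3m_sketch` are hypothetical. For simplicity, the sketch omits any check that rejects near-duplicate seeds, and it labels documents that share no term with any seed as -1 (a "ragbag" group).

```python
from typing import List, Tuple

Matrix = List[List[float]]


def cover_coefficients(D: Matrix) -> Matrix:
    """Assumed cover coefficient matrix: c_ij = alpha_i * sum_k d_ik * beta_k * d_jk."""
    m, n = len(D), len(D[0])
    row_sums = [sum(row) for row in D]
    col_sums = [sum(D[i][k] for i in range(m)) for k in range(n)]
    if any(s == 0 for s in row_sums) or any(s == 0 for s in col_sums):
        raise ValueError("every document and every term needs a nonzero entry")
    alpha = [1.0 / s for s in row_sums]  # reciprocal of each document's row sum
    beta = [1.0 / s for s in col_sums]   # reciprocal of each term's column sum
    return [
        [alpha[i] * sum(D[i][k] * beta[k] * D[j][k] for k in range(n)) for j in range(m)]
        for i in range(m)
    ]


def cluster_c3m_sketch(D: Matrix) -> Tuple[int, List[int], List[int]]:
    """Hypothetical single-pass, seed-based clustering driven by cover coefficients."""
    m = len(D)
    C = cover_coefficients(D)
    delta = [C[i][i] for i in range(m)]     # decoupling coefficients
    psi = [1.0 - d for d in delta]          # coupling coefficients
    n_c = max(1, round(sum(delta)))         # estimated number of clusters
    # Assumed seed power for binary matrices: delta_i * psi_i * (number of terms).
    power = [delta[i] * psi[i] * sum(D[i]) for i in range(m)]
    # Highest-power documents become seeds; sorted() is stable, so ties keep index order.
    seeds = sorted(range(m), key=lambda i: -power[i])[:n_c]
    labels: List[int] = []
    for i in range(m):
        if i in seeds:
            labels.append(seeds.index(i))
            continue
        # Assign the document to the seed that covers it most.
        coverage = [C[i][s] for s in seeds]
        best = max(coverage)
        labels.append(coverage.index(best) if best > 0 else -1)  # -1 marks the ragbag
    return n_c, seeds, labels


# Toy binary collection (illustrative only): two topical groups of two documents.
docs = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
]
n_c, seeds, labels = cluster_c3m_sketch(docs)
print(n_c, seeds, labels)  # 2 [1, 3] [0, 0, 1, 1]
```

Under these definitions each row of C sums to 1, so δ_i measures how much a document "covers itself" and Σ δ_i gives a natural estimate of the number of clusters. This is why the same matrix can drive both the cluster-count estimate and the seed selection in a single pass.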


An information theoretic framework for weight estimation in the combination of probabilistic classifiers for speaker identification
Altincay, H; Demirekler, Mübeccel (2000-04-01)
In this paper, we describe a relation between classification systems and information transmission systems. By looking at classification systems from this perspective, we propose a method of classifier weight estimation for the linear (LIN-OP) and logarithmic opinion pool (LOG-OP) type classifier combination schemes, using some tools from information theory. These weights provide contextual information about the classifiers such as class-dependent classifier reliability and global classifier ...
UNAL, E; BASBUGOGLU, O (1994-04-14)
A tool for visualizing the communication events in a parallel processing system consisting of transputers is presented. A user program written in OCCAM language is preprocessed by this tool to add run-time event timing mechanisms. Time-stamping data thus obtained is then displayed by a second tool in a graphical format which accurately illustrates any channel or link communication occurring during the execution of the user program. This knowledge enables the user to gain a wider insight in the improvement o...
A progressive approach for processing satellite data by operational research
KUTER, SEMİH; Weber, Gerhard-Wilhelm; Akyürek, Sevda Zuhal (Springer Science and Business Media LLC, 2017-07-01)
Satellite data, together with spatial technologies, have a vital importance in earth sciences to continuously monitor natural and physical processes. However, images taken by earth-observing satellites are often associated with uncertainties due to atmospheric effects (i.e., absorption and scattering by atmospheric gases and aerosols). In this paper, a more adaptable approach for the removal of atmospheric effects from satellite data is introduced within an operational research perspective by utilizing nonp...
The Use and acceptance of information and communication technologies by senior citizens: a technology acceptance model (TAM) for Turkish population
Güner, Hacer; Acartürk, Cengiz; Department of Information Systems (2017)
To become an information society, citizens need appropriate access to information and communication technologies (ICT). ICT plays a major role in improving the inclusion of various parts of society, such as elderly citizens, in daily life. As in neighboring countries in the EU and the Middle East, the population of Turkey is getting older, according to the Turkish Statistical Institute (TurkStat, 2016). This urges the need for a systematic investigation of ICT needs of elderly ci...
Using tag similarity in SVD-based recommendation systems
Osmanli, Osman Nuri; Toroslu, İsmail Hakkı (2011-12-01)
Data analysis has become a very important area for both companies and researchers as a consequence of the technological developments in recent years. Companies are trying to increase their profit by analyzing the existing data about their customers and making decisions for the future according to the results of these analyses. Parallel to the need of companies, researchers are investigating different methodologies to analyze data more accurately with high performance. In this paper, we adopted free-formatte...
Citation Formats
F. Can, “A new clustering scheme and its use in an information retrieval system incorporating the support of a database machine,” Ph.D. - Doctoral Program, Middle East Technical University, 1984.