An attempt to classify Turkish district data : K-Means and Self-Organizing Map (SOM) algorithms

2004
Aksoy, Ece
No clustering technique is universally applicable for discovering the variety of structures present in data sets, and no single algorithm or approach is adequate for every clustering problem. Many methods are available, their criteria differ, and hence different classifications may be obtained for the same data. As larger and larger amounts of data are collected and stored in databases, the need for efficient and effective analysis methods is increasing. Grouping or classification of measurements is the key element in these data analysis procedures. Many non-spatial clustering techniques exist in various areas; spatial clustering techniques and software, however, are not so common. This thesis is an attempt to classify Turkish district data with the help of two clustering algorithms: K-means clustering and self-organizing maps (SOM). With these two common techniques, it is expected that a clustering of the same data can be reached in a GIS environment that can serve different aims, such as regional politics, constructing statistical integrity, or analyzing the distribution of funds, and that demonstrates how GIS can facilitate regional and statistical studies. All districts of Turkey, 923 units in total, were chosen as the application area of this thesis. Some limitations, such as population, were specified for the clustering of Turkey's districts. First, different clustering techniques for spatial classification were researched. The K-means and SOM algorithms were chosen in order to compare different methods on Turkey's district data. Afterward, a database of Turkey's statistical data was formed and analyzed jointly with geographical data in the GIS environment. Different clustering software packages, ArcGIS, CrimeStat and Matlab, were applied according to the conclusions of the research on clustering techniques. Self Organizing Maps (SOM) algorithm,
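For readers unfamiliar with K-means, one of the two algorithms named in the abstract, the following is a minimal sketch of the standard Lloyd iteration in plain Python. It is not the thesis's implementation (the abstract states that ArcGIS, CrimeStat and Matlab were used); the function name `kmeans`, its parameters, and the six two-feature points are hypothetical toy values chosen only to illustrate the assignment and update steps.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Minimal Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct points drawn from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical toy data: six "districts" described by two made-up,
# already-scaled indicators. These are NOT values from the thesis.
points = [
    (1.0, 1.2), (0.8, 1.0), (1.2, 0.9),
    (5.0, 5.2), (5.1, 4.8), (4.9, 5.0),
]
centroids, labels = kmeans(points, k=2)
print(labels)
```

In practice, district indicators measured on different scales (population, income, and so on) would normally be standardised before clustering, since K-means relies on Euclidean distance.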

Suggestions

A new hybrid multi-relational data mining technique
Toprak, Seda Dağlar; Toroslu, İ. Hakkı; Department of Computer Engineering (2005)
Multi-relational learning has become popular due to the limitations of propositional problem definition in structured domains and the tendency of storing data in relational databases. As patterns involve multiple relations, the search space of possible hypotheses becomes intractably complex. Many relational knowledge discovery systems have been developed employing various search strategies, search heuristics and pattern language limitations in order to cope with the complexity of hypothesis space. In this w...
Using fuzzy Petri nets for static analysis of rule-bases
Bostan-Korpeoglu, B; Yazıcı, Adnan (2004-01-01)
We use a Fuzzy Petri Net (FPN) structure to represent knowledge and model the behavior in our intelligent object-oriented database environment, which integrates fuzzy, active and deductive rules with database objects. However, the behavior of a system can be unpredictable due to the rules triggering or untriggering each other (non-termination). Intermediate and final database states may also differ according to the order of rule executions (non-confluence). In order to foresee and solve problematic behavior...
The strong partial transitive-closure problem: Algorithms and performance evaluation
Toroslu, İsmail Hakkı (1996-08-01)
The development of efficient algorithms to process the different forms of transitive-closure (TC) queries within the context of large database systems has recently attracted a large volume of research efforts. In this paper, we present two new algorithms suitable for processing one of these forms, the so-called strong partially instantiated transitive closure, in which one of the query's arguments is instantiated to a set of constants and the processing of which yields a set of tuples that draw their values...
Efficient computation of strong partial transitive-closures
Toroslu, İsmail Hakkı (1993-01-01)
The development of efficient algorithms to process the different forms of the transitive-closure (TC) queries within the context of large database systems has recently attracted a large volume of research efforts. In this paper, we present a new algorithm suitable for processing one of these forms, the so-called strong partially-instantiated, in which one of the query's arguments is instantiated to a set of constants and the processing of which yields a set of tuples that draw their values from both of the q...
A novel model-based method for feature extraction from protein sequences for classification
Sarac, Omer Sinan; Atalay, Mehmet Volkan; Atalay, Rengül (2006-01-01)
Representation of amino-acid sequences constitutes the key point in classification of proteins into functional or structural classes. The representation should contain the biologically meaningful information hidden in the primary sequence of the protein. Conserved or similar subsequences are strong indicators of functional and structural similarity. In this study we present a feature mapping that takes into account the models of the subsequences of protein sequences. An expectation-maximization algorithm al...
Citation Formats
E. Aksoy, “An attempt to classify Turkish district data : K-Means and Self-Organizing Map (SOM) algorithms,” M.S. - Master of Science, Middle East Technical University, 2004.