Unsupervised Morphological Segmentation Using Neural Word Embeddings

2016-10-12
Ustun, Ahmet
CAN BUĞLALILAR, BURCU
We present a fully unsupervised method for morphological segmentation. Unlike many morphological segmentation systems, our method is based on semantic features rather than orthographic features. To capture word meanings, word embeddings are obtained from a two-level neural network [11]. We compute the semantic similarity between words using these neural word embeddings, and this similarity forms our baseline segmentation model. We then model morphotactics with a bigram language model based on maximum likelihood estimates, trained on the initial segmentations produced by the baseline. Results show that using semantic features helps to improve morphological segmentation, especially in agglutinating languages such as Turkish. Our method shows competitive performance compared to other unsupervised morphological segmentation systems.
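
The two stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract gives no algorithmic details, so the right-to-left stripping procedure, the `threshold` and `min_stem` parameters, the boundary symbols `<s>` and `</s>`, and the hand-made toy embeddings are all assumptions chosen for illustration. The first function proposes a morpheme boundary whenever a shorter form of the word is itself a known word with a similar embedding. The second estimates bigram probabilities over morphemes by maximum likelihood from such segmentations.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def segment(word, emb, threshold=0.5, min_stem=2):
    """Hypothetical baseline: strip letters from the right and keep a
    boundary when the shorter form is a known word whose embedding is
    close to the current form (threshold and min_stem are assumptions)."""
    if word not in emb:
        return [word]
    suffixes = []
    current, boundary = word, len(word)
    for end in range(len(word) - 1, min_stem - 1, -1):
        cand = word[:end]
        if cand in emb and cosine(emb[current], emb[cand]) >= threshold:
            suffixes.append(word[end:boundary])
            current, boundary = cand, end
    # Suffixes were collected right to left, so reverse them.
    return [current] + suffixes[::-1]


def bigram_mle(segmentations):
    """Maximum likelihood bigram estimates P(next | prev) over morphemes,
    with assumed boundary symbols <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for morphs in segmentations:
        seq = ["<s>"] + list(morphs) + ["</s>"]
        unigrams.update(seq[:-1])          # counts of each context morpheme
        bigrams.update(zip(seq, seq[1:]))  # counts of adjacent pairs
    return {pair: c / unigrams[pair[0]] for pair, c in bigrams.items()}


# Toy, hand-made 2-d vectors standing in for learned embeddings.
toy_emb = {
    "evlerde": [1.0, 0.2],  # "in the houses"
    "evler": [0.9, 0.3],    # "houses"
    "evde": [0.9, 0.1],     # "in the house"
    "ev": [0.8, 0.4],       # "house"
}

initial = [segment(w, toy_emb) for w in ("evlerde", "evde")]
probs = bigram_mle(initial)
print(initial[0])            # ['ev', 'ler', 'de']
print(probs[("ev", "ler")])  # 0.5
```

In this toy run the stem "ev" is followed once by "ler" and once by "de", so each continuation receives probability 0.5. A real system would need learned embeddings and some form of smoothing for unseen morpheme pairs, neither of which is shown here.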

Suggestions

A Trie-structured Bayesian Model for Unsupervised Morphological Segmentation
Kurfalı, Murathan; Ustun, Ahmet; CAN BUĞLALILAR, BURCU (2017-04-23)
In this paper, we introduce a trie-structured Bayesian model for unsupervised morphological segmentation. We adopt prior information from different sources in the model. We use neural word embeddings to discover words that are morphologically derived from each other and are therefore semantically similar. We use letter successor variety counts obtained from tries that are built using neural word embeddings. Our results show that using different information sources such as neural word embeddings and letter s...
Unsupervised Deep Learning for Subspace Clustering
SEKMEN, Ali; Koku, Ahmet Buğra; PARLAKTUNA, Mustafa; ABDULMALEK, Ayad; VANAMALA, Nagendrababu (2017-12-14)
This paper presents a novel technique for the segmentation of data $W = [w_1 \dots w_N] \subset \mathbb{R}^D$ drawn from a union $U = \bigcup_{i=1}^{M} S_i$ of subspaces $\{S_i\}_{i=1}^{M}$. First, an existing subspace segmentation algorithm is used to perform an initial data clustering $\{C_i\}_{i=1}^{M}$, where $C_i = \{w_{i_1} \dots w_{i_k}\} \subset W$ is the set of data from the $i$th cluster. Then, a local subspace $LS_i$ is matched for each $C_i$ and the distance $d_{ij}$ between $LS_i$ and each point $w_{i_j} \in C_i$ is compute...
Unsupervised segmentation of hyperspectral images using modified phase correlation
Ertuerk, Alp; Ertuerk, Sarp (2006-10-01)
This letter presents hyperspectral image segmentation based on the phase-correlation measure of subsampled hyperspectral data, which is referred to as modified phase correlation. The hyperspectral spectrum of each pixel is initially subsampled to gain robustness against noise and spatial variability, and phase correlation is applied to determine spectral similarity. Similar and dissimilar pixels are decided according to the peak value of the phase correlation result to determine pixels that fall into the s...
Building Morphological Chains for Agglutinative Languages
Ozen, Serkan; CAN BUĞLALILAR, BURCU (2017-04-23)
In this paper, we build morphological chains for agglutinative languages by using a log-linear model for the morphological segmentation task. The model is based on the unsupervised morphological segmentation system called MorphoChains [1]. We extend the MorphoChains log-linear model by expanding the candidate space recursively to cover more split points for agglutinative languages such as Turkish, whereas in the original model candidates are generated by considering only binary segmentation of each word. The re...
Unsupervised Electromagnetic Target Classification by Self-organizing Map Type Clustering
Katilmis, T. T.; Ekmekci, E.; Sayan, Gönül (2010-07-08)
This study describes the design of a completely unsupervised electromagnetic target classifier based on Self-Organizing Map (SOM) type artificial neural network training and a Wigner distribution (WD) based target feature extraction technique. The suggested classification method is demonstrated for a target library of four dielectric spheres that have exactly the same size but slightly different permittivity values.
Citation Formats
A. Ustun and B. CAN BUĞLALILAR, “Unsupervised Morphological Segmentation Using Neural Word Embeddings,” 2016, vol. 9918, p. 43, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/66056.