Selective word encoding for effective text representation

2019-01-01
Ozkan, Savas
Ozkan, Akin
Determining the category of a text document from its semantic content is a well-motivated problem that has been studied extensively across applications. A compact text representation is a fundamental step toward precise results in these applications, and many studies concentrate on improving it. In particular, techniques that aggregate word-level representations are the mainstream approach. In this paper, we tackle text representation to achieve high performance in different text classification tasks and present three contributions. First, to encode the word-level representations of each text, we adapt a trainable orderless aggregation algorithm that transforms word vectors into a more discriminative text-level representation. Second, we propose an effective term-weighting scheme that computes the relative importance of words from their context, based on their relevance to the problem, in an end-to-end learning manner. Third, we present a weighted loss function to mitigate the class-imbalance problem between the categories. To evaluate the performance, we collect two distinct datasets whose data is available on the web: Turkish parliament records (i.e. written speeches of four major political parties, with 30731/7683 train/test documents) and newspaper articles (i.e. daily articles of columnists, with 16000/3200 train/test documents). The proposed method yields significant performance improvements over the baseline techniques (i.e. VLAD and Fisher Vector) and achieves true prediction accuracies of 0.823 and 0.878 for party membership and article category estimation, respectively. These results validate that the proposed contributions (i.e. trainable word-encoding model, trainable term-weighting scheme and weighted loss function) significantly outperform the baselines.
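As an illustration of the third contribution, the following is a minimal sketch of a class-weighted cross-entropy loss. The abstract does not state the weighting formula, so the inverse-frequency weights used here, along with the function names `inverse_frequency_weights` and `weighted_cross_entropy`, are assumptions made for this example and may differ from the authors' formulation.

```python
import numpy as np


def inverse_frequency_weights(labels, num_classes):
    """Assumed weighting: w_c = N / (K * n_c), so rarer classes get larger weights."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    counts = np.maximum(counts, 1.0)  # guard against classes with no samples
    return labels.size / (num_classes * counts)


def weighted_cross_entropy(logits, labels, class_weights):
    """Cross-entropy where each sample is scaled by the weight of its true class."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_sample = -log_probs[np.arange(labels.size), labels]
    w = class_weights[labels]
    # Normalise by the total weight so the loss scale does not depend on imbalance.
    return float((w * per_sample).sum() / w.sum())


# Toy imbalanced batch: three documents of class 0, one of class 1.
labels = np.array([0, 0, 0, 1])
logits = np.zeros((4, 2))  # an untrained model that predicts both classes equally
weights = inverse_frequency_weights(labels, num_classes=2)
loss = weighted_cross_entropy(logits, labels, weights)
```

With these weights, an error on the single minority-class document contributes as much to the loss as errors on all three majority-class documents combined, which is the effect a weighted loss is meant to have under class imbalance.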
TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES

Citation Formats
S. Ozkan and A. Ozkan, “Selective word encoding for effective text representation,” TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, pp. 1028–1040, 2019, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/65216.