Identifying textual personal information with artificial neural networks
Date
2019
Author
Demir, Memduh Çağrı
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Solutions to many natural language processing problems must be learned from language-specific labeled data. However, both compiling a new dataset in a new language and translating an existing dataset into another language require expert human effort that cannot be automated. To learn a solution in a new target language automatically, without any extra data, we focus on the well-known problem of dialogue act classification and propose two solutions that combine existing dialogue act classification methods with machine translation techniques. We implement the proposed solutions, Localized Dialogue Act Classifier (LDAC) and Universal Dialogue Act Classifier (UDAC), using two different dialogue act classification methods and a state-of-the-art machine translation method. We test both solutions on two datasets frequently used to evaluate dialogue act classification methods, namely the Switchboard Dialogue Act (SwDA) and Meeting Recorder Dialogue Act (MRDA) datasets, with German, Spanish and Turkish as the target languages. The results show that models trained on translated datasets perform worse than their monolingual counterparts trained on a dataset in its original language. Nonetheless, the results also indicate that LDAC achieves acceptably accurate dialogue act classification in new target languages without a dataset in that language. These results show that the automated dataset translation idea we propose deserves further exploration.
Subject Keywords
Neural networks (Computer science), de-identification of plain texts, text classification, local contexts of words, long short term memory networks
URI
http://etd.lib.metu.edu.tr/upload/12623666/index.pdf
https://hdl.handle.net/11511/44123
Collections
Graduate School of Natural and Applied Sciences, Thesis