A hybrid named entity recognizer for Turkish
Date
2012-02-15
Authors
Kucuk, Dilek
Yazıcı, Adnan
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Named entity recognition is an important subfield of the broader research area of information extraction from textual data. Yet named entity recognition research on Turkish texts is still rare compared with related research on other languages such as English, Spanish, Chinese, and Japanese. In this study, we present a hybrid named entity recognizer for Turkish, built on a manually engineered rule-based recognizer that we have previously proposed. The knowledge sources of domain-specific rule-based systems must be revised manually when those systems are ported to other domains. We therefore enrich our rule-based recognizer and turn it into a hybrid recognizer that learns from annotated data when such data are available and improves its knowledge sources accordingly. The hybrid recognizer was originally engineered for generic news texts. Its learning capability allows it to be adapted, without human intervention, to financial news texts, historical texts, and child stories as well. Both the hybrid recognizer and its rule-based predecessor are evaluated on the same corpora, and the hybrid recognizer achieves better results than its predecessor. The proposed recognizer is significant because it is the first hybrid recognizer proposed for Turkish that addresses this porting problem. This matters because Turkish has structural properties different from those of widely studied languages such as English, and very little information extraction research has been conducted on Turkish texts. Moreover, the use of the proposed hybrid recognizer for semantic video indexing is demonstrated in a case study on Turkish news videos. The original textual and video corpora used throughout the paper were compiled and annotated by the authors, owing to the lack of publicly available annotated corpora for information extraction research on Turkish texts.
Subject Keywords
Information extraction, Hybrid named entity recognizer, Named entity recognition in Turkish
URI
https://hdl.handle.net/11511/45101
Journal
Expert Systems with Applications
DOI
https://doi.org/10.1016/j.eswa.2011.08.131
Collections
Department of Computer Engineering, Article
Citation (IEEE)
D. Kucuk and A. Yazıcı, “A hybrid named entity recognizer for Turkish,” Expert Systems with Applications, vol. 39, no. 3, pp. 2733–2742, 2012. [Online]. Available: https://hdl.handle.net/11511/45101.