A hybrid named entity recognizer for Turkish

Kucuk, Dilek
Yazıcı, Adnan
Named entity recognition is an important subfield of the broader research area of information extraction from textual data. Yet named entity recognition research on Turkish texts is still rare compared to related research on other languages such as English, Spanish, Chinese, and Japanese. In this study, we present a hybrid named entity recognizer for Turkish, which is based on a manually engineered rule-based recognizer that we have proposed. Since rule-based systems for specific domains require their knowledge sources to be manually revised when ported to other domains, we enrich our rule-based recognizer and turn it into a hybrid recognizer so that it learns from annotated data when available and improves its knowledge sources accordingly. The hybrid recognizer was originally engineered for generic news texts, but its learning capability makes it applicable to financial news texts, historical texts, and child stories as well, without human intervention. Both the hybrid recognizer and its rule-based predecessor are evaluated on the same corpora, and the hybrid recognizer achieves better results than its predecessor. The proposed hybrid named entity recognizer is significant because it is the first hybrid recognizer proposed for Turkish that addresses the above porting problem, given that Turkish possesses different structural properties from widely studied languages such as English and that very limited information extraction research has been conducted on Turkish texts. Moreover, the use of the proposed hybrid recognizer for semantic video indexing is shown as a case study on Turkish news videos. The genuine textual and video corpora utilized throughout the paper were compiled and annotated by the authors due to the lack of publicly available annotated corpora for information extraction research on Turkish texts.
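The general idea described in the abstract, a rule-based recognizer whose knowledge sources grow from annotated data, can be illustrated with a toy sketch. This is not the authors' system: the class name `HybridRecognizer`, the single percentage pattern, the lexicon contents, and the example sentence are all assumptions made for illustration, and the "learning" here is nothing more than adding annotated surface forms to a lexicon.

```python
import re
from collections import defaultdict


class HybridRecognizer:
    """Toy lexicon-plus-pattern recognizer (hypothetical, for illustration only)."""

    def __init__(self, lexicon):
        # lexicon maps an entity type to a set of known surface forms
        self.lexicon = defaultdict(set)
        for entity_type, names in lexicon.items():
            self.lexicon[entity_type].update(names)
        # One hand-written pattern (an assumption): percentages such as "%5" or "%12,5"
        self.patterns = {"PERCENT": re.compile(r"%\d+(?:,\d+)?")}

    def learn(self, annotated):
        # annotated: iterable of (surface form, entity type) pairs from labeled text
        for surface, entity_type in annotated:
            self.lexicon[entity_type].add(surface)

    def recognize(self, text):
        found = []
        # Lexicon lookup: exact substring matches of known names
        for entity_type, names in self.lexicon.items():
            for name in names:
                for m in re.finditer(re.escape(name), text):
                    found.append((m.start(), m.group(), entity_type))
        # Pattern matching: hand-written regular expressions
        for entity_type, pattern in self.patterns.items():
            for m in pattern.finditer(text):
                found.append((m.start(), m.group(), entity_type))
        # Return (surface form, entity type) pairs in order of appearance
        return [(surface, etype) for _, surface, etype in sorted(found)]


recognizer = HybridRecognizer({"LOCATION": {"Ankara"}})
text = "Merkez Bankası Ankara'da faizi %5 artırdı."
before = recognizer.recognize(text)

# Porting to a new domain: add names seen in annotated data, with no rule edits
recognizer.learn([("Merkez Bankası", "ORGANIZATION")])
after = recognizer.recognize(text)
```

In this sketch the organization name is missed before learning and found afterwards, while the hand-written rules stay unchanged. A real system would also need to handle Turkish morphology and overlapping matches, which this sketch ignores.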


Named Entity Recognition in Turkish with Bayesian Learning and Hybrid Approaches
RehaYavuz, Sermet; Kucuk, Dilek; Yazıcı, Adnan (2013-10-29)
Named entity recognition is one of the significant textual information extraction tasks. In this paper, we present two approaches for named entity recognition on Turkish texts. The first is a Bayesian learning approach which is trained on a considerably limited training set. The second approach comprises two hybrid systems based on joint utilization of this Bayesian learning approach and a previously proposed rule-based named entity recognizer. All of the proposed three approaches achieve promising performa...
A TV content augmentation system exploiting rule based named entity recognition method
Işıklar, Yunus Emre; Çiçekli, Fehime Nihan; Department of Computer Engineering (2014)
In this thesis, a TV content augmentation system taking the advantage of named entity recognition methods is proposed. The system aims to automatically enhance TV program contents by retrieving context related data and presenting them to the viewers without any necessity of another device. In addition to conceptual description of the system, a prototype implementation is developed and demonstrated with predefined TV programs. The implementation utilizes Electronic Program Guide (EPG) data of programs crawle...
Named entity recognition experiments on Turkish texts
Küçük, Dilek; Yazıcı, Adnan (2009-10-28)
Named entity recognition (NER) is one of the main information extraction tasks and research on NER from Turkish texts is known to be rare. In this study, we present a rule-based NER system for Turkish which employs a set of lexical resources and pattern bases for the extraction of named entities including the names of people, locations, organizations together with time/date and money/percentage expressions. The domain of the system is news texts and it does not utilize important clues of capitalization and ...
The CHEMDNER corpus of chemicals and drugs and its annotation principles
Krallinger, Martin; et al. (2015-01-19)
The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 ...
Person name recognition in Turkish financial texts by using local grammar approach
Bayraktar, Özkan; Taşkaya Temizel, Tuğba; Department of Information Systems (2007)
Named entity recognition (NER) is the task of identifying the named entities (NEs) in the texts and classifying them into semantic categories such as person, organization, and place names and time, date, monetary, and percent expressions. NER has two principal aims: identification of NEs and classification of them into semantic categories. The local grammar (LG) approach has recently been shown to be superior to other NER techniques such as the probabilistic approach, the symbolic approach, and the hybrid a...
Citation Formats
D. Kucuk and A. Yazıcı, “A hybrid named entity recognizer for Turkish,” Expert Systems with Applications, pp. 2733–2742, 2012, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/45101.