Financial named entity recognition for Turkish news texts

2022-7-26
Dinç, Duygu
Named Entity Recognition (NER) is an information extraction problem whose objective is to detect the named entities (NEs) in a given text and label them correctly according to predetermined categories. An NE may be a noun or a noun phrase that corresponds to the name of a specific object or location or, in domain-specific applications, of a concept. In the literature, person, organization and location names, as well as date, time, money and percentage expressions, are among the most studied generic NEs. There are also domain-specific studies whose NEs relate to fields such as genetics, medicine, chemistry and finance. Solutions to NER problems can be useful in many downstream Natural Language Processing tasks, such as Text Summarization, Question Answering and Sentiment Analysis. For Turkish, which has complex morphological features, there are fewer NER studies than for more widely used languages such as English. In recent years, neural-network-based methods have performed better on NER tasks than classical rule-based or traditional machine learning techniques. In this thesis, popular deep-learning-based models were evaluated on different Turkish datasets. Moreover, as one of the focuses of this thesis, two newly annotated datasets built from raw financial news texts were presented and used throughout the experiments. The new datasets were annotated using both the BIO schema and raw labels, inter-annotator agreement was measured, and models were trained separately on both versions to observe the effect of the annotation format on performance. New NEs specific to finance were also introduced. Lastly, experiments were conducted with a few selected deep-learning-based, language-specific BERT models for some languages in the Ural-Altaic language group.
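The abstract names two annotation formats and an agreement measurement without detailing either. The following is a minimal sketch under stated assumptions: "raw labels" is taken to mean IO-style tagging (each token carries only its entity type, or O), and agreement is measured with token-level Cohen's kappa. Neither choice is confirmed by the thesis, and the Turkish example sentence and the ORG and PERCENT labels are hypothetical, not the thesis's finance-specific categories. It shows how raw labels map to BIO tags and how the same pair of annotations can yield different kappa values depending on the label set.

```python
from collections import Counter

OUTSIDE = "O"  # label for tokens outside any named entity


def raw_to_bio(raw_labels):
    """Convert raw (assumed IO-style) token labels to BIO tags.

    A token is tagged B- when its entity type differs from the previous
    token's type, and I- when it continues the same type. Because raw labels
    carry no boundary marker, two adjacent entities of the same type are
    merged into a single span.
    """
    bio = []
    prev = OUTSIDE
    for label in raw_labels:
        if label == OUTSIDE:
            bio.append(OUTSIDE)
        elif label == prev:
            bio.append("I-" + label)
        else:
            bio.append("B-" + label)
        prev = label
    return bio


def cohens_kappa(labels_a, labels_b):
    """Token-level Cohen's kappa between two annotators' label sequences."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("annotations must be non-empty and aligned token by token")
    n = len(labels_a)
    # Observed agreement: fraction of tokens given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label for every token
    return (observed - expected) / (1 - expected)


# Hypothetical sentence: "Türk Hava Yolları (Turkish Airlines) shares rose 3 percent".
tokens = ["Türk", "Hava", "Yolları", "hisseleri", "yüzde", "3", "yükseldi"]
annotator_a = ["ORG", "ORG", "ORG", "O", "PERCENT", "PERCENT", "O"]
annotator_b = ["ORG", "ORG", "O", "O", "PERCENT", "PERCENT", "O"]

if __name__ == "__main__":
    print(raw_to_bio(annotator_a))
    print("kappa (raw):", round(cohens_kappa(annotator_a, annotator_b), 3))
    print("kappa (BIO):", round(cohens_kappa(raw_to_bio(annotator_a), raw_to_bio(annotator_b)), 3))
```

On this hypothetical example the annotators disagree only on whether "Yolları" belongs to the organization name, yet kappa is 26/33 (about 0.79) on raw labels and 31/38 (about 0.82) on BIO tags, because the BIO label set changes the chance-agreement term. The sketch also exposes one limitation of the raw format: two adjacent entities of the same type collapse into one span, which the B- prefix of BIO can represent but raw labels cannot.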

Citation Formats
D. Dinç, “Financial named entity recognition for Turkish news texts,” M.S. - Master of Science, Middle East Technical University, 2022.