An Efficient Part-of-Speech Tagger for Arabic
Date
2011-02-26
Author
Kopru, Selcuk
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
In this paper, we present an efficient part-of-speech (POS) tagger for Arabic based on a Hidden Markov Model. We explore different enhancements to improve the baseline system. Despite the morphological complexity of Arabic, our approach is data-driven and, unlike many other Arabic POS taggers, does not use a morphological analyzer or a lexicon. This makes the approach simple, very efficient, and well suited to real-life applications, while its accuracy remains comparable to that of other Arabic POS taggers. In the experiments, we also thoroughly investigate aspects of Arabic POS tagging that were not examined in detail before, including tag sets and prefix and suffix analyses. Our part-of-speech tagger achieves an accuracy of 95.57% on a standard tagset for Arabic. A detailed error analysis is provided for a better evaluation of the system. We also applied the same approach to other languages, such as Farsi and German, to show its language-independent nature. Accuracy rates on these languages are also provided.
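As a rough illustration of what a data-driven HMM tagger involves, the following minimal sketch trains a bigram Hidden Markov Model from tagged sentences and decodes with the Viterbi algorithm. It is not the system described in the paper: the bigram order, the add-one smoothing, the toy English corpus, and all function names (`train`, `log_prob`, `viterbi`) are assumptions made for this example, and the prefix and suffix analyses mentioned in the abstract are not modeled.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (not data from the paper): lists of (word, tag) pairs.
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

def train(tagged_sentences):
    """Collect tag-transition and word-emission counts; no lexicon or analyzer."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumption of this sketch)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, model):
    """Return the most probable tag sequence for words under the bigram HMM."""
    trans, emit, tags, vocab = model
    if not words:
        return []
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra outcome reserved for unknown words
    # best[i][tag] = (log score of best path ending in tag at position i, backpointer)
    best = [{
        t: (log_prob(trans["<s>"], t, n_tags) + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

model = train(TOY_CORPUS)
print(viterbi(["the", "cat", "barks"], model))
```

Because emissions are smoothed, a word absent from the training data still receives a tag, chosen mainly from the transition probabilities of its neighbors. A real tagger would replace this uniform unknown-word handling with something more informative, which is presumably where affix information becomes useful.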
Subject Keywords
Hidden Markov model, Natural language processing, Training corpus, Statistical machine translation, Computational linguistics
URI
https://hdl.handle.net/11511/64364
Collections
Unclassified, Conference / Seminar
Suggestions
A heuristic algorithm for optical character recognition of Arabic script
Yarman Vural, Fatoş T.; Atici, A. Alper (1996-03-20)
In this paper, a heuristic method is developed for segmentation, feature extraction and recognition of the Arabic script. The study is part of a large project for the transcription of the documents in Ottoman Archives. A geometrical and topological feature analysis method is developed for segmentation and feature extraction stages. Chain code transformation is applied to main strokes of the characters which are then classified by the hidden Markov model (HMM) in the recognition stage. Experimental results i...
A heuristic algorithm for optical character recognition of Arabic script
Atici, A. Alper; Yarman Vural, Fatoş T. (1997-10-01)
In this paper, a heuristic method is developed for segmentation, feature extraction and recognition of the Arabic script. The study is part of a large project for transcription of the documents in Ottoman Archives. A geometrical and topological feature analysis method is developed for segmentation and feature extraction stages. Chain code transformation is applied to main strokes of the characters, which are classified by the hidden Markov model (HMM) in the recognition stage. Experimental results indicate ...
A character recognizer for Turkish language
Korkmaz, SU; Akinci, GKY; Atalay, Mehmet Volkan (2003-01-01)
This paper presents particularly a contextual post processing subsystem for a Turkish machine printed character recognition system. The contextual post processing subsystem is based on positional binary 3-gram statistics for Turkish language, an error corrector parser and a lexicon, which contains root words and the inflected forms of the root words. Error corrector parser is used for correcting CR alternatives using Turkish Morphology.
A framework for sentiment analysis in Turkish application to polarity detection of movie reviews in Turkish
Vural, Gural; Cambazoglu, Barla; Tokgoz, Özge Zehra; Karagöz, Pınar (2012-10-28)
In this work, we present a framework for unsupervised sentiment analysis in Turkish text documents. As part of our framework, we customize the SentiStrength sentiment analysis library by translating its lexicon to Turkish. We apply our framework to the problem of classifying the polarity of movie reviews. For performance evaluation, we use a large corpus of Turkish movie reviews obtained from a popular Turkish social media site. Although our framework is unsupervised, it is demonstrated to achieve a fairly ...
Pronominal anaphora resolution in Turkish and English
Ertan, Melek; Zeyrek Bozşahin, Deniz; Department of Cognitive Sciences (2023-1-27)
This research analyzes pronominal anaphora in a Turkish and English translated TED corpus, namely the TED-MDB (Zeyrek et al., 2020) and presents a heuristic-based resolution algorithm for resolving pronominal anaphora in these languages separately. The corpus has characteristics of spoken language and has 364 English sentences aligned with their Turkish counterparts. The research is divided into two stages. In the first stage, the data was annotated using a web-based annotation tool INcePTION (Klie et al., ...
Citation Formats
IEEE
S. Kopru, “An Efficient Part-of-Speech Tagger for Arabic,” 2011, vol. 6608, p. 202, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/64364.