Event Extraction from Turkish Football Web-casting Texts Using Hand-crafted Templates

2009-09-16
Tunaoğlu, Doruk
Alan, Özgür
Sabuncu, Orkunt
Akpınar, Samet
Çiçekli, Nihan K.
Alpaslan, Ferda Nur
In this paper, we present a domain-specific information extraction approach. We use manually crafted templates to extract information from unstructured documents in which grammatical and syntactic errors occur frequently. We applied our approach primarily to Turkish unstructured soccer web-casting texts. Compared to automated approaches, we achieve high precision and recall rates (97% and 85%, respectively). In addition, unlike automated approaches, we do not use part-of-speech taggers, parsers, phrase chunkers, or similar linguistic tools. As a result, our approach can be applied to any domain or language without requiring reliable linguistic tools for it. The drawback of our approach is the time spent crafting the templates. We also propose means to reduce that time.
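
The abstract does not reproduce the templates themselves. The sketch below assumes that a hand-crafted template can be approximated by a regular expression with a named player slot, applied line by line to minute-by-minute commentary. It illustrates the claim that extraction needs no part-of-speech tagger, parser, or chunker. Character classes such as [ıi] tolerate the ASCII spellings common in informal Turkish web text. The patterns, event labels, `Event`, `extract_events`, and the sample lines are hypothetical illustrations. They are not the authors' templates or data.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical player-name slot: one or more capitalised words; Turkish letters allowed.
NAME = r"(?P<player>[A-ZÇĞİÖŞÜ][a-zçğıöşü]+(?:\s+[A-ZÇĞİÖŞÜ][a-zçğıöşü]+)*)"

# Optional minute marker such as "23'" at the start of a commentary line.
MINUTE_RE = re.compile(r"^(\d{1,3})'")

# Hypothetical hand-crafted templates: (event type, pattern). Classes such as
# [ıi] and [öo] tolerate ASCII spellings common in informal Turkish web text.
TEMPLATES = [
    ("goal", re.compile(NAME + r"\s+gol[üu]?\s+att[ıi]")),
    ("yellow_card", re.compile(NAME + r"\s+sar[ıi]\s+kart\s+g[öo]rd[üu]")),
    ("red_card", re.compile(NAME + r"\s+k[ıi]rm[ıi]z[ıi]\s+kart\s+g[öo]rd[üu]")),
    ("foul", re.compile(NAME + r"\s+faul\s+yapt[ıi]")),
]


@dataclass
class Event:
    kind: str
    player: str
    minute: Optional[int]


def extract_events(lines: List[str]) -> List[Event]:
    """Match each commentary line against the templates; no tagger or parser is used."""
    events = []
    for line in lines:
        m = MINUTE_RE.match(line.strip())
        minute = int(m.group(1)) if m else None
        for kind, pattern in TEMPLATES:
            hit = pattern.search(line)
            if hit:
                events.append(Event(kind, hit.group("player"), minute))
                break  # first matching template wins for this line
    return events


if __name__ == "__main__":
    # Invented sample lines, not taken from the paper's corpus.
    commentary = [
        "12' Ali Yılmaz sarı kart gördü.",
        "34' Mehmet Demir golü attı!",
        "51' Can Kaya sari kart gordu",  # ASCII spelling, still matched
        "60' Oyun durdu.",               # no template fires
    ]
    for event in extract_events(commentary):
        print(event)
    # Event(kind='yellow_card', player='Ali Yılmaz', minute=12)
    # Event(kind='goal', player='Mehmet Demir', minute=34)
    # Event(kind='yellow_card', player='Can Kaya', minute=51)
```

The sketch applies the first matching template to each line. A realistic template set would need explicit priorities and richer slots. For example, a capitalised word directly before a name, such as a title, would be absorbed into the player slot here. These details are where most of the template-crafting time the abstract mentions would go.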

Citation Formats
D. Tunaoglu, O. Alan, O. Sabuncu, S. Akpinar, N. K. Cicekli, and F. N. Alpaslan, “Event Extraction from Turkish Football Web-casting Texts Using Hand-crafted Templates,” 2009, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/48579.