Text Classification in the Turkish Marketing Domain for Context Sensitive Ad Distribution

2009-09-16
Engin, Melih
Can, Tolga
In this paper, we construct and compare several feature extraction approaches to improve the classification of Turkish web documents in the marketing domain. We derive our feature extraction techniques from characteristics of the Turkish language, the structure of web documents, and online content in the marketing domain. We form datasets in different feature spaces and apply several Support Vector Machine (SVM) configurations to them. We conduct our study with the performance needs of practical context-sensitive systems in mind. Our results show that linear kernel classifiers achieve the best accuracy and speed on text documents represented by keyword root features.
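As a rough illustration of the idea in the abstract (a linear classifier over keyword root features), the following minimal sketch may help. It is not the authors' pipeline: the suffix list, the `crude_root` helper, the toy documents, and the two ad categories are all hypothetical, and the trainer is a simple Pegasos-style linear SVM written for this example rather than any SVM configuration evaluated in the paper.

```python
import random
from collections import Counter

# Hypothetical, deliberately tiny suffix list (longest first). Real Turkish
# root extraction needs a morphological analyzer; this is only a stand-in.
SUFFIXES = ("ları", "leri", "lar", "ler", "da", "de")


def crude_root(word, min_len=3):
    """Strip at most one known suffix if at least min_len characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            return word[: -len(suffix)]
    return word


def featurize(text):
    """Represent a document as sparse counts of crude keyword roots."""
    return Counter(crude_root(token) for token in text.lower().split())


def train_linear_svm(samples, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(samples)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            x, y = samples[i], labels[i]
            # Margin is computed with the weights before this step's update.
            margin = y * sum(weights.get(f, 0.0) * v for f, v in x.items())
            # Regularization shrink, applied on every step.
            shrink = 1.0 - eta * lam
            for f in weights:
                weights[f] *= shrink
            # Hinge-loss subgradient step only when the margin is violated.
            if margin < 1:
                for f, v in x.items():
                    weights[f] = weights.get(f, 0.0) + eta * y * v
    return weights


def predict(weights, text):
    """Return +1 or -1 from the sign of the linear score."""
    score = sum(weights.get(f, 0.0) * v for f, v in featurize(text).items())
    return 1 if score >= 0 else -1


# Toy, made-up training data: +1 = automotive ads, -1 = travel ads.
DOCS = [
    ("araba lastik fiyatları", 1),
    ("arabalar motor servis", 1),
    ("lastikler indirimde", 1),
    ("otel tatil fırsatları", -1),
    ("oteller uçak bileti", -1),
    ("tatilde otelde", -1),
]
WEIGHTS = train_linear_svm([featurize(t) for t, _ in DOCS], [y for _, y in DOCS])
```

Because inflected forms such as "arabalar" and "araba" collapse to one feature, the feature space stays small, and classifying a page costs one sparse dot product, for example `predict(WEIGHTS, "lastikler motor")`. That low per-document cost is the kind of property a context-sensitive ad system would care about.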

Suggestions

Event Extraction from Turkish Football Web-casting Texts Using Hand-crafted Templates
Doruk, Tunaoğlu; Alan, Özgür; Orkunt, Sabuncu; Samet, Akpınar; Çiçekli, Fehime Nihan; Alpaslan, Ferda Nur (2009-09-16)
In this paper, we present a domain specific information extraction approach. We use manually formed templates to extract information from unstructured documents where grammatical and syntactical errors occur frequently. We applied our approach to primarily Turkish unstructured soccer Web-casting texts. Compared to automated approaches we achieve high precision-recall rates (97% - 85%). In addition to that, unlike automated approaches we do not use part-of-speech taggers, parsers, phrase chunkers or that kin...
Event Extraction from Turkish Football Web-casting Texts Using Hand-crafted Templates
Tunaoglu, Doruk; Alan, Oezguer; Sabuncu, Orkunt; Akpinar, Samet; Cicekli, Nihan K.; Alpaslan, Ferda Nur (2009-09-16)
In this paper, we present a domain specific information extraction approach. We use manually formed templates to extract information from unstructured documents where grammatical and syntactical errors occur frequently. We applied our approach to primarily Turkish unstructured soccer web-casting texts. Compared to automated approaches we achieve high precision-recall rates (97% - 85%). In addition to that, unlike automated approaches we do not use part-of-speech taggers, parsers, phrase chunkers or that kind o...
Feature Extraction and Classification Phishing Websites Based on URL
Aydin, Mustafa; Baykal, Nazife (2015-09-30)
In this study we extracted websites' URL features and analyzed subset based feature selection methods and classification algorithms for phishing websites detection.
Semantic search on Turkish news domain with automatic query expansion
Demir, Tuğba; Çiçekli, Fehime Nihan; Department of Computer Engineering (2016)
In this thesis, semantic search on Turkish news domain with query expansion is proposed. Our aim is to provide the user with the most relevant documents related to their entered keywords. Our system uses data sources from Turkish news websites such as Hürriyet, Milliyet, Sabah, etc. Our system extends the user’s query with word embeddings and semantic relatedness. Furthermore, named entities, containing precious information, are extracted from news sources and user query and ranked to return on top of the r...
Topic-centric querying of web information resources
Altıngövde, İsmail Sengör; Ulusoy, O; Ozsoyoglu, G; Ozsoyoglu, ZM (2001-01-01)
This paper deals with the problem of modeling web information resources using expert knowledge and personalized user information, and querying them in terms of topics and topic relationships. We propose a model for web information resources, and a query language SQL-TC (Topic-Centric SQL) to query the model. The model is composed of web-based information resources (XML or HTML documents on the web), expert advice repositories (domain-expert-specified metadata for information resources), and personalized inf...
Citation Formats
M. Engin and T. Can, “Text Classification in the Turkish Marketing Domain for Context Sensitive Ad Distribution,” 2009, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/43098.