Text Classification in the Turkish Marketing Domain for Context Sensitive Ad Distribution

2009-09-16
Engin, Melih
Can, Tolga
In this paper, we construct and compare several feature extraction approaches to find a better solution for classifying Turkish web documents in the marketing domain. We derive our feature extraction techniques from characteristics of the Turkish language, the structure of web documents, and online content in the marketing domain. We form datasets in different feature spaces and apply several Support Vector Machine (SVM) configurations to these datasets. We conduct our study with the performance needs of practical context-sensitive systems in mind. Our results show that linear kernel classifiers achieve the best performance in terms of accuracy and speed on text documents expressed as keyword root features.
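
The idea of classifying documents by keyword root features with a linear classifier can be illustrated with a minimal sketch. It is not the authors' implementation. Two things here are assumptions made only for illustration: roots are approximated by truncating each token to its first five characters (a crude stand-in for real Turkish root extraction), and the linear SVM is trained with a simple hinge-loss subgradient loop rather than any particular SVM package. The function names, toy documents, and labels are all hypothetical.

```python
from collections import Counter


def root_features(text, prefix_len=5):
    # Assumption: approximate a word's root by its first prefix_len characters.
    # This is a crude stand-in, not the root extraction used in the paper.
    # Input is expected to be lowercase already; str.lower() does not follow
    # Turkish casing rules for dotted and dotless i.
    return Counter(tok[:prefix_len] for tok in text.lower().split())


def score(weights, bias, feats):
    # Linear decision function: dot product of weights and features, plus bias.
    return sum(weights.get(f, 0.0) * v for f, v in feats.items()) + bias


def train_linear_svm(samples, labels, epochs=20, lr=0.1, reg=0.01):
    # Hinge-loss subgradient descent with L2 shrinkage on the weights.
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, y in zip(samples, labels):
            margin = y * score(weights, bias, feats)
            for f in weights:
                weights[f] *= 1.0 - lr * reg  # regularization step
            if margin < 1.0:  # sample is inside the margin or misclassified
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + lr * y * v
                bias += lr * y
    return weights, bias


def predict(weights, bias, text):
    return 1 if score(weights, bias, root_features(text)) >= 0 else -1


# Hypothetical toy corpus: +1 = car ads, -1 = phone ads.
TRAIN = [
    ("ucuz arabalar satılık", 1),
    ("arabanın lastik fiyatı", 1),
    ("yeni telefonlar indirimde", -1),
    ("telefonun ekran fiyatı", -1),
]

samples = [root_features(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
weights, bias = train_linear_svm(samples, labels)

print(predict(weights, bias, "arabalarda kampanya"))
print(predict(weights, bias, "telefonlarda kampanya"))
```

Because the inflected forms "arabalar", "arabanın", and "arabalarda" all collapse to the same feature, the classifier sees one shared dimension instead of three sparse ones. Prediction is a single sparse dot product, which is one reason a linear kernel is attractive when speed matters.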

Suggestions

Event Extraction from Turkish Football Web-casting Texts Using Hand-crafted Templates
Tunaoğlu, Doruk; Alan, Özgür; Sabuncu, Orkunt; Akpınar, Samet; Çiçekli, Fehime Nihan; Alpaslan, Ferda Nur (2009-09-16)
In this paper, we present a domain specific information extraction approach. We use manually formed templates to extract information from unstructured documents where grammatical and syntactical errors occur frequently. We applied our approach to primarily Turkish unstructured soccer Web-casting texts. Compared to automated approaches we achieve high precision-recall rates (97% - 85%). In addition to that, unlike automated approaches we do not use part-of-speech taggers, parsers, phrase chunkers or that kin...
Event Extraction from Turkish Football Web-casting Texts Using Hand-crafted Templates
Tunaoglu, Doruk; Alan, Oezguer; Sabuncu, Orkunt; Akpinar, Samet; Cicekli, Nihan K.; Alpaslan, Ferda Nur (2009-09-16)
In this paper, we present a domain specific information extraction approach. We use manually formed templates to extract information from unstructured documents where grammatical and syntactical errors occur frequently. We applied our approach to primarily Turkish unstructured soccer web-casting texts. Compared to automated approaches we achieve high precision-recall rates (97% - 85%). In addition to that, unlike automated approaches we do not use part-of-speech taggers, parsers, phrase chunkers or that kind o...
Feature Extraction and Classification Phishing Websites Based on URL
Aydin, Mustafa; Baykal, Nazife (2015-09-30)
In this study we extracted websites' URL features and analyzed subset based feature selection methods and classification algorithms for phishing websites detection.
EXTRACTION OF INTERPRETABLE DECISION RULES FROM BLACK-BOX MODELS FOR CLASSIFICATION TASKS
GALATALI, EGEMEN BERK; ALEMDAR, HANDE; Department of Computer Engineering (2022-08-31)
In this work, we propose a new method and a ready-to-use workflow to extract simplified rule sets for a given Machine Learning (ML) model trained on a classification task. These rules are both human-readable and expressed as software code, thanks to the syntax of the Python programming language. We were inspired by the power of Shapley Values as our source of truth to select the most prominent features for our rule sets. The aim of this work is to select the key interval points in given data in order t...
Semantic search on Turkish news domain with automatic query expansion
Demir, Tuğba; Çiçekli, Fehime Nihan; Department of Computer Engineering (2016)
In this thesis, semantic search on Turkish news domain with query expansion is proposed. Our aim is to provide the user with the most relevant documents related to their entered keywords. Our system uses data sources from Turkish news websites such as Hürriyet, Milliyet, Sabah, etc. Our system extends the user’s query with word embeddings and semantic relatedness. Furthermore, named entities, containing precious information, are extracted from news sources and user query and ranked to return on top of the r...
Citation Formats
M. Engin and T. Can, “Text Classification in the Turkish Marketing Domain for Context Sensitive Ad Distribution,” 2009, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/43098.