Text Classification in the Turkish Marketing Domain for Context Sensitive Ad Distribution

Date

2009-09-16

Author

Engin, Melih
Can, Tolga

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

In this paper, we construct and compare several feature extraction approaches to improve the classification of Turkish web documents in the marketing domain. We derive our feature extraction techniques from characteristics of the Turkish language, the structure of web documents, and online content in the marketing domain. We build datasets in different feature spaces and apply several Support Vector Machine (SVM) configurations to them. We conduct our study with the performance needs of practical context-sensitive systems in mind. Our results show that linear kernel classifiers achieve the best performance, in terms of both accuracy and speed, on text documents represented as keyword root features.
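As a rough illustration of the pipeline the abstract describes (keyword root features fed to a linear classifier), the following minimal sketch may help. It is not the authors' implementation. The root extraction is a crude prefix truncation standing in for a real Turkish morphological analyzer, the training texts and labels are invented toy data, and the function names (`roots`, `vectorize`, `train_linear_svm`, `predict`) and the hyperparameters are hypothetical. The classifier is a plain subgradient-descent linear SVM written with the standard library only.

```python
from collections import Counter

def roots(text, prefix_len=4):
    """Crude stand-in for root extraction: keep the first prefix_len characters.

    Assumption: a real system would use a Turkish morphological analyzer.
    Note that str.lower() is not Turkish-aware (dotted/dotless I), so the
    toy inputs below are already lowercase.
    """
    return [tok[:prefix_len] for tok in text.lower().split()]

def vectorize(text, vocab):
    """Map a document to a count vector over the root vocabulary."""
    counts = Counter(roots(text))
    return [float(counts.get(term, 0)) for term in vocab]

def train_linear_svm(X, y, epochs=50, lam=0.01, lr=0.1):
    """Subgradient descent on the L2-regularized hinge loss; labels are +1/-1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wj * (1 - lr * lam) for wj in w]
            if margin < 1:
                # Hinge loss is active: move toward the misclassified side.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(text, vocab, w, b):
    """Return +1 or -1 from the sign of the linear decision function."""
    x = vectorize(text, vocab)
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Invented toy data: +1 = electronics ads, -1 = travel ads.
train = [
    ("telefon fiyatları indirim kampanyası", 1),
    ("telefonlar bilgisayar indirimi", 1),
    ("otel tatil rezervasyonu", -1),
    ("tatil otelleri uçak bileti", -1),
]
vocab = sorted({r for text, _ in train for r in roots(text)})
X = [vectorize(text, vocab) for text, _ in train]
y = [label for _, label in train]
w, b = train_linear_svm(X, y)

print(predict("telefon indirimleri", vocab, w, b))
print(predict("tatil otelleri", vocab, w, b))
```

Because inflected forms such as "otel" and "otelleri" collapse to the same root feature, the vocabulary stays small, and a linear decision function needs only one dot product per document at prediction time. That is consistent with the speed argument in the abstract, though this sketch makes no claim about the accuracy figures reported in the paper.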

Subject Keywords

Text classification, Data mining, Machine learning, Artificial intelligence, Information retrieval, World wide web

URI

https://hdl.handle.net/11511/43098

DOI

https://doi.org/10.1109/iscis.2009.5291861

Collections

Department of Computer Engineering, Conference / Seminar

Suggestions

Event Extraction from Turkish Football Web-casting Texts Using Hand-crafted Templates Doruk, Tunaoğlu; Alan, Özgür; Orkunt, Sabuncu; Samet, Akpınar; Çiçekli, Fehime Nihan; Alpaslan, Ferda Nur (2009-09-16) In this paper, we present a domain specific information extraction approach. We use manually formed templates to extract information from unstructured documents where grammatical and syntactical errors occur frequently. We applied our approach to primarily Turkish unstructured soccer Web-casting texts. Compared to automated approaches we achieve high precision-recall rates (97% - 85%). In addition to that, unlike automated approaches we do not use part-of-speech taggers, parsers, phrase chunkers or that kin...
Feature Extraction and Classification Phishing Websites Based on URL Aydin, Mustafa; Baykal, Nazife (2015-09-30) In this study we extracted websites' URL features and analyzed subset based feature selection methods and classification algorithms for phishing websites detection.
EXTRACTION OF INTERPRETABLE DECISION RULES FROM BLACK-BOX MODELS FOR CLASSIFICATION TASKS GALATALI, EGEMEN BERK; ALEMDAR, HANDE; Department of Computer Engineering (2022-8-31) In this work, we have proposed a new method and ready to use workflow to extract simplified rule sets for a given Machine Learning (ML) model trained on a classification task. Those rules are both human readable and in the form of software code pieces thanks to the syntax of Python programming language. We have inspired from the power of Shapley Values as our source of truth to select most prominent features for our rule sets. The aim of this work to select the key interval points in given data in order t...
Semantic search on Turkish news domain with automatic query expansion Demir, Tuğba; Çiçekli, Fehime Nihan; Department of Computer Engineering (2016) In this thesis, semantic search on Turkish news domain with query expansion is proposed. Our aim is to provide the user with the most relevant documents related to their entered keywords. Our system uses data sources from Turkish news websites such as Hürriyet, Milliyet, Sabah, etc. Our system extends the user’s query with word embeddings and semantic relatedness. Furthermore, named entities, containing precious information, are extracted from news sources and user query and ranked to return on top of the r...

Citation Formats

M. Engin and T. Can, “Text Classification in the Turkish Marketing Domain for Context Sensitive Ad Distribution,” 2009, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/43098.