Show/Hide Menu
Hide/Show Apps
anonymousUser
Logout
Türkçe
Türkçe
Search
Search
Login
Login
OpenMETU
OpenMETU
About
About
Açık Bilim Politikası
Açık Bilim Politikası
Frequently Asked Questions
Frequently Asked Questions
Browse
Browse
By Issue Date
By Issue Date
Authors
Authors
Titles
Titles
Subjects
Subjects
Communities & Collections
Communities & Collections
Effects of Various Preprocessing Techniques to Turkish Text Categorization Using N-Gram Features
Date
2017-10-08
Author
Deniz, Ayca
Kiziloz, Llakan Ezgi
Metadata
Show full item record
This work is licensed under a
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
.
Item Usage Stats
3
views
0
downloads
Natural Language Processing (NLP) is a prominent subject which includes various subcategories such as text classification, error correction, machine translation, etc. Unlike other languages, there are limited number of Turkish NLP studies in literature. In this study, we apply text classification on Turkish documents by using n-gram features. Our algorithm applies different preprocessing techniques, namely, n-gram choice (character level or word level, bigram or trigram models), stemming, and use of punctuation, and then determines the Turkish document's author and genre, and the gender of the author. For this purpose, Naive Bayes, Support Vector Machines and Random Forest are used as classification techniques. Finally, we discuss the effects of above mentioned preprocessing techniques to the performance of Turkish text classification.
Subject Keywords
Supervised machine learning
,
Turkish text classification
,
N-gram features
URI
https://hdl.handle.net/11511/65577
Collections
Department of Computer Engineering, Conference / Seminar