AUGMENTING A TURKISH DATASET FOR SPAM FILTERING USING NATURAL LANGUAGE PROCESSING TECHNIQUES

2022-08-25
AKSOY, AYŞENUR
The evolution of the internet is changing how we communicate. E-mail is one of the internet's main means of communication: it is easy to use, cheap and fast, and it has a wide user base. For the same reasons, it has also become a broad environment for malicious actors. Spam e-mail, defined as any unwanted, unwelcome e-mail sent in bulk, is one of their main tools. Although there is not yet a definitive way to stop spam, filtering techniques keep improving, and spam filtering has become one of the most common text classification problems in Natural Language Processing (NLP). There are several ways to improve the classification performance of machine learning methods; one of them is data augmentation. Augmentation generates additional, distinct samples from the dataset at hand, which can improve the accuracy of machine learning models, since such models generally perform better when the training dataset is sufficiently large. In this study, we examined the effects of semantically augmenting a Turkish dataset on the accuracy of spam filtering methods and observed effective results that can be used in further research.
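The abstract does not specify which semantic augmentation technique the thesis uses. As a hedged illustration only, the sketch below shows one common form of semantic augmentation, synonym replacement (as popularized by "Easy Data Augmentation"), applied to labeled spam/ham messages. The synonym lexicon `SYNONYMS` and the helpers `synonym_replace` and `augment` are hypothetical names introduced here; a real system would draw synonyms from a Turkish lexical resource or word embeddings rather than a hand-written dictionary.

```python
import random

# Hypothetical toy Turkish synonym lexicon (assumption, not from the thesis).
# A real system would use a lexical resource or word embeddings instead.
SYNONYMS: dict[str, list[str]] = {
    "ücretsiz": ["bedava"],   # "free"
    "hemen": ["derhal"],      # "immediately"
    "indirim": ["tenzilat"],  # "discount"
}


def synonym_replace(text: str, n: int, rng: random.Random) -> str:
    """Replace up to n words that appear in SYNONYMS with a random synonym.

    Tokenization is naive whitespace splitting, so words with attached
    punctuation are not matched. Note that str.lower() is not Turkish-aware
    for dotted/dotless I; the toy lexicon above avoids those letters.
    """
    words = text.split()
    # Indices of words for which the lexicon has at least one synonym.
    candidates = [i for i, w in enumerate(words) if w.lower() in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = rng.choice(SYNONYMS[words[i].lower()])
    return " ".join(words)


def augment(
    dataset: list[tuple[str, str]],
    copies_per_sample: int,
    n_replacements: int = 1,
    seed: int = 0,
) -> list[tuple[str, str]]:
    """Return the original (text, label) pairs followed by new variants.

    Each variant keeps its source label; variants identical to a text
    already in the set are skipped, so only distinct samples are added.
    """
    rng = random.Random(seed)
    seen = {text for text, _ in dataset}
    augmented = list(dataset)
    for text, label in dataset:
        for _ in range(copies_per_sample):
            new_text = synonym_replace(text, n_replacements, rng)
            if new_text not in seen:
                seen.add(new_text)
                augmented.append((new_text, label))
    return augmented
```

A possible usage, with two made-up messages:

```python
data = [
    ("Hemen tıklayın, ücretsiz hediye kazanın", "spam"),  # "Click now, win a free gift"
    ("Yarınki toplantı saat onda", "ham"),                # "Tomorrow's meeting is at ten"
]
for text, label in augment(data, copies_per_sample=2):
    print(label, text)
```

The ham message contains no lexicon words, so it yields no new variant; the spam message yields one or two distinct variants. Deduplication reflects the goal stated above of generating *distinct* data rather than repeating existing samples.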

Suggestions

A study of lightweight cryptography
Çamur, Zeliha; Doğanaksoy, Ali; Department of Cryptography (2020)
Technology is evolving rapidly and, with it, the internet is also changing. People used the internet to connect with each other. But with the changes in recent years, the internet is increasingly used to connect devices to each other. These devices can range from powerful computing devices, such as desktop computers and tablets, to resource-constrained devices, such as RFID tags and sensor networks. When it comes to these constrained devices, conventional cryptographic algorithms fail to provi...
A PHENOMENOLOGICAL INVESTIGATION OF PRESCHOOLERS' EXPERIENCES IN READING AN E-BOOK
Kaplan, Göknur; Yildirim, Caglar; Islim, Omer Faruk; Duman, Murat (2012-07-04)
The diffusion of technology in our daily lives has changed our ways of communication, socialization, play, etc. as well as the way we learn. The reflections of this transformation can also be seen in both formal and informal educational settings. A brief review of the literature reveals both successful implementations and potential problems with technology use in education. However, there is still a lack of research studies regarding the use of technology in early childhood education. The literature provides ...
Metadiscourse analysis of digital interpersonal interactions in academic settings in Turkey
Hatipoğlu, Çiler (2019-08-20)
Rapid technological advances, efficiency and easy access have firmly established emailing as a vital medium of communication in the last decades. Nowadays, all around the world, particularly in educational settings, the medium is one of the most widely used modes of interaction between students and university lecturers. Despite their important role in academic life, very little is known about the metadiscursive characteristics of these e-messages and as far as the author is aware there is no study that has ...
Improved probabilistic matrix factorization model for sparse datasets
Ar, Yılmaz; Taşkaya Temizel, Tuğba; Department of Information Systems (2014)
The amount of information on the World Wide Web has increased significantly owing to advancing web and information technologies. This has made it difficult for users to obtain relevant and useful information; thus, there is a need for information filtering. Recommender Systems (RS) have emerged as a technique to overcome the problem. Collaborative Filtering (CF), one of the widely used RS approaches, aims to predict users’ preference concerning an item. The main idea behind CF is the users who agreed in...
A New search engine ontology for visually impaired users
Akkaya, Ezgi; Karagöz, Pınar; Betin Can, Aysu; Department of Information Systems (2015)
In today’s world, semantic technology is becoming more important day by day. After Web 3.0, semantic infrastructure has become a must for internet-based systems. In this thesis, we have focused on the semantic basis of search engines. Google, Yahoo, Yandex and Bing are the most popular search engines. All of them have a semantic structure; however, the semantics have been developed for users who do not have any visual impairment. In this study, search engine ontology has been developed acco...
Citation Formats
A. AKSOY, “AUGMENTING A TURKISH DATASET FOR SPAM FILTERING USING NATURAL LANGUAGE PROCESSING TECHNIQUES,” M.S. - Master of Science, Middle East Technical University, 2022.