Context-sensitive and keyword density-based supervised machine learning techniques for malicious webpage detection

2019-06-01
Altay, Betul
Dokeroglu, Tansel
Coşar, Ahmet
Conventional malicious webpage detection methods use blacklists to decide whether a webpage is malicious. These blacklists are generally maintained by third-party organizations. However, keeping a complete list of malicious websites and updating it regularly is difficult, given the rapidly growing and frequently changing number of webpages on the web. In this study, we propose a novel context-sensitive and keyword density-based method for classifying webpages with three supervised machine learning techniques: support vector machine, maximum entropy, and extreme learning machine. Features (words) of webpages are obtained from their HTML contents, and information is extracted with three feature extraction methods: existence of words, keyword frequencies, and keyword density. The performance of the proposed machine learning models is evaluated on a benchmark data set of one hundred thousand webpages. Experimental results show that the proposed method can detect malicious webpages with an accuracy of 98.24%, a significant improvement over state-of-the-art approaches.
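The three feature representations named in the abstract (word existence, keyword frequency, keyword density) can be illustrated with a short Python sketch. This is not the authors' implementation. The following are all assumptions made for illustration: the tokenizer (lowercased alphanumeric runs taken from every text node of the page, including `<script>` bodies), the hypothetical helper names `tokenize_html` and `extract_features`, the example vocabulary, and the definition of keyword density as a keyword's occurrences divided by the total number of tokens on the page.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects every text node of an HTML page, including <script> bodies.

    Assumption: script contents are kept because malicious code often lives there.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def tokenize_html(html):
    """Return lowercased alphanumeric tokens from the page's text nodes (hypothetical tokenizer)."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    tokens = []
    for chunk in parser.chunks:
        tokens.extend(re.findall(r"[a-z0-9]+", chunk.lower()))
    return tokens


def extract_features(html, vocabulary):
    """Build three feature vectors over a fixed vocabulary (hypothetical helper).

    existence: 1 if the keyword occurs on the page, else 0
    frequency: raw occurrence count of the keyword
    density:   occurrences / total tokens on the page (assumed definition)
    """
    tokens = tokenize_html(html)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for pages with no text
    return {
        "existence": [1 if counts[w] else 0 for w in vocabulary],
        "frequency": [counts[w] for w in vocabulary],
        "density": [counts[w] / total for w in vocabulary],
    }


if __name__ == "__main__":
    page = ("<html><body><script>eval(unescape('x'))</script>"
            "<p>Free prize, click now</p></body></html>")
    vocab = ["eval", "unescape", "iframe", "free"]  # illustrative vocabulary
    feats = extract_features(page, vocab)
    print(feats["existence"])  # [1, 1, 0, 1]
    print(feats["frequency"])  # [1, 1, 0, 1]
```

Any of the three vectors could then be passed to a classifier such as a support vector machine; a maximum entropy classifier corresponds to (multinomial) logistic regression. The abstract does not specify how the vocabulary is selected or how the representations are combined, so those choices remain open in this sketch.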
SOFT COMPUTING

Suggestions

Context-sensitive keyword density based supervised learning techniques for detection of malicious web pages
Altay, Betül; Coşar, Ahmet; Dökeroğlu, Tansel; Department of Computer Engineering (2016)
Conventional methods use a black list to decide whether a web page is malicious. These black lists are generally produced by technicians or operators and are used for organizational security, for protecting software and web browsers from web-based virus attacks, and so on. However, the blacklist approach is not a scalable solution given the frequently changing and rapidly growing number of web pages on the internet and their dynamic contents. In this thesis, we propose and analyze a metho...
Feature Extraction and Classification Phishing Websites Based on URL
Aydin, Mustafa; Baykal, Nazife (2015-09-30)
In this study, we extracted websites' URL features and analyzed subset-based feature selection methods and classification algorithms for phishing website detection.
UNWANTED BEHAVIOUR DETECTION AND CLASSIFICATION IN NETWORK TRAFFIC
Onem, Ismail Melih (2010-10-28)
An Intrusion Detection System classifies activities with unwanted intent and can log or prevent activities that are marked as intrusions. Intrusions occur when malicious activity and unwanted behaviour gain access to or affect the usability of a computer resource. In recent years, anomaly discovery has attracted the attention of many researchers as a way to overcome the weakness of signature-based IDSs in discovering novel attacks, and KDDCUP'99 is the most widely used data set for the evaluation of ...
Detecting malicious behavior in binary programs using dynamic symbolic execution and API call sequences
Tatar, Fatih Tamer; Betin Can, Aysu; Department of Bioinformatics (2021-6)
Program analysis has become an important part of malware detection as malware becomes stealthier and more complex. For example, modern malware may detect whether it is under analysis and may use certain triggers, such as time, to avoid detection. However, current detection techniques turn out to be insufficient because they are limited in detecting new, obfuscated, and intelligent malware. In this thesis, we propose a behavior-based malware detection methodology using API call sequence analysis. In our metho...
Modelling the effects of malware propagation on military operations by using bayesian network framework
Şengül, Zafer; Acartürk, Cengiz; Department of Cyber Security (2019)
Malware are malicious programs that cause unwanted system behavior and usually result in damage to IT systems or their users. These effects can also be seen during military operations because high-tech military weapons and command, control, and communication systems are also interconnected IT systems. This thesis employs conventional models that have been used for modeling the propagation of biological diseases to investigate the spread of malware in connected systems. In particular, it proposes a probabilistic l...
Citation Formats
B. Altay, T. Dokeroglu, and A. Coşar, “Context-sensitive and keyword density-based supervised machine learning techniques for malicious webpage detection,” SOFT COMPUTING, pp. 4177–4191, 2019, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/30216.