Static Malware Detection Using Stacked BiLSTM and GPT-2

Download
2022-01-01
Demirci, Deniz
Sahin, Nazenin
Sirlancis, Melih
Acartürk, Cengiz
In recent years, cyber threats and malicious software attacks have been escalated on various platforms. Therefore, it has become essential to develop automated machine learning methods for defending against malware. In the present study, we propose stacked bidirectional long short-term memory (Stacked BiLSTM) and generative pre-trained transformer based (GPT-2) deep learning language models for detecting malicious code. We developed language models using assembly instructions extracted from .text sections of malicious and benign Portable Executable (PE) files. We treated each instruction as a sentence and each .text section as a document. We also labeled each sentence and document as benign or malicious, according to the file source. We created three datasets from those sentences and documents. The first dataset, composed of documents, was fed into a Document Level Analysis Model (DLAM) based on Stacked BiLSTM. The second dataset, composed of sentences, was used in Sentence Level Analysis Models (SLAMs) based on Stacked BiLSTM and DistilBERT, Domain Specific Language Model GPT-2 (DSLM-GPT2), and General Language Model GPT-2 (GLM-GPT2). Lastly, we merged all assembly instructions without labels for creating the third dataset; then we fed a custom pre-trained model with it. We then compared malware detection performances. The results showed that the pre-trained model improved the DSLM-GPT2 and GLM-GPT2 detection performance. The experiments showed that the DLAM, the SLAM based on DistilBERT, the DSLM-GPT2, and the GLM-GPT2 achieved 98.3%, 70.4%, 86.0%, and 76.2% F1 scores, respectively.

Suggestions

Static Malware Detection Using Stacked Bi-Directional LSTM
Demirci, Deniz; Acartürk, Cengiz; Department of Cybersecurity (2021-8-19)
The recent proliferation in the use of the Internet and personal computers has made it easier for cybercriminals to expose Internet users to widespread and damaging threats. In order protect the end users against such threats, a security system must be proactive. It needs to detect malicious files or executables before reaching the end-user. To create an efficient and low-cost malware detection mechanism, in the present study, we propose stacked bidirectional long short-term memory (Stacked BiLSTM) based de...
Malicious code detection in android: the role of sequence characteristics and disassembling methods
Gürkan Balıkçıoğlu, Pınar; Şırlancı, Melih; ACAR KÜÇÜK, ÖZGE; Ulukapi, Bulut; Turkmen, Ramazan K.; Acartürk, Cengiz (2022-11-01)
The acceptance and widespread use of the Android operating system drew the attention of both legitimate developers and malware authors, which resulted in a significant number of benign and malicious applications available on various online markets. Since the signature-based methods fall short for detecting malicious software effectively considering the vast number of applications, machine learning techniques in this field have also become widespread. In this context, stating the acquired accuracy values in ...
Adversarial Attacks on Continuous Authentication Security: A Dynamic Game Approach
Sarıtaş, Serkan; Sandberg, Henrik; Dan, Gyorgy (2019-01-01)
Identity theft through phishing and session hijacking attacks has become a major attack vector in recent years, and is expected to become more frequent due to the pervasive use of mobile devices. Continuous authentication based on the characterization of user behavior, both in terms of user interaction patterns and usage patterns, is emerging as an effective solution for mitigating identity theft, and could become an important component of defense-in-depth strategies in cyber-physical systems as well. In th...
Anomaly-Based Intrusion Detection by Machine Learning: A Case Study on Probing Attacks to an Institutional Network
Tufan, Emrah; Tezcan, Cihangir; Acartürk, Cengiz (2021-01-01)
Cyber attacks constitute a significant threat to organizations with implications ranging from economic, reputational, and legal consequences. As cybercriminals' techniques get sophisticated, information security professionals face a more significant challenge to protecting information systems. In today's interconnected realm of computer systems, each attack vector has a network dimension. The present study investigates network intrusion attempts with anomaly-based machine learning models to provide better p...
Online DDoS attack detection using Mahalanobis distance and Kernel-based learning algorithm
Cakmakci, Salva Daneshgadeh; Kemmerich, Thomas; Ahmed, Tarem; Baykal, Nazife (Elsevier BV, 2020-10-01)
Distributed denial-of-service (DDoS) attacks are constantly evolving as the computer and networking technologies and attackers' motivations are changing. In recent years, several supervised DDoS detection algorithms have been proposed. However, these algorithms require a priori knowledge of the classes and cannot automatically adapt to frequently changing network traffic trends. This emphasizes the need for the development of new DDoS detection mechanisms that target zero-day and sophisticated DDoS attacks....
Citation Formats
D. Demirci, N. Sahin, M. Sirlancis, and C. Acartürk, “Static Malware Detection Using Stacked BiLSTM and GPT-2,” IEEE ACCESS, vol. 10, pp. 58488–58502, 2022, Accessed: 00, 2022. [Online]. Available: https://hdl.handle.net/11511/99660.