Show/Hide Menu
Hide/Show Apps
Logout
Türkçe
Türkçe
Search
Search
Login
Login
OpenMETU
OpenMETU
About
About
Open Science Policy
Open Science Policy
Open Access Guideline
Open Access Guideline
Postgraduate Thesis Guideline
Postgraduate Thesis Guideline
Communities & Collections
Communities & Collections
Help
Help
Frequently Asked Questions
Frequently Asked Questions
Guides
Guides
Thesis submission
Thesis submission
MS without thesis term project submission
MS without thesis term project submission
Publication submission with DOI
Publication submission with DOI
Publication submission
Publication submission
Supporting Information
Supporting Information
General Information
General Information
Copyright, Embargo and License
Copyright, Embargo and License
Contact us
Contact us
Static Malware Detection Using Stacked BiLSTM and GPT-2
Download
index.pdf
Date
2022-01-01
Author
Demirci, Deniz
Sahin, Nazenin
Sirlancis, Melih
Acartürk, Cengiz
Metadata
Show full item record
This work is licensed under a
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
.
Item Usage Stats
220
views
269
downloads
Cite This
In recent years, cyber threats and malicious software attacks have been escalated on various platforms. Therefore, it has become essential to develop automated machine learning methods for defending against malware. In the present study, we propose stacked bidirectional long short-term memory (Stacked BiLSTM) and generative pre-trained transformer based (GPT-2) deep learning language models for detecting malicious code. We developed language models using assembly instructions extracted from .text sections of malicious and benign Portable Executable (PE) files. We treated each instruction as a sentence and each .text section as a document. We also labeled each sentence and document as benign or malicious, according to the file source. We created three datasets from those sentences and documents. The first dataset, composed of documents, was fed into a Document Level Analysis Model (DLAM) based on Stacked BiLSTM. The second dataset, composed of sentences, was used in Sentence Level Analysis Models (SLAMs) based on Stacked BiLSTM and DistilBERT, Domain Specific Language Model GPT-2 (DSLM-GPT2), and General Language Model GPT-2 (GLM-GPT2). Lastly, we merged all assembly instructions without labels for creating the third dataset; then we fed a custom pre-trained model with it. We then compared malware detection performances. The results showed that the pre-trained model improved the DSLM-GPT2 and GLM-GPT2 detection performance. The experiments showed that the DLAM, the SLAM based on DistilBERT, the DSLM-GPT2, and the GLM-GPT2 achieved 98.3%, 70.4%, 86.0%, and 76.2% F1 scores, respectively.
Subject Keywords
Malware
,
Feature extraction
,
Codes
,
Analytical models
,
Static analysis
,
Natural language processing
,
Transformers
,
Malware detection
,
static analysis
,
stacked BiLSTM
,
GPT-2
,
CLASSIFICATION
URI
https://hdl.handle.net/11511/99660
Journal
IEEE ACCESS
DOI
https://doi.org/10.1109/access.2022.3179384
Collections
Graduate School of Informatics, Article
Suggestions
OpenMETU
Core
Static Malware Detection Using Stacked Bi-Directional LSTM
Demirci, Deniz; Acartürk, Cengiz; Department of Cybersecurity (2021-8-19)
The recent proliferation in the use of the Internet and personal computers has made it easier for cybercriminals to expose Internet users to widespread and damaging threats. In order protect the end users against such threats, a security system must be proactive. It needs to detect malicious files or executables before reaching the end-user. To create an efficient and low-cost malware detection mechanism, in the present study, we propose stacked bidirectional long short-term memory (Stacked BiLSTM) based de...
Malicious code detection in android: the role of sequence characteristics and disassembling methods
Gürkan Balıkçıoğlu, Pınar; Şırlancı, Melih; ACAR KÜÇÜK, ÖZGE; Ulukapi, Bulut; Turkmen, Ramazan K.; Acartürk, Cengiz (2022-11-01)
The acceptance and widespread use of the Android operating system drew the attention of both legitimate developers and malware authors, which resulted in a significant number of benign and malicious applications available on various online markets. Since the signature-based methods fall short for detecting malicious software effectively considering the vast number of applications, machine learning techniques in this field have also become widespread. In this context, stating the acquired accuracy values in ...
Adversarial Attacks on Continuous Authentication Security: A Dynamic Game Approach
Sarıtaş, Serkan; Sandberg, Henrik; Dan, Gyorgy (2019-01-01)
Identity theft through phishing and session hijacking attacks has become a major attack vector in recent years, and is expected to become more frequent due to the pervasive use of mobile devices. Continuous authentication based on the characterization of user behavior, both in terms of user interaction patterns and usage patterns, is emerging as an effective solution for mitigating identity theft, and could become an important component of defense-in-depth strategies in cyber-physical systems as well. In th...
Anomaly-Based Intrusion Detection by Machine Learning: A Case Study on Probing Attacks to an Institutional Network
Tufan, Emrah; Tezcan, Cihangir; Acartürk, Cengiz (2021-01-01)
Cyber attacks constitute a significant threat to organizations with implications ranging from economic, reputational, and legal consequences. As cybercriminals' techniques get sophisticated, information security professionals face a more significant challenge to protecting information systems. In today's interconnected realm of computer systems, each attack vector has a network dimension. The present study investigates network intrusion attempts with anomaly-based machine learning models to provide better p...
Online DDoS attack detection using Mahalanobis distance and Kernel-based learning algorithm
Cakmakci, Salva Daneshgadeh; Kemmerich, Thomas; Ahmed, Tarem; Baykal, Nazife (Elsevier BV, 2020-10-01)
Distributed denial-of-service (DDoS) attacks are constantly evolving as the computer and networking technologies and attackers' motivations are changing. In recent years, several supervised DDoS detection algorithms have been proposed. However, these algorithms require a priori knowledge of the classes and cannot automatically adapt to frequently changing network traffic trends. This emphasizes the need for the development of new DDoS detection mechanisms that target zero-day and sophisticated DDoS attacks....
Citation Formats
IEEE
ACM
APA
CHICAGO
MLA
BibTeX
D. Demirci, N. Sahin, M. Sirlancis, and C. Acartürk, “Static Malware Detection Using Stacked BiLSTM and GPT-2,”
IEEE ACCESS
, vol. 10, pp. 58488–58502, 2022, Accessed: 00, 2022. [Online]. Available: https://hdl.handle.net/11511/99660.