Image-based malware family classification with deep learning and a new dataset

2024-5
Mutlu, Emre
Security breaches and incidents caused by malware, which continues to grow rapidly in sophistication, keep increasing and will likely remain a significant security concern. Moreover, recent evasive techniques have made it easier than before to generate large numbers of new malware samples. Because of this exponential growth in malware attacks, malware detection remains an active research topic. Since analyzing thousands of malware samples manually is not feasible, deep learning algorithms have recently been employed for efficient malware detection. One of the main challenges is developing methods that can identify malware in a reasonable time without disassembly, debugging, or execution. At the same time, preparing a new malware dataset for academic purposes is very difficult. For this reason, we created a new and up-to-date dataset called MamMalware and generated from it two custom datasets that differ in the number of malware samples and malware families. These datasets are publicly available. All samples were converted into grayscale image files, and we also extracted the opcode sequences of the samples. The image files and opcode sequences are used as input. We then ran experiments with two- and three-layer Convolutional Neural Networks (CNNs) on our new datasets. In addition, we conducted transfer learning experiments with the pretrained ResNet152 and VGG19 models. The transfer learning models obtained the best results, with 94% test accuracy. We also validated the results of a prior study. Additionally, we observed that beyond a certain size, the size of the datasets used in this study has a negligible effect on accuracy.
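The conversion of a sample into a grayscale image can be illustrated with a minimal sketch. It follows the common byte-to-pixel approach, in which each byte of the file becomes one pixel with intensity 0 to 255. The abstract does not state the image width, padding rule, or file format used in the thesis, so the function names `bytes_to_grayscale` and `to_pgm`, the default width of 256, the zero padding, and the PGM output are all assumptions made for illustration.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Map each byte to one pixel (0-255), row by row.

    The width and the zero padding of the last row are assumptions,
    not settings taken from the thesis.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Pad with zero bytes so the final row is complete.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def to_pgm(rows: list[list[int]]) -> bytes:
    """Serialize the pixel rows as a binary PGM (P5) grayscale image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + b"".join(bytes(row) for row in rows)


# Tiny stand-in for the raw bytes of a sample file.
sample = bytes(range(10))
image = bytes_to_grayscale(sample, width=4)
pgm = to_pgm(image)
```

In practice the bytes would be read from a sample file, and the resulting images would typically be resized to the input size the network expects before training.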
Citation Formats
E. Mutlu, “Image-based malware family classification with deep learning and a new dataset,” M.S. - Master of Science, Middle East Technical University, 2024.