Application of subspace clustering to scalable malware clustering

Işıktaş, Fatih
In recent years, the massive proliferation of malware variants has made it necessary to employ sophisticated clustering techniques in malware analysis. Choosing an appropriate clustering approach is especially important for rapidly and accurately mining clustering information from a large malware set with a high number of attributes. In this study, we propose a clustering method that is based on subspace clustering and graph matching techniques and offers enhanced clustering ability and scalable runtime performance for the analysis of large malware sets. Unlike traditional signature-based clustering techniques, we aimed to obtain more accurate malware clusters by comparing the internal structures of malware binaries. We also integrated a subspace clustering technique in order to scale and speed up the clustering process. To verify our method, we developed a system prototype that performs these clustering processes. The prototype provides a graphical user interface that allows users to navigate the malware binaries and the generated clusters for detailed analysis. We performed clustering experiments on real malware sets using our system prototype. The experimental results showed that a clustering method based on comparing the internal structures of malware binaries yields clustering outputs with 98% accuracy. The results also demonstrated that our method significantly improves the runtime performance of the clustering process without degrading clustering accuracy.
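
The scalability idea in the abstract (narrow the candidates cheaply first, then compare internal structure only where it matters) can be illustrated with a small two-stage sketch. This is not the thesis's algorithm, which the abstract does not specify. Everything here is an assumption made for illustration: grouping samples by exact values on a chosen attribute subset stands in for subspace clustering, Jaccard similarity over call-graph edge sets stands in for graph matching, and the sample data, the attribute names (`packer`, `sections`), the function names, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two edge sets (a simple stand-in for graph matching)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def two_stage_cluster(samples, subspace, threshold=0.5):
    """Cluster samples in two stages.

    samples:  dict mapping name -> (attribute dict, frozenset of graph edges)
    subspace: tuple of attribute names used for the cheap first stage
    Returns (sorted list of clusters, number of structural comparisons made).
    """
    # Stage 1: group samples that agree on the selected attributes.
    buckets = defaultdict(list)
    for name, (attrs, _edges) in samples.items():
        buckets[tuple(attrs[k] for k in subspace)].append(name)

    # Stage 2: single-link clustering inside each bucket, via union-find.
    parent = {name: name for name in samples}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    comparisons = 0
    for members in buckets.values():
        for a, b in combinations(members, 2):
            comparisons += 1
            if jaccard(samples[a][1], samples[b][1]) >= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for name in samples:
        clusters[find(name)].add(name)
    return sorted(sorted(c) for c in clusters.values()), comparisons


# Hypothetical toy data: cheap attributes plus a set of call-graph edges.
samples = {
    "a1": ({"packer": "upx", "sections": 3},
           frozenset({("main", "decrypt"), ("decrypt", "connect")})),
    "a2": ({"packer": "upx", "sections": 3},
           frozenset({("main", "decrypt"), ("decrypt", "connect"), ("connect", "send")})),
    "b1": ({"packer": "none", "sections": 5},
           frozenset({("main", "scan"), ("scan", "copy")})),
    "b2": ({"packer": "none", "sections": 5},
           frozenset({("main", "scan"), ("scan", "copy")})),
    "c1": ({"packer": "upx", "sections": 3},
           frozenset({("main", "keylog")})),
}

clusters, comparisons = two_stage_cluster(samples, subspace=("packer", "sections"))
print(clusters)
print(comparisons)
```

On this toy input the first stage cuts the structural comparisons from 10 (all pairs of five samples) to 4, which is the kind of saving the two-stage design is meant to provide. A real system would need a more tolerant first stage, since exact attribute matching can separate samples that belong together.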


Detecting malicious behavior in binary programs using dynamic symbolic execution and API call sequences
Tatar, Fatih Tamer; Betin Can, Aysu; Department of Bioinformatics (2021-6)
Program analysis has become an important part of malware detection as malware becomes stealthier and more complex. For example, modern malware may detect whether it is under analysis and may use certain triggers, such as time, to avoid detection. However, current detection techniques turn out to be insufficient, as they are limited in detecting new, obfuscated, and intelligent malware. In this thesis, we propose a behavior-based malware detection methodology using API call sequence analysis. In our metho...
Malicious code detection: run trace analysis by LSTM
Şırlancı, Melih; Acartürk, Cengiz; Gürkan Balıkçıoğlu, Pınar; Department of Cybersecurity (2021-6)
Malicious software threats and their detection have been gaining importance as a subdomain of information security due to the expansion of ICT applications in daily settings. A major challenge in designing and developing anti-malware systems is the coverage of the detection, particularly the development of dynamic analysis methods that can detect polymorphic and metamorphic malware efficiently. In the present study, we propose a methodological framework for detecting malicious code by analyzing run trace ou...
Tatar, Fatih Tamer; Betin Can, Aysu (2022-10-01)
As malicious software gets stealthier and smarter, software analysis has become an essential part of malware detection. Modern malware does not immediately display its malicious behavior, especially if it is aware that it is being analyzed. For instance, malware can detect the runtime environment and use certain triggers, such as time, to avoid detection. Static analysis fails on obfuscated code whereas dynamic analysis struggles to find the right actions and conditions to trigger malicious act...
ZEKI: unsupervised zero-day exploit kit intelligence
Suren, Emre (The Scientific and Technological Research Council of Turkey, 2020-01-01)
Over the last few years, exploit kits (EKs) have become the de facto medium for large-scale spread of malware. Drive-by download is the leading method used by EK flavors to exploit web-based client-side vulnerabilities. Their principal goal is to infect the victim's system with malware. In addition, EK families evolve quickly, porting zero-day exploits for brand-new vulnerabilities that were never seen before and for which no patch exists. In this paper, we propose a novel approach ...
Using operational data for decision making: a feasibility study in rail maintenance
Marsh, William; Nur, Khalid; Yet, Barbaros; Majumdar, Arnab (2016-05-01)
In many organisations, large databases are created as part of the business operation: the promise of ‘big data’ is to extract information from these databases to make smarter decisions. We explore the feasibility of this approach for better decision-making for maintenance, specifically for rail infrastructure. We argue that the data should be used within a Bayesian framework with the aim of inferring the underlying state of the system so we can predict future failures and improve decision-making. Within thi...
Citation Formats
F. Işıktaş, “Application of subspace clustering to scalable malware clustering,” M.S. - Master of Science, Middle East Technical University, 2019.