DETECTING ANDROID OBFUSCATION METHODS WITH LSTM

2022-6-17
Ulukapi, Bulut
In software development, obfuscation means intentionally making source code more difficult for humans to understand, which in turn makes analysis and reverse engineering harder to accomplish. Obfuscation methods modify the source code syntactically while preserving its original functionality. Obfuscation is also integrated into the Android development environment and is widely used by both legitimate software developers and malware authors. App developers employ obfuscation methods to protect intellectual property, whereas malware authors use them to evade detection and digital forensics. The widespread use of obfuscation techniques presents challenges to researchers and analysts who work on detecting app cloning, repackaging, third-party libraries, and malware. Hence, detecting obfuscation is crucial for building reliable, effective, and automated detection systems. In the present study, we used Natural Language Processing (NLP) techniques and Long Short-Term Memory (LSTM) networks to detect whether a given application is obfuscated. To this end, we collected applications from F-Droid, an open-source Android app repository, and obfuscated them with nine different obfuscation techniques. We performed experiments and obtained promising results for detecting different obfuscation methods. We also observed that the model detects some obfuscation methods more successfully than others.

Citation Formats
B. Ulukapi, “DETECTING ANDROID OBFUSCATION METHODS WITH LSTM,” M.S. - Master of Science, Middle East Technical University, 2022.