New Techniques in Profiling Big Datasets for Machine Learning with a Concise Review of Android Mobile Malware Datasets
Date
2018-12-04
Author
CANBEK, Gurol
SAĞIROĞLU, ŞEREF
Taşkaya Temizel, Tuğba
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
As the volume, variety, and velocity of big data increase, its other aspects, such as veracity, value, variability, and venue, cannot be interpreted easily by data owners or researchers. These aspects are also unclear when the data is to be used in machine learning studies such as classification or clustering. This study proposes four techniques with fourteen criteria for systematically profiling datasets collected from different sources, so that they can be distinguished from one another and their strengths and weaknesses identified. The proposed approach is demonstrated on five Android mobile malware datasets from the literature and the security industry, namely Android Malware Genome Project, Drebin, Android Malware Dataset, Android Botnet, and Virus Total 2018. The results show that the proposed profiling methods reveal remarkable comparative insight into the datasets and direct researchers toward big but more visible, qualitative, and internalized datasets.
Subject Keywords
Data profiling, Data quality, Big data, Malware detection, Mobile malware, Machine learning, Classification, Android, Feature engineering
URI
https://hdl.handle.net/11511/55994
Conference Name
International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT)
Collections
Graduate School of Informatics, Conference / Seminar
Citation Formats
G. CANBEK, Ş. SAĞIROĞLU, and T. Taşkaya Temizel, “New Techniques in Profiling Big Datasets for Machine Learning with a Concise Review of Android Mobile Malware Datasets,” presented at the International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT), Turkish IT Author, Ankara, TURKEY, 2018, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/55994.