Semantic concept recognition from structured and unstructured inputs within cyber security domain

Hoşsucu, Alp Gökhan
The linked data initiative has been quite successful at publishing and interlinking data over ontological structures. Its success stems from the ability to answer semantically rich queries over highly structured data. Linked data structures are widely used in various domains to produce domain-specific knowledge that automated agents can interpret without human intervention. Cyber security is one of the domains that suffer from an excess of raw data and a lack of knowledge, which constantly requires subject matter experts to be involved in security analysis and reasoning processes. The principal aim of this study is to propose an automated approach for generating a cyber security knowledge base from scratch using both structured and unstructured domain-related data. The proposed approach is based on automatically extracting significant phrases and converting them into semantic concepts within the scope of the existing cyber security databases CWE, CPE, VVS and CCE. The system takes this raw data and separates the structured and unstructured parts, which are processed in different modules for knowledge extraction. The resulting concepts are represented in RDF format, which includes all the relationships between entities, to construct an ontology for the cyber security domain. To enhance the knowledge extraction process, NLP-oriented approaches including key phrase extraction methodologies are used, and data augmentation techniques are applied to the concepts by interlinking them to entities in the Freebase and Wikipedia indexes. As a result of this series of operations, a modular system has been developed that is capable of extracting knowledge from given cyber security related data. This accumulated knowledge constitutes a basis for a cyber security ontology that can be used for further vulnerability identification and prevention.
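The pipeline described in the abstract (key phrase extraction from unstructured text, then representation of the resulting concepts as RDF) can be illustrated with a minimal sketch. This is not the thesis implementation: the frequency-based phrase extractor is a deliberately naive stand-in for the key phrase extraction methods mentioned, and the `http://example.org/cybersec/` namespace, the `name` and `mentionsConcept` predicates, and the record layout are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical namespace; the abstract does not specify the URIs used.
BASE = "http://example.org/cybersec/"

# Small illustrative stopword list (an assumption, not from the thesis).
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "is", "that",
             "can", "by", "for", "on", "with", "which", "be", "it", "this"}


def key_phrases(text, top_n=3):
    """Naive key phrase extraction: maximal runs of non-stopword tokens,
    ranked by frequency, then by phrase length, then alphabetically."""
    tokens = re.findall(r"[a-z][a-z\-]*", text.lower())
    phrases, current = [], []
    for tok in tokens:
        if tok in STOPWORDS:
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(tok)
    if current:
        phrases.append(" ".join(current))
    counts = Counter(phrases)
    ranked = sorted(counts, key=lambda p: (-counts[p], -len(p.split()), p))
    return ranked[:top_n]


def slug(value):
    """Turn an arbitrary string into a URI-safe local name."""
    return re.sub(r"[^A-Za-z0-9]+", "_", value).strip("_")


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(record):
    """Serialize one record as N-Triples lines: the structured field
    (name) becomes a literal, extracted phrases become concept links."""
    subj = f"<{BASE}{slug(record['id'])}>"
    triples = [f'{subj} <{BASE}name> "{escape_literal(record["name"])}" .']
    for phrase in key_phrases(record["description"]):
        obj = f"<{BASE}concept/{slug(phrase)}>"
        triples.append(f"{subj} <{BASE}mentionsConcept> {obj} .")
    return triples


if __name__ == "__main__":
    # Made-up record for illustration only; not taken from any database.
    example = {
        "id": "EXAMPLE-1",
        "name": "Sample weakness",
        "description": "buffer overflow in the parser and "
                       "buffer overflow in the decoder",
    }
    for line in to_ntriples(example):
        print(line)
```

In this sketch the structured part of a record maps directly to a triple, while the free-text description first passes through phrase extraction, which mirrors the separation of structured and unstructured processing described above. Linking concepts to external indexes would be a further step and is not shown.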


Derivation of Transcriptional Regulatory Relationships by Partial Least Squares Regression
Tan, Mehmet; Polat, Faruk; Alhajj, Reda (2009-11-04)
As the number of genes in a transcriptional regulatory network is large and the number of samples in biological data types is usually small, there is a need for integrating multiple data types for reverse engineering these networks. In this paper, we propose a method to integrate microarray gene expression, ChIP-chip and transcription factor binding motif data sets in a partial least squares regression model to derive transcription factors (TFs) gene interactions. Both single and synergistic effects of TFs ...
Multiobjective relational data warehouse design for the cloud
Dökeroğlu, Tansel; Coşar, Ahmet; Department of Computer Engineering (2014)
Conventional distributed Data Warehouse (DW) design techniques seek to assign data tables/fragments to a given static database hardware setting optimally. However, it is now possible to use elastic virtual resources provided by the Cloud environment, thus achieving reductions in both the execution time and the monetary cost of a DW system within predefined budget and response time constraints. Finding an optimal assignment plan for database tables to machines for this design problem is NP-Hard. Therefore, robu...
Data sharing and access with a corba data distribution service implementation
Dursun, Mustafa; Bilgen, Semih; Department of Electrical and Electronics Engineering (2006)
The Data Distribution Service (DDS) specification defines an API for the Data-Centric Publish-Subscribe (DCPS) model to achieve efficient data distribution in distributed computing environments. The lack of a defined interoperability architecture in the DDS specification obstructs data distribution between different, heterogeneous DDS implementations. In this thesis, DDS is implemented as a CORBA service to achieve interoperability, and a QoS policy is proposed for faster data distribution with CORBA features.
Semantic information-based alternative plan generation for multiple query optimization
Polat, Faruk; Alhajj, R (Elsevier BV, 2001-09-01)
This paper addresses the impact of semantic information about queries on alternative plan generation (APG) for multiple query optimization (MQO). MQO covers optimizing the execution of a set of queries together where each query in the set to be optimized has several alternative execution plans. A multiple query optimizer selects an alternative plan for each query to obtain an optimal global execution plan. Our approach uses information such as common relations, common possible joins and common conditions to...
Privacy preserving database external layer construction algorithm via secure decomposition for attribute-based security policies
Turan, Uğur; Toroslu, İsmail Hakkı; Kantarcıoğlu, Murat; Department of Computer Engineering (2018)
Relational DBMSs continue to dominate the market, and the inference problem on external schema has preserved its importance in terms of data privacy. Especially for the last 10 years, external schema construction for application-specific database usage has increased its independence from the conceptual schema, as the definitions and implementations of views and procedures have been optimized. After defining all mathematical background, this work offers an optimized decomposition strategy for the external schema, wh...
Citation Formats
A. G. Hoşsucu, “Semantic concept recognition from structured and unstructured inputs within cyber security domain,” M.S. - Master of Science, Middle East Technical University, 2015.