Fast, efficient and dynamically optimized data and hardware architectures for string matching

Download
2014
Zengin, Salih
Many fields of computing such as network intrusion detection employ string matching modules (SMM) that search for a given set of strings in their input. An SMM is expected to produce correct outcomes while scanning the input data at high rates. Furthermore, the string sets that are searched for are usually large and their sizes increase steadily. In this thesis, motivated by the requirement of designing fast, accurate and efficient SMMs; we propose a number of SMM architectures that employ Bloom Filters to compactly represent the large amounts of data for the string sets. The proposed architectures address the well-known slowdown problem of the Bloom Filters because of the verifications of the positive matches. To this end, the first contribution of the thesis is Double Bloom Filter SMM (DBFSMM) which employs a second Bloom Filter which acts as a verification engine. We present an analysis, evaluation and implementation of the DBF-SMM.We further verify the required functionality of the DBF-SMM by modeling and testing the architecture in SystemC environment. Our analytical and implementation results demonstrate that DBF-SMM is superior to the existing Bloom Filter based SMM designs in terms of sustainability of the response time with high string storage efficiency and hardware scalability. DBF-SMM is designed for fixed size strings. The second contribution of the thesis is a finite automaton-based design that stores variable size strings as state transitions between characters. To this end, we first identify the classes of state transitions. We then modify the implementation of the well-known Aho-Corasick algorithm to effectively store and query the appropriate transition classes in a hardware architecture that features Bloom Filters and CAM-RAM look-up tables. The Bloom Filter in this architecture is realized as a DBF-SMM. The proposed SMM achieves a memory efficiency that is superior to all previous SMM designs together with fast and scalable hardware design.

Suggestions

A low latency, high throughput and scalable hardware architecture for flow tables in software defined networks
Eral, Göksan; Schmidt, Şenan Ece; Department of Electrical and Electronics Engineering (2016)
Software Defined Networking (SDN) is a new paradigm which requires multi-field packet classification for each received packet by looking up Flow Tables which contain a large number of rules and corresponding actions. The rules are defined by upto 15 packet header fields including IP source and destination address. If more than one rule rule matches then the action of the highest priority rule is executed. Furthermore rules with wildcard fields are possible. The SDN Flow Table should scale with the rule coun...
A Fast and Accurate Hardware String Matching Module with Bloom Filters
Zengin, Salih; Schmidt, Şenan Ece (Institute of Electrical and Electronics Engineers (IEEE), 2017-02-01)
Many fields of computing such as Deep Packet Inspection (DPI) employ string matching modules (SMM) that search for a given set of positive strings in their input. An SMM is expected to produce correct outcomes while scanning the input data at high rates. Furthermore the string sets that are searched for are usually large and their sizes increase steadily. Bloom Filters (BFs) are hashing data structures which are fast but their false positive results require further processing. That is, their speed can be ex...
Multiobjective relational data warehouse design for the cloud
Dökeroğlu, Tansel; Coşar, Ahmet; Department of Computer Engineering (2014)
Conventional distributed DataWarehouse (DW) design techniques seek to assign data tables/fragments to a given static database hardware setting optimally. However; it is now possible to use elastic virtual resources provided by the Cloud environment, thus achieve reductions in both the execution time and the monetary cost of a DW system within predefined budget and response time constraints. Finding an optimal assignment plan for database tables to machines for this design problem is NP-Hard. Therefore, robu...
A High throughput FPGA implementation of markov chain monte carlo method for mixture models
Bozgan, Caner; Ulusoy, İlkay; Department of Electrical and Electronics Engineering (2019)
Markov Chain Monte Carlo (MCMC) is a class of algorithms which can generate samples from high dimensional and multimodal probability distributions. In many statistical and control applications, MCMC algorithms are employed widely thanks to their ability to draw sample from arbitrary distribution regardless of dimension or complexity. However, as the complexity of the Bayesian models and the computational load of the MCMC algorithm increase, performing MCMC inference becomes impractical or too time consuming...
Sparse recursive filtering for O 1 stereo matching
Gürbüz, Yeti Ziya; Alatan, Abdullah Aydın (2015-09-30)
Recursive edge-aware filters have been proved to be one of the most efficient approaches for cost aggregation in stereo matching. However, disparity search space dependency, as a result of full search, is the bottle-neck of these local techniques that prevent further reduction in computation. In this paper, the cost aggregation and correspondence search problems are re-formulated to enable adaptive search for each pixel during recursive operations that provides significant reduction in computational complex...
Citation Formats
S. Zengin, “Fast, efficient and dynamically optimized data and hardware architectures for string matching,” Ph.D. - Doctoral Program, Middle East Technical University, 2014.