Building Efficient Regular Expression Matchers Through GA Optimization with ML Surrogates
2021 (English) In: Proceedings of the 2021 12th International Conference on Network of the Future, NoF 2021 / [ed] Machuca, CM; Martins, L; Sargento, S; Wauters, T; Jorge, L; Chemouil, P; Salhab, N, IEEE, 2021. Conference paper, Published paper (Refereed)
Abstract [en]
Important network functions such as traffic classification and intrusion detection often depend on high-throughput regular expression matching. To achieve high performance, regular expressions can be represented as state machines, which are then merged. However, determining which individual state machines should ideally be merged together is a challenging optimization problem. We address this problem by using genetic algorithms with novel problem-specific operators. To allow large-scale evaluation of the new operators, we devise two ML-based surrogate models for the expensive fitness evaluation function. Our results from a set of production-scale regular expressions show that using the most appropriate operators provides large gains over a naive baseline, but also that no universally best combination of operators exists. We provide some insights into which operators perform best for different objectives, and show the variation between TCP- and UDP-specific regular expressions.
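As a rough illustration of the approach described in the abstract, the sketch below runs a plain genetic algorithm that assigns regular expressions to merge groups and scores each assignment with a cheap stand-in cost. Everything in it is an assumption made for illustration: the names `surrogate_cost` and `evolve`, the per-expression `sizes`, the quadratic cost, and the generic operators (tournament selection, uniform crossover, random-reset mutation). It does not reproduce the paper's problem-specific operators or its ML-based surrogate models.

```python
import random


def surrogate_cost(assignment, sizes, n_groups):
    """Hypothetical stand-in for an expensive fitness evaluation.

    assignment[i] is the group index of regular expression i, and
    sizes[i] is an assumed per-expression size (e.g. a state count).
    """
    totals = [0] * n_groups
    for regex_idx, group in enumerate(assignment):
        totals[group] += sizes[regex_idx]
    # Penalise large merged groups superlinearly, loosely mimicking
    # the state blow-up that can occur when state machines are merged.
    return sum(t * t for t in totals)


def evolve(sizes, n_groups, pop_size=30, generations=50,
           mutation_rate=0.1, seed=0):
    """Minimise surrogate_cost with a generic GA (illustrative only)."""
    rng = random.Random(seed)
    n = len(sizes)

    def cost(individual):
        return surrogate_cost(individual, sizes, n_groups)

    population = [[rng.randrange(n_groups) for _ in range(n)]
                  for _ in range(pop_size)]
    best = min(population, key=cost)

    for _ in range(generations):
        next_population = [best[:]]  # elitism: carry the best forward
        while len(next_population) < pop_size:
            # Tournament selection of size 2 for each parent.
            p1 = min(rng.sample(population, 2), key=cost)
            p2 = min(rng.sample(population, 2), key=cost)
            # Uniform crossover: take each gene from either parent.
            child = [a if rng.random() < 0.5 else b
                     for a, b in zip(p1, p2)]
            # Random-reset mutation: reassign a gene to a random group.
            child = [rng.randrange(n_groups)
                     if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)
        population = next_population
        best = min(population, key=cost)

    return best, cost(best)


if __name__ == "__main__":
    example_sizes = [12, 7, 30, 5, 18, 9]  # made-up values
    grouping, score = evolve(example_sizes, n_groups=3)
    print(grouping, score)
```

In a setting like the one the abstract describes, the call to `surrogate_cost` is where a trained surrogate model would stand in for the expensive evaluation, so that many operator combinations can be compared at low cost.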
Place, publisher, year, edition, pages
IEEE, 2021.
Keywords [en]
DFA, DPI, Genetic algorithms, IDS, NFA, Regular expressions, Surrogate models, Traffic classification, Function evaluation, Intrusion detection, Network security, Pattern matching, GA optimization, Network functions, State-machine, Surrogate modeling
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kau:diva-89079
DOI: 10.1109/NoF52522.2021.9609828
ISI: 000859513400004
Scopus ID: 2-s2.0-85123499344
ISBN: 9781665424349 (print)
ISBN: 978-1-6654-2435-6 (print)
OAI: oai:DiVA.org:kau-89079
DiVA, id: diva2:1643553
Conference
12th International Conference on Network of the Future, NoF 2021, 6 October 2021 through 8 October 2021
Funder
Swedish Research Council, 2018-05973
2022-03-10 2022-03-10 2022-10-20 Bibliographically approved