Korhonen, Topi
Publications (3 of 3)
Garcia, J. & Korhonen, T. (2018). Efficient Distribution-Derived Features for High-Speed Encrypted Flow Classification. In: NetAI'18 Proceedings of the 2018 Workshop on Network Meets AI & ML. Paper presented at 2018 Workshop on Network Meets AI & ML, August 24, 2018, Budapest, Hungary (pp. 21-27). New York: Association for Computing Machinery (ACM)
2018 (English). In: NetAI'18 Proceedings of the 2018 Workshop on Network Meets AI & ML, New York: Association for Computing Machinery (ACM), 2018, p. 21-27. Conference paper, Published paper (Refereed)
Abstract [en]

Flow classification is an important tool to enable efficient network resource usage, support traffic engineering, and aid QoS mechanisms. As traffic is increasingly becoming encrypted by default, flow classification is turning towards the use of machine learning methods employing features that are also available for encrypted traffic. In this work we evaluate flow features that capture the distributional properties of in-flow per-packet metrics such as packet size and inter-arrival time. The characteristics of such distributions are often captured with general statistical measures such as standard deviation, variance, etc. We instead propose a Kolmogorov-Smirnov discretization (KSD) algorithm to perform histogram bin construction based on the distributional properties observed in the data. This allows for a richer, histogram-based representation which also requires fewer resources for feature computation than higher-order statistical moments. A comprehensive evaluation using synthetic data from Gaussian and Beta mixtures shows that the KSD approach provides Jensen-Shannon distance results surpassing those of uniform binning and probabilistic binning. An empirical evaluation using live traffic traces from a cellular network further shows that when coupled with a random forest classifier the KSD-constructed features improve classification performance compared to general statistical features based on higher-order moments, or alternative bin placement approaches.
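
As a rough illustration of the histogram-based features and the Jensen-Shannon distance mentioned in the abstract, the following minimal sketch may help. It is not the authors' KSD algorithm: the bin edges are simply passed in by the caller, and the function names (`histogram_features`, `jensen_shannon_distance`) and the example edge values are hypothetical.

```python
import bisect
import math


def histogram_features(values, edges):
    """Return normalized bin counts for one flow.

    `edges` are sorted interior bin boundaries (assumed given here;
    the KSD bin placement itself is not reproduced in this sketch).
    """
    counts = [0.0] * (len(edges) + 1)
    for v in values:
        # bisect_right maps a value to the index of the bin it falls in
        counts[bisect.bisect_right(edges, v)] += 1
    total = len(values)
    return [c / total for c in counts] if total else counts


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base-2 logs)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the divergence
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


# Hypothetical packet sizes (bytes) and hypothetical bin edges
features = histogram_features([100, 200, 1500], edges=[150, 1000])
```

With base-2 logarithms the distance lies between 0 (identical histograms) and 1 (histograms with no overlapping mass), which makes it convenient for comparing how well different bin placements preserve a distribution.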

Place, publisher, year, edition, pages
New York: Association for Computing Machinery (ACM), 2018
Keywords
Traffic classification, Discretization, Machine learning
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kau:diva-68707 (URN); 10.1145/3229543.3229548 (DOI); 978-1-4503-5911-5 (ISBN)
Conference
2018 Workshop on Network Meets AI & ML, August 24, 2018, Budapest, Hungary.
Projects
HITS
Available from: 2018-08-14. Created: 2018-08-14. Last updated: 2019-11-08. Bibliographically approved
Garcia, J. & Korhonen, T. (2018). On Runtime and Classification Performance of the Discretize-Optimize (DISCO) Classification Approach. Performance Evaluation Review, 46(3), 167-170
2018 (English). In: Performance Evaluation Review, ISSN 0163-5999, E-ISSN 1557-9484, Vol. 46, no 3, p. 167-170. Article in journal (Refereed) Published
Abstract [en]

Using machine learning in high-speed networks for tasks such as flow classification typically requires either very resource-efficient classification approaches, large amounts of computational resources, or specialized hardware. Here we provide a sketch of the discretize-optimize (DISCO) approach which can construct an extremely efficient classifier for low-dimensional problems by combining feature selection, efficient discretization, novel bin placement, and lookup. As feature selection and discretization parameters are crucial, appropriate combinatorial optimization is an important aspect of the approach. A performance evaluation is conducted for a YouTube classification task using a cellular traffic data set. The initial evaluation results show that the DISCO approach can move the Pareto boundary in the classification performance versus runtime trade-off by up to an order of magnitude compared to runtime-optimized random forest and decision tree classifiers.
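
The general idea of classifying by discretization and lookup can be sketched as follows. This is only an assumed, simplified illustration and not the DISCO implementation: feature selection, bin placement, and the combinatorial optimization are omitted, the bin edges are supplied by hand, and all names (`bin_index`, `train_lookup`, `classify`) and data values are hypothetical.

```python
import bisect
from collections import Counter, defaultdict


def bin_index(x, edges_per_feature):
    """Map a feature vector to a tuple of per-feature bin indices."""
    return tuple(bisect.bisect_right(e, v) for e, v in zip(edges_per_feature, x))


def train_lookup(samples, labels, edges_per_feature):
    """Build a table from bin-index tuples to the majority label seen there."""
    votes = defaultdict(Counter)
    for x, y in zip(samples, labels):
        votes[bin_index(x, edges_per_feature)][y] += 1
    table = {cell: c.most_common(1)[0][0] for cell, c in votes.items()}
    # Cells never seen in training fall back to the overall majority label
    default = Counter(labels).most_common(1)[0][0]
    return table, default


def classify(x, table, default, edges_per_feature):
    """Classification is one discretization step plus one table lookup."""
    return table.get(bin_index(x, edges_per_feature), default)


# Hypothetical features: (mean packet size in bytes, mean inter-arrival time in s)
edges = [[500], [0.1]]
samples = [(100, 0.01), (1400, 0.5), (1300, 0.4), (120, 0.02), (1200, 0.3)]
labels = ["other", "video", "video", "other", "video"]
table, default = train_lookup(samples, labels, edges)
prediction = classify((1450, 0.6), table, default, edges)
```

A dictionary is used here for brevity; for a small, fixed number of bins per feature, a flat preallocated array indexed by the combined bin index would be a natural alternative when runtime matters.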

Place, publisher, year, edition, pages
New York, USA: Association for Computing Machinery (ACM), 2018
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kau:diva-71213 (URN); 10.1145/3308897.3308965 (DOI)
Projects
HITS, 4707
Funder
Knowledge Foundation
Available from: 2019-02-20. Created: 2019-02-20. Last updated: 2019-11-08. Bibliographically approved
Garcia, J., Korhonen, T., Andersson, R. & Västlund, F. (2018). Towards Video Flow Classification at a Million Encrypted Flows Per Second. In: Leonard Barolli, Makoto Takizawa, Tomoya Enokido, Marek R. Ogiela, Lidia Ogiela & Nadeem Javaid (Eds.), Proceedings of 32nd International Conference on Advanced Information Networking and Applications (AINA). Paper presented at 32nd International Conference on Advanced Information Networking and Applications (AINA), Krakow, Poland, 16-18 May 2018. Krakow: IEEE
2018 (English). In: Proceedings of 32nd International Conference on Advanced Information Networking and Applications (AINA) / [ed] Leonard Barolli, Makoto Takizawa, Tomoya Enokido, Marek R. Ogiela, Lidia Ogiela & Nadeem Javaid, Krakow: IEEE, 2018. Conference paper, Published paper (Refereed)
Abstract [en]

As end-to-end encryption on the Internet is becoming more prevalent, techniques such as deep packet inspection (DPI) can no longer be expected to be able to classify traffic. In many cellular networks a large fraction of all traffic is video traffic, and being able to divide flows in the network into video and non-video can provide considerable traffic engineering benefits. In this study we examine machine-learning-based flow classification using features that are available also for encrypted flows. Using a data set of several billion packets from a live cellular network we examine the obtainable classification performance for two different ensemble-based classifiers. Further, we contrast the classification performance of a statistical-based feature set with a less computationally demanding alternate feature set. To also examine the runtime aspects of the problem, we export the trained models and use a tailor-made C implementation to evaluate the runtime performance. The results quantify the trade-off between classification and runtime performance, and show that up to 1 million classifications per second can be achieved for a single core. Considering that only the subset of flows reaching some minimum flow length will need to be classified, the results are promising with regard to deployment also in scenarios with very high flow arrival rates.

Place, publisher, year, edition, pages
Krakow: IEEE, 2018
Series
Advanced Information Networking and Applications, ISSN 1550-445X, E-ISSN 2332-5658
Keywords
Cryptography, Runtime, Cellular networks, Machine learning, Forestry, Data models, Support vector machines
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kau:diva-68705 (URN); 10.1109/AINA.2018.00061 (DOI); 000454817500048 (); 978-1-5386-2196-7 (ISBN); 978-1-5386-2195-0 (ISBN)
Conference
32nd International Conference on Advanced Information Networking and Applications (AINA). Krakow, Poland, 16-18 May 2018.
Projects
HITS
Available from: 2018-08-14. Created: 2018-08-14. Last updated: 2019-02-14. Bibliographically approved