A Novel Flow-level Session Descriptor with Application to OS and Browser Identification
2020 (English)
In: Proceedings of IEEE/IFIP Network Operations and Management Symposium 2020: Management in the Age of Softwarization and Artificial Intelligence, NOMS 2020, IEEE, 2020
Conference paper, Published paper (Refereed)
Abstract [en]
High-level traffic characteristics have the potential to be useful for inferring various host characteristics. This work proposes the novel Flow-Discretize Order (FDO) approach for describing session characteristics in an intuitive manner while also retaining flow ordering information. The FDO approach allows for flexible construction of flow descriptors by using different flow properties and applying appropriate discretization. The individual flow descriptors are concatenated to form session descriptors. By utilizing string distance metrics, such as the Damerau-Levenshtein distance (DLD), it is possible to perform both unsupervised and supervised learning on the FDO session descriptors. Here, we utilize FDO as a tool for OS and browser identification coupled to a particular user activity, in this case watching YouTube videos. The variable-length nature of FDO session descriptors precludes the use of learning methods that expect fixed dimensionality. However, experiments show that excellent performance is provided by methods operating on distances, such as hierarchical Ward clustering for the unsupervised case and k-NN for the supervised case. The supervised learning evaluation shows that over 99% accuracy can be achieved for both operating system and browser identification based on video session characteristics. The FDO framework also provides multiple promising avenues for further research and improvement, such as improved methods for discretization boundary placement, more elaborate feature selection approaches, and more fine-grained DLD weights.
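The pipeline described in the abstract (discretize flow properties, concatenate per-flow descriptors in flow order, then classify using a string distance) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the flow fields (`start`, `proto`, `bytes`), the bin boundaries, the symbol alphabet, and the function names are all hypothetical. The distance is the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance with unit weights, not the weighted DLD the paper mentions as future work.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical byte-count bin boundaries and symbols (not from the paper).
SIZE_BOUNDS = [10_000, 1_000_000]
SIZE_SYMBOLS = "SML"  # small, medium, large


def flow_descriptor(flow):
    """Map one flow (a dict with assumed keys) to a two-symbol string."""
    proto = "T" if flow["proto"] == "tcp" else "U"
    size = SIZE_SYMBOLS[bisect_right(SIZE_BOUNDS, flow["bytes"])]
    return proto + size


def session_descriptor(flows):
    """Concatenate flow descriptors in order of flow start time."""
    ordered = sorted(flows, key=lambda f: f["start"])
    return "".join(flow_descriptor(f) for f in ordered)


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment), unit costs."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent symbols.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def knn_predict(query, labelled, k=3):
    """Majority vote among the k labelled descriptors closest to the query."""
    nearest = sorted(labelled, key=lambda item: osa_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because the classifier only ever compares descriptors pairwise, sessions with different numbers of flows need no padding or truncation, which is the property the abstract relies on when it favors distance-based methods over fixed-dimensionality learners.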
Place, publisher, year, edition, pages
IEEE, 2020.
Keywords [en]
Nearest neighbor search, Supervised learning, Boundary placements, Distance metrics, Flexible construction, Flow properties, Learning methods, Levenshtein distance, Traffic characteristics, Variable length, Learning systems
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kau:diva-80790
DOI: 10.1109/NOMS47738.2020.9110374
Scopus ID: 2-s2.0-85086760451
ISBN: 9781728149738 (electronic)
ISBN: 978-1-7281-4974-5 (print)
OAI: oai:DiVA.org:kau-80790
DiVA, id: diva2:1475771
Conference
2020 IEEE/IFIP Network Operations and Management Symposium, NOMS 2020, 20 April 2020 through 24 April 2020
Note
Acknowledgments: The authors wish to thank Sandvine for assisting with data collection. Funding for this study was provided by the HITS project grant from the Swedish Knowledge Foundation.
2020-10-13, 2020-10-13, 2022-01-25. Bibliographically approved