Publications (10 of 90)
Afzal, Z., Garcia, J., Lindskog, S. & Brunström, A. (2019). Using Partial Signatures in Intrusion Detection for Multipath TCP. Paper presented at 24th Nordic Conference, NordSec 2019, Aalborg, Denmark, November 18–20, 2019.
2019 (English). Conference paper, Published paper (Refereed)
National Category
Engineering and Technology
Research subject
Computer Science
Identifiers
urn:nbn:se:kau:diva-75755 (URN); 10.1007/978-3-030-35055-0 (DOI)
Conference
24th Nordic Conference, NordSec 2019, Aalborg, Denmark, November 18–20, 2019
Available from: 2019-11-14 Created: 2019-11-14 Last updated: 2019-11-14
Garcia, J. (2018). A Fragment Hashing Approach for Scalable and Cloud-Aware Network File Detection. In: Proceedings of NTMS 2018 Conference and Workshop. Paper presented at 2018 9th IFIP International Conference on New Technologies, Mobility & Security, 26-28 February 2018, Paris, France (pp. 1-5). New York: IEEE
2018 (English). In: Proceedings of NTMS 2018 Conference and Workshop, New York: IEEE, 2018, p. 1-5. Conference paper, Published paper (Refereed)
Abstract [en]

Monitoring networks for the presence of some particular set of files can, for example, be important in order to avoid exfiltration of sensitive data, or combat the spread of Child Sexual Abuse (CSA) material. This work presents a scalable system for large-scale file detection in high-speed networks. A multi-level approach using packet sampling with rolling and block hashing is introduced. We show that such an approach together with a well-tuned implementation can perform detection of a large number of files on the network at 10 Gbps using standard hardware. The use of packet sampling enables easy distribution of the monitoring processing functionality, and allows for flexible scaling in a cloud environment. Performance experiments on the most run-time critical hashing parts show a single-thread performance consistent with 10 Gbps line rate monitoring. The file detectability is examined for three data sets over a range of packet sampling rates. A conservative sampling rate of 0.1 is demonstrated to perform well for all tested data sets. It is also shown that knowledge of the file size distribution can be exploited to allow lower sampling rates to be configured for two of the data sets, which in turn results in lower resource usage.

Place, publisher, year, edition, pages
New York: IEEE, 2018
Keywords
Monitoring, Databases, Metadata, Hardware, Throughput, Forensics, System analysis and design
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kau:diva-67375 (URN); 10.1109/NTMS.2018.8328746 (DOI); 000448864200076 (); 978-1-5386-3662-6 (ISBN); 978-1-5386-3663-3 (ISBN)
Conference
2018 9th IFIP International Conference on New Technologies, Mobility & Security, 26-28 February 2018, Paris, France
Available from: 2018-05-24 Created: 2018-05-24 Last updated: 2019-06-17. Bibliographically approved
Garcia, J. & Brunström, A. (2018). Clustering-based separation of media transfers in DPI-classified cellular video and VoIP traffic. In: 2018 IEEE Wireless Communications and Networking Conference (WCNC). Paper presented at 2018 IEEE Wireless Communications and Networking Conference (WCNC), 15-18 April 2018, Barcelona, Spain. IEEE
2018 (English). In: 2018 IEEE Wireless Communications and Networking Conference (WCNC), IEEE, 2018. Conference paper, Published paper (Refereed)
Abstract [en]

Identifying VoIP and video traffic is often useful in the context of managing a cellular network, and to perform such traffic classification, deep packet inspection (DPI) approaches are often used. Commercial DPI classifiers do not necessarily differentiate between, for example, YouTube traffic that arises from browsing inside the YouTube app, and traffic arising from the actual viewing of a YouTube video. Here we apply unsupervised clustering methods on such cellular DPI-labeled VoIP and video traffic to identify the characteristic behavior of the two sub-groups of media-transfer and non-media-transfer flows. The analysis is based on a measurement campaign performed inside the core network of a commercial cellular operator, collecting data for more than two billion packets in 40+ million flows. A specially instrumented commercial DPI appliance allows the simultaneous collection of per-packet information in addition to the DPI classification output. We show that the majority of flows fall into clusters that are easily identifiable as belonging to one of the traffic sub-groups, and that a surprising majority of DPI-labeled VoIP and video traffic is non-media related.

Place, publisher, year, edition, pages
IEEE, 2018
Series
IEEE Wireless Communications and Networking Conference. Proceedings, ISSN 1525-3511, E-ISSN 1558-2612
Keywords
Media, YouTube, Clustering algorithms, Cryptography, Downlink, Engines, Uplink
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kau:diva-67798 (URN); 10.1109/WCNC.2018.8377027 (DOI); 000435542400081 (); 978-1-5386-1734-2 (ISBN); 978-1-5386-1735-9 (ISBN)
Conference
2018 IEEE Wireless Communications and Networking Conference (WCNC), 15-18 April 2018, Barcelona, Spain.
Projects
HITS
Available from: 2018-06-19 Created: 2018-06-19 Last updated: 2019-04-05. Bibliographically approved
Garcia, J. (2018). Duplications and Misattributions of File Fragment Hashes in Image and Compressed Files. In: Proceedings of NTMS 2018 Conference and Workshop. Paper presented at 2018 9th IFIP International Conference on New Technologies, Mobility and Security, February 26-28, Paris, France (pp. 1-5). New York: IEEE
2018 (English). In: Proceedings of NTMS 2018 Conference and Workshop, New York: IEEE, 2018, p. 1-5. Conference paper, Published paper (Refereed)
Abstract [en]

Hashing is used in a wide variety of security contexts. Hashes of parts of files, fragment hashes, can be used to detect remains of deleted files in cluster slack, to detect illicit files being sent over a network, to perform approximate file matching, or to quickly scan large storage devices using sector sampling. In this work we examine the fragment hash uniqueness and hash duplication characteristics of five different data sets with a focus on JPEG images and compressed file archives. We consider both block and rolling hashes and evaluate sizes of the hashed fragments ranging from 16 to 4096 bytes. During an initial hash generation phase, hash metadata is created for each data set, which in total amounts to several billion hashes. During the scan phase, each other data set is scanned and hashes are checked for potential matches in the hash metadata. Three aspects of fragment hashes are examined: 1) the rate of duplicate hashes within each data set, 2) the rate of hash misattribution where a fragment hash from the scanned data set matches a fragment in the hash metadata although the actual file is not present in the scan set, 3) to what extent it is possible to detect fragments from files in a hashed set when those files have been compressed and embedded in a zip archive. The results obtained are useful as input to dimensioning and evaluation procedures for several application areas of fragment hashing.
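
The two phases described in the abstract above (hash generation, then scanning) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the choice of SHA-256, and the use of fixed, non-overlapping blocks are all assumptions made for illustration, and rolling hashes are not covered.

```python
import hashlib


def block_hashes(data: bytes, size: int) -> list[str]:
    """Hash every full, non-overlapping fragment of `size` bytes.

    SHA-256 is an assumed choice; the abstract does not name a hash function.
    """
    return [
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data) - size + 1, size)
    ]


def duplicate_rate(data: bytes, size: int) -> float:
    """Fraction of fragment hashes that repeat an earlier hash in `data`."""
    hashes = block_hashes(data, size)
    if not hashes:
        return 0.0
    return 1 - len(set(hashes)) / len(hashes)


def build_metadata(files: dict[str, bytes], size: int) -> dict[str, set[str]]:
    """Generation phase: map each fragment hash to the files containing it."""
    metadata: dict[str, set[str]] = {}
    for name, data in files.items():
        for h in block_hashes(data, size):
            metadata.setdefault(h, set()).add(name)
    return metadata


def scan(data: bytes, metadata: dict[str, set[str]], size: int) -> set[str]:
    """Scan phase: names of known files with a fragment matching `data`."""
    hits: set[str] = set()
    for h in block_hashes(data, size):
        hits |= metadata.get(h, set())
    return hits
```

A match reported by `scan` for a file that is not actually present in the scanned data would count as a misattribution in the sense of the abstract. Note that this sketch only finds fragments at block-aligned offsets.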

Place, publisher, year, edition, pages
New York: IEEE, 2018
Keywords
Metadata, Transform coding, Forensics, Image coding, Security, Entropy, Focusing
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kau:diva-67376 (URN); 10.1109/NTMS.2018.8328690 (DOI); 000448864200021 (); 978-1-5386-3662-6 (ISBN); 978-1-5386-3663-3 (ISBN)
Conference
2018 9th IFIP International Conference on New Technologies, Mobility and Security, February 26-28, Paris, France
Available from: 2018-05-24 Created: 2018-05-24 Last updated: 2019-06-17. Bibliographically approved
Garcia, J. & Korhonen, T. (2018). Efficient Distribution-Derived Features for High-Speed Encrypted Flow Classification. In: NetAI'18 Proceedings of the 2018 Workshop on Network Meets AI & ML. Paper presented at 2018 Workshop on Network Meets AI & ML, August 24, 2018, Budapest, Hungary (pp. 21-27). New York: Association for Computing Machinery (ACM)
2018 (English). In: NetAI'18 Proceedings of the 2018 Workshop on Network Meets AI & ML, New York: Association for Computing Machinery (ACM), 2018, p. 21-27. Conference paper, Published paper (Refereed)
Abstract [en]

Flow classification is an important tool to enable efficient network resource usage, support traffic engineering, and aid QoS mechanisms. As traffic is increasingly becoming encrypted by default, flow classification is turning towards the use of machine learning methods employing features that are also available for encrypted traffic. In this work we evaluate flow features that capture the distributional properties of in-flow per-packet metrics such as packet size and inter-arrival time. The characteristics of such distributions are often captured with general statistical measures such as standard deviation, variance, etc. We instead propose a Kolmogorov-Smirnov discretization (KSD) algorithm to perform histogram bin construction based on the distributional properties observed in the data. This allows for a richer, histogram-based representation which also requires fewer resources for feature computation than higher order statistical moments. A comprehensive evaluation using synthetic data from Gaussian and Beta mixtures shows that the KSD approach provides Jensen-Shannon distance results surpassing those of uniform binning and probabilistic binning. An empirical evaluation using live traffic traces from a cellular network further shows that when coupled with a random forest classifier the KSD-constructed features improve classification performance compared to general statistical features based on higher order moments, or alternative bin placement approaches.

Place, publisher, year, edition, pages
New York: Association for Computing Machinery (ACM), 2018
Keywords
Traffic classification, Discretization, Machine learning
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kau:diva-68707 (URN); 10.1145/3229543.3229548 (DOI); 978-1-4503-5911-5 (ISBN)
Conference
2018 Workshop on Network Meets AI & ML, August 24, 2018, Budapest, Hungary
Projects
HITS
Available from: 2018-08-14 Created: 2018-08-14 Last updated: 2019-11-08. Bibliographically approved
Garcia, J. & Korhonen, T. (2018). On Runtime and Classification Performance of the Discretize-Optimize (DISCO) Classification Approach. Performance Evaluation Review, 46(3), 167-170
2018 (English). In: Performance Evaluation Review, ISSN 0163-5999, E-ISSN 1557-9484, Vol. 46, no 3, p. 167-170. Article in journal (Refereed), Published
Abstract [en]

Using machine learning in high-speed networks for tasks such as flow classification typically requires either very resource efficient classification approaches, large amounts of computational resources, or specialized hardware. Here we provide a sketch of the discretize-optimize (DISCO) approach which can construct an extremely efficient classifier for low dimensional problems by combining feature selection, efficient discretization, novel bin placement, and lookup. As feature selection and discretization parameters are crucial, appropriate combinatorial optimization is an important aspect of the approach. A performance evaluation is performed for a YouTube classification task using a cellular traffic data set. The initial evaluation results show that the DISCO approach can move the Pareto boundary in the classification performance versus runtime trade-off by up to an order of magnitude compared to runtime optimized random forest and decision tree classifiers.

Place, publisher, year, edition, pages
New York, USA: Association for Computing Machinery (ACM), 2018
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kau:diva-71213 (URN); 10.1145/3308897.3308965 (DOI)
Projects
HITS, 4707
Funder
Knowledge Foundation
Available from: 2019-02-20 Created: 2019-02-20 Last updated: 2019-11-08. Bibliographically approved
Afzal, Z., Garcia, J., Lindskog, S. & Brunström, A. (2018). Slice Distance: An Insert-Only Levenshtein Distance with a Focus on Security Applications. In: Proceedings of NTMS 2018 Conference and Workshop. Paper presented at 9th IFIP International Conference on New Technologies, Mobility and Security, 26-28 February 2018, Paris, France (pp. 1-5). New York: IEEE
2018 (English). In: Proceedings of NTMS 2018 Conference and Workshop, New York: IEEE, 2018, p. 1-5. Conference paper, Published paper (Refereed)
Abstract [en]

Levenshtein distance is well known for its use in comparing two strings for similarity. However, the set of considered edit operations used when comparing can be reduced in a number of situations. In such cases, the application of the generic Levenshtein distance can result in degraded detection and computational performance. Other metrics in the literature enable limiting the considered edit operations to a smaller subset. However, the possibility where a difference can only result from deleted bytes is not yet explored. To this end, we propose an insert-only variation of the Levenshtein distance to enable comparison of two strings for the case in which differences occur only because of missing bytes. The proposed distance metric is named slice distance and is formally presented and its computational complexity is discussed. We also provide a discussion of the potential security applications of the slice distance.
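
To make the idea of an insert-only edit distance concrete, here is a minimal sketch. It is not the paper's formal definition of slice distance, which is not reproduced in this record. It assumes the simplest reading: the distance is the number of bytes that must be inserted into a partial string to obtain the full string, and it is undefined when insertions alone cannot produce the full string. The function name and the `None` return convention are hypothetical.

```python
from typing import Optional


def insert_only_distance(partial: bytes, full: bytes) -> Optional[int]:
    """Number of insertions needed to turn `partial` into `full`.

    Returns None when `partial` is not a subsequence of `full`, i.e. when
    the difference cannot be explained by missing bytes alone.
    """
    remaining = iter(full)
    # `b in remaining` advances the iterator until it finds b, so the bytes
    # of `partial` must appear in `full` in the same order.
    if all(b in remaining for b in partial):
        return len(full) - len(partial)
    return None
```

Under this reading the check is a single linear pass over `full`, in contrast to the quadratic dynamic program of the generic Levenshtein distance.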

Place, publisher, year, edition, pages
New York: IEEE, 2018
Keywords
Measurement, Pattern matching, Time complexity, Transforms, Security, DNA
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kau:diva-67012 (URN); 10.1109/NTMS.2018.8328718 (DOI); 000448864200049 (); 978-1-5386-3662-6 (ISBN); 978-1-5386-3663-3 (ISBN)
Conference
9th IFIP International Conference on New Technologies, Mobility and Security, 26-28 February 2018, Paris, France
Projects
HITS, 4707
Funder
Knowledge Foundation, 4707
Available from: 2018-04-17 Created: 2018-04-17 Last updated: 2019-11-11. Bibliographically approved
Garcia, J., Korhonen, T., Andersson, R. & Västlund, F. (2018). Towards Video Flow Classification at a Million Encrypted Flows Per Second. In: Leonard Barolli, Makoto Takizawa, Tomoya Enokido, Marek R. Ogiela, Lidia Ogiela & Nadeem Javaid (Ed.), Proceedings of 32nd International Conference on Advanced Information Networking and Applications (AINA). Paper presented at 32nd International Conference on Advanced Information Networking and Applications (AINA), Krakow, Poland, 16-18 May 2018. Krakow: IEEE
2018 (English). In: Proceedings of 32nd International Conference on Advanced Information Networking and Applications (AINA) / [ed] Leonard Barolli, Makoto Takizawa, Tomoya Enokido, Marek R. Ogiela, Lidia Ogiela & Nadeem Javaid, Krakow: IEEE, 2018. Conference paper, Published paper (Refereed)
Abstract [en]

As end-to-end encryption on the Internet is becoming more prevalent, techniques such as deep packet inspection (DPI) can no longer be expected to be able to classify traffic. In many cellular networks a large fraction of all traffic is video traffic, and being able to divide flows in the network into video and non-video can provide considerable traffic engineering benefits. In this study we examine machine learning based flow classification using features that are available also for encrypted flows. Using a data set of several billion packets from a live cellular network we examine the obtainable classification performance for two different ensemble-based classifiers. Further, we contrast the classification performance of a statistical-based feature set with a less computationally demanding alternate feature set. To also examine the runtime aspects of the problem, we export the trained models and use a tailor-made C implementation to evaluate the runtime performance. The results quantify the trade-off between classification and runtime performance, and show that up to 1 million classifications per second can be achieved for a single core. Considering that only the subset of flows reaching some minimum flow length will need to be classified, the results are promising with regard to deployment also in scenarios with very high flow arrival rates.

Place, publisher, year, edition, pages
Krakow: IEEE, 2018
Series
Advanced Information Networking and Applications, ISSN 1550-445X, E-ISSN 2332-5658
Keywords
Cryptography, Runtime, Cellular networks, Machine learning, Forestry, Data models, Support vector machines
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kau:diva-68705 (URN); 10.1109/AINA.2018.00061 (DOI); 000454817500048 (); 978-1-5386-2196-7 (ISBN); 978-1-5386-2195-0 (ISBN)
Conference
32nd International Conference on Advanced Information Networking and Applications (AINA), Krakow, Poland, 16-18 May 2018
Projects
HITS
Available from: 2018-08-14 Created: 2018-08-14 Last updated: 2019-02-14. Bibliographically approved
Garcia, J. (2017). A clustering-based analysis of DPI-labeled video flow characteristics in cellular networks. In: Proceedings of the IM 2017 - 2017 IFIP/IEEE International Symposium on Integrated Network Management. Paper presented at Integrated Network and Service Management (IM), 2017 IFIP/IEEE Symposium 8-12 May 2017, Lisbon, Portugal (pp. 1-4). New York: IEEE
2017 (English). In: Proceedings of the IM 2017 - 2017 IFIP/IEEE International Symposium on Integrated Network Management, New York: IEEE, 2017, p. 1-4. Conference paper, Published paper (Refereed)
Abstract [en]

Using a specially instrumented deep packet inspection (DPI) appliance placed inside the core network of a commercial cellular operator we collect data from almost four million flows produced by a 'heavy-hitter' subset of the customer base. The data contains per-packet information for the first 100 packets in each flow, along with the classification done by the DPI engine. The data is used with unsupervised learning to obtain clusters of typical video flow behaviors, with the intent to quantify the number of such clusters and examine their characteristics. Among the flows identified as belonging to video applications by the DPI engine, a subset are actually video application signaling flows or other flows not carrying actual transfers of video data. Given that DPI-labeled data can be used to train supervised machine learning models to identify flows carrying video transfers in encrypted traffic, the potential presence and structure of such 'noise' flows in the ground truth is important to examine. In this study K-means and DBSCAN are used to cluster the flows marked by the DPI engine as being from a video application. The clustering techniques identify a set of 4 to 6 clusters with archetypal flow behaviors, and a subset of these clusters are found to represent flows that are not actually transferring video data.

Place, publisher, year, edition, pages
New York: IEEE, 2017
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kau:diva-64540 (URN); 10.23919/INM.2017.7987420 (DOI); 978-3-901882-89-0 (ISBN)
Conference
Integrated Network and Service Management (IM), 2017 IFIP/IEEE Symposium 8-12 May 2017, Lisbon, Portugal
Projects
HITS
Funder
Knowledge Foundation, 4707
Available from: 2017-10-13 Created: 2017-10-13 Last updated: 2019-06-17. Bibliographically approved
Jalili, L., Parichehreh, A., Alfredsson, S., Garcia, J. & Brunström, A. (2017). Efficient traffic offloading for seamless connectivity in 5G networks onboard high speed trains. In: IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC. Paper presented at 28th Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC 2017, 8-13 October 2017, Montreal, Canada (pp. 1-6). IEEE
2017 (English). In: IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC, IEEE, 2017, p. 1-6. Conference paper, Published paper (Refereed)
Abstract [en]

Seamless wireless connectivity in high mobility scenarios (≥ 300 km/h) is one of the fundamental key requirements for the future 5G networks. The high speed train (HST) is one of the preferred mid-range transportation systems, and highlights the challenges of providing wireless connectivity in high mobility scenarios for the 5G networks. The advanced version of Long Term Evolution (LTE-A) from the Third Generation Partnership Project (3GPP), with peak data rates up to 100 Mbps in high mobility scenarios, paved the road toward high quality and cost effective onboard Internet in HSTs. However, frequent handovers (HO) of a large number of onboard users increase the service interruptions, which in turn inevitably decrease the experienced quality of service (QoS). In this paper, according to the two-tier architecture of the HST wireless connectivity, we propose a novel and practically viable onboard traffic offloading mechanism among the HST carriages that effectively mitigates the service interruptions caused by frequent HOs of a massive number of onboard users. The proposed architecture does not imply any change on the LTE network standardization. Conclusions are supported by numerical results for realistic LTE parameters and current HST settings.

Place, publisher, year, edition, pages
IEEE, 2017
Series
IEEE International Symposium on Personal, Indoor, and Mobile Radio Communications workshops, ISSN 2166-9570, E-ISSN 2166-9589
Keywords
5G networks, High speed trains, QoS provisioning, Traffic offloading, Cost effectiveness, Long Term Evolution (LTE), Mobile telecommunication systems, Network architecture, Quality of service, Queueing networks, Radio communication, Railroad cars, Railroad transportation, Railroads, Wireless telecommunication systems, High speed train (HST), Proposed architectures, Seamless connectivity, Third generation partnership project (3GPP), Two-tier architecture, Wireless connectivities, 5G mobile communication systems
National Category
Telecommunications; Computer Sciences; Software Engineering
Identifiers
urn:nbn:se:kau:diva-67276 (URN); 10.1109/PIMRC.2017.8292462 (DOI); 000426970901150 (); 2-s2.0-85045264762 (Scopus ID); 9781538635315 (ISBN)
Conference
28th Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC 2017, 8-13 October 2017, Montreal, Canada
Projects
HITS, 4707
Funder
Knowledge Foundation
Available from: 2018-05-04 Created: 2018-05-04 Last updated: 2019-11-10. Bibliographically approved
Organisations
Identifiers
ORCID iD: orcid.org/0000-0003-3461-7079
