The Levenshtein distance is well known for its use in comparing two strings for similarity. In a number of situations, however, the set of edit operations that needs to be considered can be reduced. In such cases, applying the generic Levenshtein distance can degrade both detection and computational performance. Other metrics in the literature limit the considered edit operations to a smaller subset, but the case where a difference can only result from deleted bytes has not yet been explored. To this end, we propose an insert-only variation of the Levenshtein distance that enables comparison of two strings when differences occur only because of missing bytes. The proposed distance metric, named the slice distance, is formally presented and its computational complexity is discussed. We also discuss potential security applications of the slice distance.
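The paper gives the formal definition; as a rough, hedged illustration of the insert-only idea, the Python sketch below computes the number of bytes that must be inserted into the shorter string to obtain the longer one, using a linear-time subsequence check. The function name, return convention, and infinity handling are illustrative assumptions, not the paper's exact formulation.

```python
def slice_distance(short: bytes, long: bytes) -> float:
    """Hypothetical sketch of an insert-only edit distance.

    Returns the number of bytes that must be inserted into `short`
    to obtain `long`, or infinity if `long` cannot be produced from
    `short` by insertions alone (i.e. `short` is not a subsequence
    of `long`). The actual slice distance definition may differ in
    details such as symmetry and normalization.
    """
    i = 0
    for b in long:
        if i < len(short) and short[i] == b:
            i += 1
    if i == len(short):          # short is a subsequence of long
        return len(long) - len(short)
    return float("inf")          # differences are not insert-only


# Example: b"abc" can be turned into b"axbyc" by inserting 2 bytes.
print(slice_distance(b"abc", b"axbyc"))   # 2
print(slice_distance(b"abd", b"axbyc"))   # inf
```

The greedy subsequence check runs in time linear in the length of the longer string, which is consistent with the low computational cost motivating an insert-only variant.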
Traditional security mechanisms such as signature-based intrusion detection systems (IDSs) attempt to find a perfect match of a set of signatures in network traffic. Such IDSs depend on the availability of a complete application data stream. With emerging protocols such as Multipath TCP (MPTCP), this precondition cannot be ensured, resulting in false negatives and IDS evasion. On the other hand, if approximate signature matching is used instead in an IDS, a potentially high number of false positives makes detection impractical. In this paper, we show that, by using a specially tailored partial signature matcher and knowledge about MPTCP semantics, the Snort3 IDS can be empowered with partial signature detection. Additionally, we uncover the type of Snort3 rules suitable for the task of partial matching. Experimental results with these rules show a low false positive rate for benign traffic and high detection coverage for attack traffic.
The existence of excessively large and overly full network buffers, known as bufferbloat, has recently gained attention as a major performance problem for delay-sensitive applications. Cellular networks are one important scenario where bufferbloat may occur.
This paper investigates the interaction between TCP congestion control and buffering in cellular networks. Extensive measurements have been performed in commercial 3G, 3.5G and 4G cellular networks, with a mix of long and short TCP flows using the CUBIC, NewReno and Westwood+ congestion control algorithms. The results show that the completion times of short flows increase significantly when concurrent long flow traffic is introduced. This is caused by increased buffer occupancy from the long flows. In addition, for 3G and 3.5G the completion times are shown to depend significantly on the congestion control algorithms used for the background flows, with CUBIC leading to significantly larger completion times.
We provide measurement data collected from 97 trains completing over 7000 journeys in Sweden, showing that throughput over LTE is impacted by train velocity. To explain these observations, we hypothesize that the underlying causes can be found in the implementation of the MIMO system in LTE Rel. 8 and in the diffuse scattering of signals from ground reflections.
During the first phase of NEWCOM the focus areas of Department 6 were identified and refined. A number of relevant knowledge gaps were identified for the areas transport protocols, architectures and cross-layer aspects, and modelling. In this deliverable we describe a first set of frameworks/models to support research integration within the Department. The integration approach and the defined models/frameworks are described for each one of the selected knowledge gaps. The deliverable also includes a report on tools, software libraries and traces that can be shared between the partners.
The Second Newcom Department 6 Technical Workshop was organized in Barcelona on September 16-17, 2005. The workshop program contained 6 presentations and provided a good overview of ongoing research integration activities within the department. All of the three areas of the department, transport protocols, architectures and cross-layer aspects, and modelling, were represented with presentations. This deliverable contains the presentation material from the workshop. The included presentations are:
- Westwood-SCTP: A Transport Protocol for Traffic Balancing on Multihomed Hosts
- Transport Layer Handover using SCTP
- The Optimization of Transport Protocol over Ad-Hoc Networks
- Wireless Networks Emulation
- An Analytical Model of Rate-Controlled MPEG Video Sources in a UMTS Network
- An Analytical Model of a Rate-controlled MPEG-4 Video Source Capturing both Intra-frame and Inter-frame Correlation
As an option, a supporting paper for the presentation could also be supplied by the authors. The deliverable contains supporting articles for two of the presentations.
Work within Department 6 of NEWCOM is organized into the areas transport protocols, architectures and cross-layer aspects, and modelling. In this deliverable we provide a second report on the frameworks/models used to support research integration within the Department. The integration approach and the defined models/frameworks are described for each one of the three areas of the department. The deliverable also includes an updated report on tools, software libraries and traces that can be shared between the partners.
The Third Newcom Department 6 Technical Workshop was organized in Catania, Italy, on February 2, 2006. The workshop program contained 4 presentations, comprising reports on ongoing integrated research activities as well as presentations intended to initiate additional joint research within the department. All of the three areas of the department, transport protocols, architectures and cross-layer aspects, and modelling, were represented with presentations. This deliverable contains the presentation material from the workshop. The included presentations are:
- P2P-based Video transmission in wireless networks
- Transport Layer Handover using SCTP
- WIPEMU 4G System Emulation and Sample Results
- Wireless Networks Emulation
Where available, the presentation notes are also included with the presentations.
The Fourth Newcom Department 6 Technical Workshop was organized in Toulouse, France, on September 13-14, 2006. The workshop program contained 6 presentations, comprising reports on ongoing integrated research activities as well as presentations intended to initiate additional joint research activities between the partners. All of the three areas of the department, transport protocols, architectures and cross-layer aspects, and modelling, were represented with presentations. This deliverable contains the presentation material from the workshop. The included presentations are:
- P2P Video Transmission over Heterogeneous Wired/Wireless Networks: A Starting Point for Integrated Research
- DCCP Overview and First Experiments
- Estimation of the SCTP Failover Time
- Improving End to End Goodput of Ad Hoc Networks with SCTP Multihoming
- A Taxonomy and Survey of SCTP Research
- Integrating KAUnet and SWINE
Where available, the presentation notes are also included with the presentations.
The Stream Control Transmission Protocol (SCTP) is a relatively recent general-purpose transport layer protocol for IP networks that has been introduced as a complement to the well-established TCP and UDP transport protocols. Although initially conceived for the transport of PSTN signaling messages over IP networks, the introduction of key features in SCTP, such as multihoming and multistreaming, has spurred considerable research interest surrounding SCTP and its applicability to different networking scenarios. This article aims to provide a detailed survey of one of these new features, multihoming, which, as is shown, is the subject of evaluation in more than half of all published SCTP-related articles. To this end, the article first summarizes and organizes SCTP-related research conducted so far by developing a four-dimensional taxonomy reflecting the (1) protocol feature examined, (2) application area, (3) network environment, and (4) study approach. Over 430 SCTP-related publications have been analyzed and classified according to the proposed taxonomy. As a result, a clear perspective on this research area in the decade since the first protocol standardization in 2000 is given, covering both current and future research trends. The article then provides a detailed survey of the SCTP multihoming feature, examining possible applications of multihoming such as robustness, handover support, and load sharing.
More and more applications and protocols are now running on wireless networks. Testing the implementation of such applications and protocols is a real challenge as the position of the mobile terminals and environmental effects strongly affect the overall performance. Network emulation is often perceived as a good trade-off between experiments on operational wireless networks and discrete-event simulations on Opnet or ns-2. However, ensuring repeatability and realism in network emulation while taking into account mobility in a wireless environment is very difficult. This paper proposes a network emulation platform, called W-NINE, based on off-line computations preceding online pattern-based traffic shaping. The underlying concepts of repeatability, dynamicity, accuracy, and realism are defined in the emulation context. Two different simple case studies illustrate the validity of our approach with respect to these concepts.
Using a specially instrumented deep packet inspection (DPI) appliance placed inside the core network of a commercial cellular operator we collect data from almost four million flows produced by a `heavy-hitter' subset of the customer base. The data contains per-packet information for the first 100 packets in each flow, along with the classification done by the DPI engine. The data is used with unsupervised learning to obtain clusters of typical video flow behaviors, with the intent to quantify the number of such clusters and examine their characteristics. Among the flows identified as belonging to video applications by the DPI engine, a subset are actually video application signaling flows or other flows not carrying actual transfers of video data. Given that DPI-labeled data can be used to train supervised machine learning models to identify flows carrying video transfers in encrypted traffic, the potential presence and structure of such `noise' flows in the ground truth is important to examine. In this study K-means and DBSCAN are used to cluster the flows marked by the DPI engine as being from a video application. The clustering techniques identify a set of 4 to 6 clusters with archetypal flow behaviors, and a subset of these clusters are found to represent flows that are not actually transferring video data.
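A minimal sketch of the clustering step is shown below, assuming per-flow numeric features (for example packet-size and inter-arrival statistics over the first 100 packets) have already been extracted; the feature set, cluster counts, and DBSCAN parameters are illustrative assumptions rather than the values used in the study.

```python
# Cluster per-flow feature vectors with K-means and DBSCAN (sketch).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, DBSCAN

X = np.random.rand(10_000, 8)          # placeholder for real flow features
X_scaled = StandardScaler().fit_transform(X)

kmeans_labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_scaled)
dbscan_labels = DBSCAN(eps=0.5, min_samples=20).fit_predict(X_scaled)

# DBSCAN marks noise points with label -1; inspecting cluster sizes helps
# separate archetypal video transfers from signaling-only 'noise' flows.
print(np.bincount(kmeans_labels))
print(np.unique(dbscan_labels, return_counts=True))
```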
Monitoring networks for the presence of some particular set of files can, for example, be important in order to avoid exfiltration of sensitive data, or combat the spread of Child Sexual Abuse (CSA) material. This work presents a scalable system for large-scale file detection in high-speed networks. A multi-level approach using packet sampling with rolling and block hashing is introduced. We show that such an approach together with a well-tuned implementation can perform detection of a large number of files on the network at 10 Gbps using standard hardware. The use of packet sampling enables easy distribution of the monitoring processing functionality, and allows for flexible scaling in a cloud environment. Performance experiments on the most run-time-critical hashing parts show single-thread performance consistent with 10 Gbps line-rate monitoring. The file detectability is examined for three data sets over a range of packet sampling rates. A conservative sampling rate of 0.1 is demonstrated to perform well for all tested data sets. It is also shown that knowledge of the file size distribution can be exploited to allow lower sampling rates to be configured for two of the data sets, which in turn results in lower resource usage.
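As a rough illustration of the rolling-hash part of such a matcher (the actual hash function, window size, and sampling logic of the system are not specified here and are assumptions), a sampled packet payload can be slid over with a polynomial rolling hash, and each window hash checked against a precomputed set of fragment hashes:

```python
# Illustrative polynomial rolling hash over a sampled packet payload.
BASE, MOD, WINDOW = 257, (1 << 61) - 1, 64   # assumed parameters

def rolling_hashes(payload: bytes, window: int = WINDOW):
    """Yield a hash for every consecutive `window`-byte slice of `payload`."""
    h, power = 0, pow(BASE, window - 1, MOD)
    for i, b in enumerate(payload):
        if i >= window:                              # drop byte leaving the window
            h = (h - payload[i - window] * power) % MOD
        h = (h * BASE + b) % MOD
        if i >= window - 1:
            yield h

def packet_matches(payload: bytes, fragment_db: set) -> bool:
    """True if any fragment hash of the payload is in the precomputed database."""
    return any(h in fragment_db for h in rolling_hashes(payload))
```

Because each packet is processed independently, the matching function can be replicated across sampled packet streams, which is consistent with the distribution and cloud-scaling argument in the abstract.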
Novel lookup-based classification approaches allow machine learning (ML) to be performed at extremely high classification rates for suitable low-dimensional classification problems. A central aspect of such approaches is the optimal selection of features and of discretized feature representations. In this work we propose and study a hybrid-genetic algorithm (hGAm) approach to solve this optimization problem. For the considered problem the fitness evaluation function is expensive, as it entails training an ML classifier with the proposed set of features and representations, and then evaluating the resulting classifier. We have here devised a surrogate problem by casting the feature selection and representation problem as a combinatorial optimization problem in the form of a multiple-choice quadratic knapsack problem (MCQKP). The orders of magnitude faster evaluation of the surrogate problem allows a comprehensive hGAm performance evaluation to be performed. The results show that a suitable trade-off exists at around 5000 fitness evaluations, and the results also provide a characterization of the parameter behaviors as input to future extensions.
Investigations involving digital forensics typically include file hash matching procedures at one or more steps in the examination. File hash matching is commonly done by computing a complete file hash value for each file on a storage device and comparing that to a pre-computed hash list. This work examines how various improvements to the basic technique impact the time required to perform hash matching. Specifically, side-information assisted approaches are evaluated in this work. By utilizing side-information such as file sizes and pre-hashes in addition to the traditional hash values, we find that it is possible to considerably decrease the amount of time required to perform file hash matching. A simulation model is used to evaluate the potential time saving over a range of storage devices and using five different empirically derived file size distribution datasets totaling 36 million file sizes. The results indicate that side-information assisted hashing provides a considerable reduction of the time required, ranging between 5% and 99%, with the majority of cases providing reductions of more than 50%.
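A minimal sketch of the side-information assisted idea, under assumed parameters: the cheap checks (file size, then a pre-hash over the first few kilobytes) act as filters so that the expensive full-file hash is computed only for candidate matches. Helper names and the pre-hash length are illustrative, not taken from the paper.

```python
# Size -> pre-hash -> full hash filtering chain (sketch).
import hashlib
import os

PREHASH_BYTES = 4096   # assumed pre-hash length

def prehash(path: str) -> bytes:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(PREHASH_BYTES)).digest()

def full_hash(path: str) -> bytes:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.digest()

def matches(path: str, sizes: set, prehashes: set, full_hashes: set) -> bool:
    if os.path.getsize(path) not in sizes:
        return False                       # cheapest check, no file read
    if prehash(path) not in prehashes:
        return False                       # reads only the first few kB
    return full_hash(path) in full_hashes  # full read only when still a candidate
```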
The detailed performance characteristics of networking equipment are to a large extent a function of the software that controls the underlying hardware components. Most networking equipment is regularly updated with new software versions. By studying performance changes related to such software updates, it is possible to identify particular software versions that affect the performance of the system. Consequently, having automated methods for detecting changes in network equipment performance is crucial. In this work we study the change point detection problem arising when the placement in time of software updates is known a priori, but the presence of any performance implications on any of the thousands of performance indicators that can be collected is unknown. The ability to improve the automated detection of such change points by clustering the monitored systems according to the set of collected indicators has not been fully evaluated. We here report our experience with employing clustering, together with bootstrap-based change point detection, across a range of performance indicators. We evaluate four variations of clustering approaches, and demonstrate the resulting improvement in change point detection sensitivity.
Efficient operation of networking systems is important from resource utilization, OPEX, and energy consumption perspectives. A major factor in efficient operations is the underlying software that controls the networking hardware or virtualized network functions. Most software in hardware-based networking devices is periodically updated, which may or may not have an impact on various aspects of the performance of the device. We consider the issue of change point detection in network performance indicators, aiming to detect when such software updates co-occur with changes to any subset of collected performance metrics. In particular, we study the change point detection problem that arises when the placement in time of firmware changes is known a priori, but the presence of any performance implications is unknown. We focus on evaluating change point detection in operational network equipment log data, and consider diurnal variation suppression approaches. We propose the use of periodicity filtering to remove anomalous data sources, and apply a resampling technique using bootstrapping to determine when a software update has performance implications. Our results show that this automated change point detection approach can locate performance-related changes, and that load normalization appears to be the most sensitive approach to diurnal variation suppression.
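A minimal sketch of the resampling idea for a change point whose location is known a priori is given below: resampling the indicator series destroys any real level shift, so the observed before/after mean difference can be compared against the resampled distribution. The test statistic, threshold, and function name are illustrative assumptions, not the exact procedure used in the papers.

```python
# Bootstrap-style significance test for a shift at a known update index (sketch).
import numpy as np

def bootstrap_change_test(series, change_idx, n_boot=2000, alpha=0.01, seed=0):
    """Return (change_detected, p_value) for a level shift at `change_idx`."""
    rng = np.random.default_rng(seed)
    series = np.asarray(series, dtype=float)
    observed = abs(series[change_idx:].mean() - series[:change_idx].mean())
    null_diffs = np.empty(n_boot)
    for i in range(n_boot):
        resampled = rng.choice(series, size=series.size, replace=True)
        null_diffs[i] = abs(resampled[change_idx:].mean()
                            - resampled[:change_idx].mean())
    p_value = float((null_diffs >= observed).mean())
    return p_value < alpha, p_value
```

In practice the indicator series would first be pre-processed (for example with load normalization or periodicity filtering) to suppress diurnal variation before the test is applied.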
Hashing is used in a wide variety of security contexts. Hashes of parts of files, fragment hashes, can be used to detect remains of deleted files in cluster slack, to detect illicit files being sent over a network, to perform approximate file matching, or to quickly scan large storage devices using sector sampling. In this work we examine the fragment hash uniqueness and hash duplication characteristics of five different data sets with a focus on JPEG images and compressed file archives. We consider both block and rolling hashes and evaluate sizes of the hashed fragments ranging from 16 to 4096 bytes. During an initial hash generation phase hash metadata is created for each data set, which in total amounts to several billion hashes. During the scan phase, each of the other data sets is scanned and its hashes are checked for potential matches in the hash metadata. Three aspects of fragment hashes are examined: 1) the rate of duplicate hashes within each data set, 2) the rate of hash misattribution where a fragment hash from the scanned data set matches a fragment in the hash metadata although the actual file is not present in the scan set, and 3) to what extent it is possible to detect fragments from files in a hashed set when those files have been compressed and embedded in a zip archive. The results obtained are useful as input to dimensioning and evaluation procedures for several application areas of fragment hashing.
In certain wireless environments the common transport protocol assumption that all packet losses are due to congestion may not hold true. If transport protocols react identically to both congestion and non-congestion related losses, performance will degrade. To avoid this performance problem, transport protocols must be able to differentiate between losses that are due to congestion and losses that are due to wireless link errors. Various loss differentiation techniques exist, some based on the sender side and some on the receiver side. If the loss differentiation is performed at the receiver, loss notification is needed to inform the sender of the loss cause. The sender uses the loss notification information to adapt its retransmission and congestion avoidance behavior. The work in this paper examines the effectiveness of two different notification schemes that are to be used in conjunction with receiver-based loss differentiation. The first scheme uses a TCP option to explicitly convey a loss counter and the sequence number of the last corrupted packet. The second scheme uses the generation of additional dup-acks as a way to implicitly influence the retransmission and congestion behavior of the sender. The advantage of the second scheme is that it requires no changes to the sender-side TCP implementation. However, since the second scheme cannot change the basic loss behavior of halving the congestion window of TCP, its performance is reduced in relation to the first scheme. This paper provides a description of the design of these two loss notification schemes. Additionally, an experimental evaluation is presented, based on a FreeBSD kernel implementation of the two notification schemes in conjunction with checksum-based loss differentiation.
The evolution of computer communications and the Internet has led to the emergence of a large number of communication technologies with widely different capabilities and characteristics. While this multitude of technologies provides a wide array of possibilities it also creates a complex and heterogeneous environment for higher-layer communication protocols. Specific link technologies, as well as overall network heterogeneity, can hamper user-perceived performance or impede end-to-end throughput. In this thesis we examine two transport layer centered approaches to improve performance. The first approach addresses the decrease in user satisfaction that occurs when web waiting times become too long. Increased transport layer flexibility with regards to reliability, together with error-resilient image coding, is used to enable a new trade-off. The user is given the possibility to reduce waiting times, at the expense of image fidelity. An experimental examination of this new functionality is provided, with a focus on image-coding aspects. The results show that reduced waiting times can be achieved, and user studies indicate the usefulness of this new trade-off. The second approach concerns the throughput degradations that can occur as a consequence of link and transport layer interactions. An experimental evaluation of the GSM environment shows that when negative interactions do occur, they are coupled to large variability in link layer round-trip times rather than simply to poor radio conditions. Another type of interaction can occur for link layers which expose higher layers to residual bit errors. Residual bit-errors create an ambiguity problem for congestion controlled transport layer protocols which cannot correctly determine the cause for a loss. This ambiguity leads to an unnecessary throughput degradation. To mitigate this degradation, loss differentiation and notification mechanisms are proposed and experimentally evaluated from both performance and fairness perspectives. The results show that considerable performance improvements can be realized. However, there are also fairness implications that need to be taken into account since the same mechanisms that improve performance may also lead to unfairness towards flows that do not employ loss differentiation.
Humans are often faced with the need to make decisions regarding complex issues where multiple interests need to be balanced, and where there are a number of complex arguments weighing in opposite directions. The ability of humans to understand and internalize the underlying argumentation structure resulting from reasoning about complex issues is limited by the human cognitive ability. The cognitive limit can manifest itself both in relation to an inappropriate level and amount of detail in the presentation of information, and in the structuring of the information and the representation of the interrelationships between constituting arguments. The GATM model provides a structured way to represent reasoning, and can be useful both in the decision-making process and when communicating a decision. In this work a component-based overview of the GATM model is provided in the context of security policy reasoning, where previous work has shown that decision-making transparency and improved understanding of the reasoning behind a security policy may lead to a beneficial impact on policy compliance.
Network delays and user perceived latencies are of major importance in many applications in cellular networks. Delays can be measured with multiple approaches and at different protocol layers. This work involves a detailed examination of several delay metrics from a network, transport, and application perspective. The study explores base delay as well as latency under load, capturing also the effect of buffering. The examination is based on a comprehensive active measurement campaign performed in the networks of four Swedish operators. The results show that the delay captured by different metrics can vary significantly, with delay captured from the TCP three-way-handshake and adaptive ping measurements giving the most consistent results for base network delay in our measurements. As expected, when background traffic is introduced measured delay increases by an order of magnitude due to buffering in the network, highlighting the importance of also capturing latency under load when describing network performance. Finally, using an analytic model of flow completion time, we show that well-selected network measurements can provide a good prediction of higher layer delay performance.
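As a rough illustration of how an analytic flow completion time model can map measured network delay to higher-layer performance, the sketch below uses a generic slow-start round-counting approximation; the model actually used in the paper may include additional terms (connection setup, bandwidth limits, losses), and the parameter values here are illustrative.

```python
# Generic slow-start based flow completion time approximation (sketch).
import math

def flow_completion_time(flow_bytes, rtt_s, mss=1448, init_cwnd=10):
    """Approximate completion time: one RTT per slow-start round."""
    segments = math.ceil(flow_bytes / mss)
    cwnd, sent, rounds = init_cwnd, 0, 0
    while sent < segments:
        sent += cwnd
        cwnd *= 2            # exponential window growth during slow start
        rounds += 1
    return rounds * rtt_s

# e.g. a 100 kB flow over a 40 ms base RTT vs. a 400 ms latency-under-load RTT
print(flow_completion_time(100_000, 0.040), flow_completion_time(100_000, 0.400))
```

The example shows why latency under load matters: the same flow needs the same number of rounds, so a tenfold increase in RTT translates directly into a tenfold increase in predicted completion time.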
Network emulation has for a long time been an important tool for evaluating the performance of communication protocols. By emulating network characteristics, such as restricted bandwidth, delay and losses, knowledge about the behavior and performance of actual protocol implementations can be obtained. This paper focuses on the generation of losses in network emulators and shows the beneficial effects of being able to control the generation of losses in a precise way. Both the possibility of gaining additional knowledge about a protocol implementation's behavior and statistical benefits such as paired experiments are discussed. By extending the loss generation to also include bit-error generation, in addition to packet losses, a finer level of abstraction is provided. Deterministic bit-error generation allows detailed and repeatable studies of bit-error sensitive protocol behavior. TCP and a loss differentiating variant of TCP are used to illustrate the utility of improved loss generation.
This paper examines the efficiency of resource utilization with respect to short-lived TCP flows in various cellular networks. The examination is done from the vantage point of an end-user who would like to use as much as possible of the cellular transmission resources that are available at any given time, thus minimizing the delays associated with communication. Based on a comprehensive measurement campaign we first derive network characteristics with regards to base RTT, RTT under load, and average throughput. A protocol efficiency metric is introduced to capture how efficiently short TCP flows are in fact able to use the instantaneously available transmission resources in a cellular network. The measurements show that short TCP connections have low efficiency in 3.5G (HSPA+) and 4G (LTE) mobile broadband networks, and that the improved latency and throughput characteristics of 4G in relation to 3.5G nevertheless results in lower short-flow efficiency for 4G.
Communication performed with mobile devices will experience varying levels of connectivity as the communication device moves in and out of coverage. A subset of mobile communication devices operate under conditions where the connectivity is characterized by relatively short contact periods occurring intermittently. In this paper we propose a model to predict the amount of data that can be transferred during such short contact periods. The model includes aspects of the transport layer slow-start behavior and is validated using data from a long-running measurement campaign in the networks of four Swedish cellular operators. Further validation of the modeling assumptions is performed by employing a numerical optimization technique based on non-linear least squares regression using the iterative Levenberg-Marquardt approach. The model is then used to explore the relevant parameter space.
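A hedged sketch of how such a model could be fitted with the Levenberg-Marquardt approach is shown below, using SciPy's curve_fit. The model form (exponential slow-start growth followed by a rate-limited linear phase), parameter names, and synthetic data are illustrative assumptions and not the paper's exact model.

```python
# Fit a slow-start-flavored contact-window transfer model with Levenberg-Marquardt (sketch).
import numpy as np
from scipy.optimize import curve_fit

def transferred_bytes(t, rtt, init_bytes, rate):
    """Bytes moved in a contact of duration t: slow-start growth, then linear at `rate`."""
    t_ss = rtt * np.log2(rate * rtt / init_bytes + 1)      # time to exit slow start
    ss_bytes = init_bytes * (2 ** (np.minimum(t, t_ss) / rtt) - 1)
    linear = np.maximum(t - t_ss, 0) * rate
    return ss_bytes + linear

# durations[] and bytes_moved[] would come from the measurement campaign;
# synthetic data with known parameters is used here only to show the fit.
np.random.seed(0)
durations = np.linspace(0.1, 10, 50)
bytes_moved = transferred_bytes(durations, 0.05, 15_000, 2e6) * (1 + 0.05 * np.random.randn(50))

popt, _ = curve_fit(transferred_bytes, durations, bytes_moved,
                    p0=[0.05, 15_000, 1e6], method="lm")
print(dict(zip(["rtt", "init_bytes", "rate"], popt)))
```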
Access to reliable high-quality communication services on trains is important for today's mobile users. Train-mounted aggregation routers that provide WiFi access to train passengers and bundle external communication over multiple cellular modems/links are an efficient way of providing such services. Still, the characteristics of such systems have received limited attention in the literature. In this paper we examine the communication characteristics of such systems based on a large data set gathered over six months from an operational Swedish railway system. We characterize the conditions in terms of usage load, train velocity profiles, and observed throughput and delay as well as the relation between these parameters. Furthermore, we examine the data from an anomaly detection perspective. Based on a changepoint detection method, we examine how the collected metrics vary over the six months. Being able to detect shifts in the metrics over time can help detect anomalous changes in the hardware or environment, and also helps explain the factors affecting the observed behaviors.
In this study we examine the conditions in a current cellular network by examining data passively collected in the core of a cellular operator during a 24-hour period. More than 2 billion traffic measurement data points from over 500,000 cellular users are analyzed. The analysis characterizes the Time-of-Day (ToD) variations for traffic intensity and session length and serves as a complement to the active measurements also performed. A comprehensive active measurement campaign was completed in the HSDPA+ and LTE networks of the four major Swedish operators. We collect around 50,000 data points from stationary cellular modems and analyze the ToD variation pattern for underlying network layer metrics such as delay and throughput. In conjunction with the time-varying session size distribution obtained from the passive measurements, we then analyze the ToD impact on TCP flows of varying sizes. The ToD effects are examined using time series analysis with Lomb-Scargle periodograms and differential Bayesian Information Criterion to allow comparison of the relative impact of the network ToD effects. The results show that ToD effects are predominantly impacting longer-running flows, and although short flows are also impacted they are mostly constrained by other issues such as protocol efficiency.
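As an illustration of the periodogram part of the analysis (the BIC-based comparison step is not shown, and the frequency grid is an assumption), a Lomb-Scargle periodogram can be computed over irregularly sampled measurements with SciPy:

```python
# Lomb-Scargle periodogram over irregularly sampled measurements (sketch).
import numpy as np
from scipy.signal import lombscargle

np.random.seed(0)
t = np.sort(np.random.uniform(0, 7 * 86400, 2000))          # one week of samples (seconds)
y = 5 + 2 * np.sin(2 * np.pi * t / 86400) + np.random.randn(t.size)

periods = np.linspace(2 * 3600, 2 * 86400, 500)              # scan 2 h .. 2 days
omega = 2 * np.pi / periods                                   # angular frequencies
power = lombscargle(t, y - y.mean(), omega, normalize=True)

best = periods[np.argmax(power)]
print(f"dominant period: {best / 3600:.1f} hours")            # ~24 h expected for a ToD effect
```

A strong peak at a 24-hour period in such a periodogram is what would indicate a pronounced Time-of-Day effect for a given metric or flow-size class.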
Train-mounted aggregation routers that provide WiFi access to train passengers and bundle external communication over multiple cellular modems/links are an efficient way of providing communication services on trains. However, the characteristics of such systems have received limited attention in the literature. In this paper we address this gap by examining the communication characteristics of such systems based on a large data set gathered over six months from an operational Swedish railway system. We focus our examination on the relationship between per-link throughput and train velocity. Using Levenberg-Marquardt non-linear regression a noticeable critical point is observed for an RS-SINR of around 12 dB. At this point the impact of increased train velocity on per-link throughput changes from being negative to becoming positive. Using a machine learning approach we also explore the relative importance of several observed metrics in relation to per-link throughput.
We examine the connection reliability of LTE cellular infrastructure for supporting train signaling systems. In particular, the impact of simultaneous use of multiple networks on reliability is considered, along with failure correlation effects. We present a tailored reliability model, and report on data collected from many train-mounted cellular routers. Connection reliability reaches 99.994% when aggregation is used, compared to 99.953% for the best single link. Both modeling and measurement results show greatly improved reliability when aggregating over multiple links, thus indicating that commercial cellular networks may be useful for providing connectivity to future train signaling systems.
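As a simple point of reference for the aggregation effect, the sketch below computes reliability for independent parallel links; the tailored model in the paper additionally accounts for failure correlation between networks, which this basic formula does not capture.

```python
# Parallel-link reliability under an independence assumption (sketch).
from functools import reduce

def aggregate_reliability(link_reliabilities):
    """Aggregated connection fails only if every link is down simultaneously."""
    unavailability = reduce(lambda acc, r: acc * (1 - r), link_reliabilities, 1.0)
    return 1 - unavailability

# e.g. two independent links at 99.953% and 99.9% availability
print(f"{aggregate_reliability([0.99953, 0.999]):.6%}")
```

Correlated failures (for example shared coverage gaps in tunnels) reduce the gain below this independent-failure bound, which is why the measured aggregated reliability of 99.994% is lower than the simple formula would suggest.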
Residual bit-errors in wireless environments are well known to cause difficulties for congestion controlled protocols like TCP. In this study we focus on a receiver-based loss differentiation approach to mitigating the problems, and more specifically on two different loss notification schemes. The fully receiver-based 3-dupack scheme uses additional dupacks to implicitly influence the retransmission behavior of the sender. The second TCP option scheme uses a TCP option to explicitly convey a corruption notification. Although these schemes look relatively simple at first glance, when examining the details several issues exist which are highlighted and discussed. A performance evaluation based on a FreeBSD kernel implementation shows that the TCP option scheme works well in all tested cases and provides a considerable throughput improvement. The 3-dupack scheme also provides performance gains in most cases, but the improvements vary more between different test cases, with some cases showing no improvement over regular TCP.