Q-Flink: A QoS-Aware Controller for Apache Flink
2020 (English). In: Proceedings - 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020, Institute of Electrical and Electronics Engineers (IEEE), 2020, p. 629-638. Conference paper, Published paper (Refereed)
Abstract [en]
Modern stream-data processing platforms are required to execute processing pipelines over high-volume, high-velocity datasets under tight latency constraints. Apache Flink has emerged as an important large-scale platform technology that can distribute processing over a large number of computing nodes in a cluster (i.e., scale-out processing). Flink allows application developers to design and execute queries over continuous raw inputs to analyze large amounts of streaming data in a parallel and distributed fashion. To increase the throughput of computing resources in stream processing platforms, a service provider might be tempted to use a consolidation strategy that packs as many processing applications as possible onto the working nodes, in the hope of increasing total revenue by improving overall resource utilization. However, there is a hidden trap in pursuing higher throughput solely through an interference-oblivious consolidation strategy. In practice, collocated applications in a shared platform can compete fiercely with each other for the capacity of shared resources (e.g., cache and memory bandwidth), which in turn can lead to severe performance degradation for all consolidated workloads. This paper addresses the shared resource contention problem associated with the automatic resource control mechanism of the Apache Flink engine running across a distributed cluster. A control strategy is proposed to handle scenarios in which stream processing applications may have different quality of service (QoS) requirements, with resource interference considered the key performance-limiting parameter. The performance evaluation compares the proposed controller with the default Flink resource allocation strategy in a testbed cluster with a total of 32 Intel Xeon cores under different workload traffic, with up to 4000 streaming applications chosen from various benchmarking tools.
Experimental results demonstrate that the proposed controller can successfully decrease the average latency of high-priority applications by 223% during burst traffic while maintaining the requested QoS enforcement levels.
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2020. p. 629-638
Keywords [en]
Apache Flink, Computer System Modeling and Profiling, Massive Data Stream Processing, Meta-Scheduling, Model Predictive Controller, Resource Allocation
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kau:diva-82957
DOI: 10.1109/CCGrid49817.2020.00-30
ISI: 000649540400063
Scopus ID: 2-s2.0-85089065207
ISBN: 9781728160955 (print)
OAI: oai:DiVA.org:kau-82957
DiVA, id: diva2:1529691
Conference
20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGRID 2020, 11 May 2020 through 14 May 2020
2021-02-19, 2021-02-19, 2021-06-14. Bibliographically approved