Modeling Batch Tasks Using Recurrent Neural Networks in Co-Located Alibaba Workloads
2024 (English) In: / [ed] Modesto Castrillon-Santana; Maria De Marsico; Ana Fred, SciTePress, 2024, Vol. 1, p. 558-569. Conference paper, Published paper (Refereed)
Abstract [en]
Accurate predictive models for cloud workloads can help improve task scheduling, capacity planning, and preemptive resource conflict resolution, especially in the setting of co-located jobs. Alibaba, one of the leading cloud providers, co-locates transient batch tasks and high-priority, latency-sensitive online jobs on the same cluster. In this paper, we use a dataset publicly released by Alibaba to model the batch tasks, which are often overlooked in favor of online services. The dataset contains the arrivals and resource requirements (CPU, memory, etc.) of both batch and online tasks. Our trained model predicts, with high accuracy, the number of batch tasks that arrive in any 30-minute window, their associated CPU and memory requirements, and their lifetimes. It captures over 94% of arrivals in each 30-minute window within a 95% prediction interval. The F1 scores for the most frequent CPU classes exceed 75%, and our memory and lifetime predictions incur less than 1% test data loss. The prediction accuracy for the lifetime of a batch task drops when the model uses both CPU and memory information, as opposed to memory information alone.
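The coverage figure reported above (over 94% of arrivals falling inside a 95% prediction interval) can be checked with a simple empirical-coverage computation. The sketch below is a hypothetical illustration, not the authors' code; the function name and the toy data are assumptions for demonstration only.

```python
import numpy as np

def interval_coverage(actual, lower, upper):
    """Fraction of actual values falling inside [lower, upper].

    Hypothetical helper: one way to measure how many per-window
    arrival counts land within a model's prediction interval.
    """
    actual = np.asarray(actual)
    inside = (actual >= np.asarray(lower)) & (actual <= np.asarray(upper))
    return float(inside.mean())

# Toy example: arrival counts per 30-minute window with an
# illustrative prediction interval (not real Alibaba trace data).
actual = [120, 95, 140, 130, 110]
lower  = [100, 90, 120, 125, 100]
upper  = [150, 100, 160, 150, 115]
print(interval_coverage(actual, lower, upper))  # 1.0 for this toy data
```

A model meeting the paper's reported quality would yield a coverage above 0.94 on held-out windows when the nominal interval level is 95%.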
Place, publisher, year, edition, pages: SciTePress, 2024. Vol. 1, p. 558-569
Keywords [en]
Cloud Workload Modeling, Co-Located Workloads, Time Series Forecasting, Recurrent Neural Networks.
National Category
Computer Systems
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kau:diva-99726 DOI: 10.5220/0012392700003654 Scopus ID: 2-s2.0-85190672119 ISBN: 978-989-758-684-2 (electronic) OAI: oai:DiVA.org:kau-99726 DiVA, id: diva2:1859591
Conference 13th International Conference on Pattern Recognition Applications and Methods, ICPRAM, Rome, Italy, February 24-26, 2024.
2024-05-22 Bibliographically approved