Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • apa.csl
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
QSpark: Distributed Execution of Batch & Streaming Analytics in Spark Platform
The University of Sydney, AUS.
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Mathematics and Computer Science (from 2013).ORCID iD: 0000-0001-9194-010X
The University of Sydney, AUS.
RMIT University, AUS.
2021 (English)In: 2021 IEEE 20TH INTERNATIONAL SYMPOSIUM ON NETWORK COMPUTING AND APPLICATIONS (NCA) / [ed] Andreolini, M Marchetti, M Avresky, DR, IEEE, 2021, article id 176804Conference paper, Published paper (Refereed)
Abstract [en]

A significant portion of research work in the past decade has been devoted on developing resource allocation and task scheduling solutions for large-scale data processing platforms. Such algorithms are designed to facilitate deployment of data analytic applications across either conventional cluster computing systems or modern virtualized data-centers. The main reason for such a huge research effort stems from the fact that even a slight improvement in the performance of such platforms can bring a considerable monetary savings for vendors, especially for modern data processing engines that are designed solely to perform high throughput or/and low-latency computations over massive-scale batch or streaming data. A challenging question to be yet answered in such a context is to design an effective resource allocation solution that can prevent low resource utilization while meeting the enforced performance level (such as 99-th latency percentile) in circumstances where contention among applications to obtain the capacity of shared resources is a non negligible performance-limiting parameter. This paper proposes a resource controller system, called QSpark, to cope with the problem of (i) low performance (i.e., resource utilization in the batch mode and p-99 response time in the streaming mode), and (ii) the shared resource interference among collocated applications in a multi-tenancy modern Spark platform. The proposed solution leverages a set of controlling mechanisms for dynamic partitioning of the allocation of computing resources, in a way that it can fulfill the QoS requirements of latency-critical data processing applications, while enhancing the throughput for all working nodes without reaching their saturation points. Through extensive experiments in our in-house Spark cluster, we compared the achieved performance of proposed solution against the default Spark resource allocation policy for a variety of Machine Learning (ML), Artificial Intelligence (AI), and Deep Learning (DL) applications. Experimental results show the effectiveness of the proposed solution by reducing the p-99 latency of high priority applications by 32% during the burst traffic periods (for both batch and stream modes), while it can enhance the QoS satisfaction level by 65% for applications with the highest priority (compared with the results of default Spark resource allocation strategy).

Place, publisher, year, edition, pages
IEEE, 2021. article id 176804
Series
IEEE International Symposium on Network Computing and Applications, ISSN 2643-7910
Keywords [en]
Streaming Big Data Processing, Quality of Service (QoS), Iterative Computation, Dynamic resource allocation
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kau:diva-89810DOI: 10.1109/NCA53618.2021.9685833ISI: 000783516600018Scopus ID: 2-s2.0-85125890353ISBN: 978-1-6654-9550-9 (print)OAI: oai:DiVA.org:kau-89810DiVA, id: diva2:1658913
Conference
IEEE 20th International Symposium on Network Computing and Applications (NCA), NOV 23-26, 2021, ELECTR NETWORK
Available from: 2022-05-18 Created: 2022-05-18 Last updated: 2022-10-07Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Authority records

Taheri, Javid

Search in DiVA

By author/editor
Taheri, Javid
By organisation
Department of Mathematics and Computer Science (from 2013)
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar

doi
isbn
urn-nbn

Altmetric score

doi
isbn
urn-nbn
Total: 110 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • apa.csl
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf