Quality Assurance for LLM-RAG Systems: Empirical Insights from Tourism Application Testing
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Mathematics and Computer Science (from 2013). ORCID iD: 0000-0001-9051-7609
Ludwig Maximilians University Munich.
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Mathematics and Computer Science (from 2013). ORCID iD: 0000-0003-0683-2783
Karlstad University, Faculty of Arts and Social Sciences (starting 2013), Service Research Center (from 2013). Karlstad University, Faculty of Arts and Social Sciences (starting 2013), Karlstad Business School (from 2013). ORCID iD: 0000-0002-3281-7942
2025 (English) Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

This paper presents a framework for testing and evaluating quality characteristics of Large Language Model (LLM) systems enhanced with Retrieval-Augmented Generation (RAG) in tourism applications. Through systematic empirical evaluation of three LLM variants across multiple parameter configurations, we demonstrate the effectiveness of our testing methodology in assessing both functional correctness and extra-functional properties. Our framework implements 17 distinct metrics that cover syntactic analysis, semantic evaluation, and behavioral evaluation through LLM judges. The study shows how different architectural choices and parameter configurations affect system performance, in particular the impact of the temperature and top-p parameters on response quality. The tests were carried out on a tourism recommendation system for the Värmland region, using both standard and RAG-enhanced configurations. The results indicate that newer LLM versions show modest improvements in performance metrics, and that the differences are more pronounced in response length and complexity than in semantic quality. The research contributes practical insights for implementing robust testing practices in LLM-RAG systems and offers guidance to organizations deploying these architectures in production environments.
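As a rough illustration of the kind of sweep the abstract describes (several model variants, temperature and top-p settings, standard versus RAG-enhanced configurations), the sketch below enumerates a configuration grid and scores each response with two simple syntactic measures. Everything specific in it is an assumption made for illustration: the model names, the parameter values, the two metrics, and the `fake_generate` stand-in are hypothetical and are not taken from the paper, whose 17 metrics and actual settings are not listed in this record.

```python
from itertools import product
from typing import Callable, Dict, List

# Hypothetical placeholders: the abstract names neither the models nor the values tested.
MODELS = ["model-a", "model-b", "model-c"]
TEMPERATURES = [0.0, 0.5, 1.0]
TOP_PS = [0.5, 0.9]
RAG_MODES = [False, True]


def build_grid() -> List[Dict]:
    """Enumerate every model / temperature / top-p / RAG combination."""
    return [
        {"model": m, "temperature": t, "top_p": p, "rag": r}
        for m, t, p, r in product(MODELS, TEMPERATURES, TOP_PS, RAG_MODES)
    ]


def word_count(text: str) -> int:
    """Response length in whitespace-separated words."""
    return len(text.split())


def type_token_ratio(text: str) -> float:
    """Share of distinct words, a crude proxy for lexical variety."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def evaluate(generate: Callable[[str, Dict], str], prompt: str) -> List[Dict]:
    """Run one prompt through every configuration and attach the two metrics."""
    results = []
    for config in build_grid():
        response = generate(prompt, config)
        results.append(
            {
                **config,
                "word_count": word_count(response),
                "type_token_ratio": type_token_ratio(response),
            }
        )
    return results


def fake_generate(prompt: str, config: Dict) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    suffix = " (with retrieved context)" if config["rag"] else ""
    return f"Visit the lake in summer{suffix}"


if __name__ == "__main__":
    rows = evaluate(fake_generate, "What should I see in Värmland?")
    print(len(rows), "configurations evaluated")
```

In a real setup, `fake_generate` would be replaced by a call to the deployed system, and semantic or LLM-judge metrics would be added alongside the syntactic ones; keeping the grid separate from the metrics makes it straightforward to compare results per parameter setting.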

Place, publisher, year, edition, pages
2025.
National Category
Computer Systems
Research subject
Computer Science; Business Administration
Identifiers
URN: urn:nbn:se:kau:diva-104250
OAI: oai:DiVA.org:kau-104250
DiVA, id: diva2:1956949
Conference
ITEQS conference 2025, The 9th International Workshop on Testing Extra-Functional Properties and Quality Characteristics of Software Systems
Available from: 2025-05-07 Created: 2025-05-07 Last updated: 2025-10-16 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

https://conf.researchr.org/home/icst-2025/iteqs-2025#event-overview

Authority records

Ahmed, Bestoun S.; Bayram, Firas; Jagstedt, Siri; Magnusson, Peter
