Integrating Multimodal Communication and Comprehension Evaluation during Human-Robot Collaboration for Increased Reliability of Foundation Model-based Task Planning Systems
Karlstad University, Faculty of Health, Science and Technology (starting 2013), Department of Engineering and Physics (from 2013). ORCID iD: 0000-0002-6865-7346
2025 (English). Conference paper (Refereed)
Abstract [en]

Foundation models provide the adaptability needed in robotics but often require explicit tasks or human verification due to potential unreliability in their responses, complicating human-robot collaboration (HRC). To enhance the reliability of such task-planning systems, we propose 1) an adaptive task-planning system for HRC that reliably performs non-predefined tasks implicitly instructed through HRC, and 2) an integrated system combining multimodal large language model (LLM)-based task planning with multimodal communication of human intention to increase the HRC success rate and comfort. The proposed system integrates GPT-4V for adaptive task planning and comprehension evaluation during HRC with multimodal communication of human intention through speech and deictic gestures. Four pick-and-place tasks of gradually increasing difficulty were used in three experiments, each evaluating a key aspect of the proposed system: task planning, comprehension evaluation, and multimodal communication. The quantitative results show that the proposed system can interpret implicitly instructed tabletop pick-and-place tasks through HRC, providing the next object to pick and the correct position to place it, achieving a mean success rate of 0.80. Additionally, the system can evaluate its comprehension of three of the four tasks with an average precision of 0.87. The qualitative results show that multimodal communication not only significantly enhances the success rate but also the feelings of trust and control, willingness to use again, and sense of collaboration during HRC. 
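The abstract reports two evaluation metrics: a mean task-planning success rate of 0.80, and an average precision of 0.87 for the system's self-assessed comprehension. As a minimal, purely illustrative sketch (the trial data and function names below are hypothetical, not taken from the paper), these metrics could be computed from per-trial binary outcomes like so:

```python
def success_rate(outcomes):
    """Fraction of trials in which the pick-and-place task succeeded.

    outcomes: list of 1 (success) / 0 (failure) per trial.
    """
    return sum(outcomes) / len(outcomes)


def comprehension_precision(claimed, actual):
    """Precision of the system's self-reported comprehension:
    of the trials it claimed to understand, the fraction it
    actually understood.

    claimed: 1 if the system reported understanding the task.
    actual:  1 if its interpretation was in fact correct.
    """
    understood = [a for c, a in zip(claimed, actual) if c]
    return sum(understood) / len(understood)


# Illustrative data only -- not the experimental results.
task_outcomes = [1, 1, 0, 1, 1]
print(success_rate(task_outcomes))  # 0.8
```

Precision, rather than raw accuracy, is the natural metric here because a comprehension check matters most when the system *claims* to understand: false confidence is what would make an unreliable plan execute during HRC.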

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025, pp. 1053-1059
Keywords [en]
Digital elevation model, Human robot interaction, Robot programming, Foundation models, Human intentions, Human-robot collaboration, Integrated systems, Model-based OPC, Multi-modal, Multimodal communications, Pick and place, Planning systems, Task planning, Man machine systems
National Category
Robotics and automation
Research subject
Electrical Engineering
Identifiers
URN: urn:nbn:se:kau:diva-104074
DOI: 10.1109/SII59315.2025.10871045
Scopus ID: 2-s2.0-86000252063
OAI: oai:DiVA.org:kau-104074
DiVA id: diva2:1954676
Available from: 2025-04-25 Created: 2025-04-25 Last updated: 2025-04-25

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text (Scopus)

Authority records

Solis, Jorge
