Fool Me Once? Contrasting Textual and Visual Explanations in a Clinical Decision-Support Setting

Published in Empirical Methods in Natural Language Processing (EMNLP), 2024

Abstract

The growing capabilities of AI models are leading to their wider use, including in safety-critical domains. Explainable AI (XAI) aims to make these models safer to use by making their inference process more transparent. However, current explainability methods are seldom evaluated in the way they are intended to be used: by real-world end users. To address this, we conducted a large-scale user study with 85 healthcare practitioners in the context of human-AI collaborative chest X-ray analysis. We evaluated three types of explanations: visual explanations (saliency maps), natural language explanations, and a combination of both modalities. We specifically examined how different explanation types influence users depending on whether the AI advice and explanations are factually correct. We find that text-based explanations lead to significant over-reliance, which is alleviated by combining them with saliency maps. We also observe that the quality of explanations, that is, how much factually correct information they entail, and how much this aligns with AI correctness, significantly impacts the usefulness of the different explanation types.

Link to paper TBD.

BibTeX

@inproceedings{kayser_fool_2024,
title = {Fool {Me} {Once}? {Contrasting} {Textual} and {Visual} {Explanations} in a {Clinical} {Decision}-{Support} {Setting}},
shorttitle = {Fool {Me} {Once}?},
booktitle = {Empirical {Methods} in {Natural} {Language} {Processing} ({EMNLP})},
url = {https://openreview.net/forum?id=5oAV878LxG},
abstract = {The growing capabilities of AI models are leading to their wider use, including in safety-critical domains. Explainable AI (XAI) aims to make these models safer to use by making their inference process more transparent. However, current explainability methods are seldom evaluated in the way they are intended to be used: by real-world end users. To address this, we conducted a large-scale user study with 85 healthcare practitioners in the context of human-AI collaborative chest X-ray analysis. We evaluated three types of explanations: visual explanations (saliency maps), natural language explanations, and a combination of both modalities. We specifically examined how different explanation types influence users depending on whether the AI advice and explanations are factually correct. We find that text-based explanations lead to significant over-reliance, which is alleviated by combining them with saliency maps. We also observe that the quality of explanations, that is, how much factually correct information they entail, and how much this aligns with AI correctness, significantly impacts the usefulness of the different explanation types.},
language = {en},
urldate = {2024-10-04},
author = {Kayser, Maxime Guillaume and Menzat, Bayar and Emde, Cornelius and Bercean, Bogdan Alexandru and Novak, Alex and Morgado, Abdalá Trinidad Espinosa and Papiez, Bartlomiej and Gaube, Susanne and Lukasiewicz, Thomas and Camburu, Oana-Maria},
month = sep,
year = {2024},
}
