Studies show that saliency heat maps may not be ready for prime time yet
Artificial intelligence models that interpret medical images promise to improve clinicians' ability to make accurate and timely diagnoses while reducing workload by allowing busy doctors to focus on critical cases and delegate routine tasks to AI.
But AI models that lack transparency into how and why a diagnosis is made can be problematic. This opacity, often referred to as “black box” AI, can diminish clinicians' confidence in the reliability of the AI tool and thus discourage its use. The lack of transparency could also lead clinicians to over-trust the tool's interpretation.
In medical imaging, saliency methods have been one way to make AI models more understandable and to demystify AI decision making. These methods produce heatmaps meant to reveal whether the tool is correctly focusing on the relevant parts of a given image or fixating on irrelevant parts of it.
Heatmaps work by highlighting the areas of an image that influenced the AI model's interpretation. This can help clinicians see whether the model is focusing on the same regions they would or incorrectly attending to irrelevant parts of an image.
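To make the idea concrete, below is a minimal sketch of one simple, gradient-based way to compute such a heatmap with PyTorch. The names model (a trained image classifier) and image (a preprocessed chest X-ray tensor of shape 1 x 3 x H x W) are hypothetical placeholders, and this is only one of many saliency methods, not the study's specific implementation.

import torch

def gradient_saliency(model, image, target_class):
    """Return a pixel-level saliency heatmap for one predicted class."""
    model.eval()
    image = image.clone().requires_grad_(True)   # track gradients w.r.t. pixels
    logits = model(image)                        # shape (1, num_classes)
    logits[0, target_class].backward()           # gradient of the target class score
    # The magnitude of the gradient at each pixel is taken as its "importance".
    saliency = image.grad.abs().max(dim=1)[0]    # collapse channels -> (1, H, W)
    # Normalize to [0, 1] so the map can be overlaid on the X-ray as a heatmap.
    saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
    return saliency.squeeze(0).detach()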
But a new study published Oct. 10 in Nature Machine Intelligence shows that, for all their promise, saliency heatmaps aren't ready for prime time yet.
The analysis, led by Harvard Medical School investigator Pranav Rajpurkar, Stanford's Matthew Lungren, and New York University's Adriel Saporta, assessed the validity of seven widely used saliency methods, asking how reliably and accurately each one localizes pathologies associated with 10 conditions commonly diagnosed on chest X-rays, such as lung lesions, pleural effusion, edema, and enlarged cardiac structures. To gauge performance, the researchers compared the tools' localizations with human expert judgment.
Ultimately, the tools that used saliency-based heatmaps consistently underperformed human radiologists both in image assessment and in their ability to detect pathologic lesions.
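Agreement between a heatmap and a radiologist's judgment is often quantified by thresholding the heatmap into a binary region and measuring its overlap with an expert-annotated region, for example with intersection-over-union. The sketch below, again in Python, uses hypothetical arrays purely for illustration; the study's actual evaluation pipeline is the one released in the authors' open codebase.

import numpy as np

def localization_iou(heatmap, expert_mask, threshold=0.5):
    """Intersection-over-union between a thresholded heatmap and an expert mask."""
    pred_mask = heatmap >= threshold                       # binarize the saliency map
    intersection = np.logical_and(pred_mask, expert_mask).sum()
    union = np.logical_or(pred_mask, expert_mask).sum()
    return float(intersection) / union if union > 0 else 0.0

# Example: a heatmap that highlights a corner far from the annotated lesion scores 0.
heatmap = np.zeros((8, 8)); heatmap[:2, :2] = 1.0
expert_mask = np.zeros((8, 8), dtype=bool); expert_mask[5:, 5:] = True
print(localization_iou(heatmap, expert_mask))              # -> 0.0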
The work represents the first comparative analysis between saliency maps and human expert performance in assessing multiple radiographic pathologies. The study also provides a detailed understanding of whether and how certain pathological features in an image can impact the performance of AI tools.
Saliency maps are already being used as a quality assurance tool by clinical practices that rely on AI in computer-aided detection, for example when reading chest X-rays. But in light of the new findings, this feature should be applied with caution and a healthy dose of skepticism, the researchers say.
“Our analysis shows that saliency maps are not yet reliable enough to validate individual clinical decisions made by an AI model. We have identified important limitations that raise serious safety concerns for use in current practice.”
Pranav Rajpurkar, Assistant Professor of Biomedical Informatics, HMS
The researchers caution that, because of the important limitations identified in the study, saliency-based heatmaps should be further refined before being widely adopted in clinical AI models.
The team's full codebase, data, and analysis are open and available to anyone interested in exploring this important aspect of clinical machine learning in medical imaging applications.
Reference:
Saporta, A., et al. (2022). Benchmarking saliency methods for chest radiograph interpretation. Nature Machine Intelligence. https://doi.org/10.1038/s42256-022-00536-x