Doctors had the clear edge in Digital Health Live AI bake-off
- 13 March 2024
A clinician-dominated audience was pitted against the best of large language model intelligence on Tuesday afternoon at Digital Health Rewired24, as day one closed on the AI, Data and Analytics stage with a Live AI bake-off between ChatGPT and human doctors.
Haris Shuaib, CEO of Newton’s Tree and head of CSC at Guy’s and St Thomas’ NHS Foundation Trust, chaired the session. The panel comprised Lia Ali, clinical advisor at NHS England’s Transformation Directorate; Keith Grimes, a digital health and innovation consultant; Annabelle Painter, an NHS doctor and clinical AI fellow at the Artificial Intelligence Centre for Value-Based Healthcare; and Michael Watts, associate CCIO at University Hospitals of Derby and Burton NHS Foundation Trust.
The panellists presented several scenarios to the audience. In one, a patient complained of a sore back, burning pain in their foot and recent “tingling in my privates”. While clinicians in the audience and onstage said they would need to ask more questions, most suspected a condition involving nerve compression.
When fed the same scenario, the AI model prefaced its response with an acknowledgment that it was not a doctor – a response that Shuaib described as “an improvement” from previous iterations of the technology – before also suggesting possible nerve damage and saying the condition might require immediate medical attention.
Audience members were asked to vote by a show of hands on whether they would prefer doctors or AI across different domains of healthcare quality: safety, effectiveness, timeliness, patient-centredness, efficiency and equitable impact. The clinicians won comprehensively in every category except two: timeliness, where, unsurprisingly, they were badly beaten, and patient-centredness and empathy, where the vote was closer.
AI answers limited in surprising and predictable ways
“For most of the scenarios, it was very difficult for us to get ChatGPT to give a bad answer,” Shuaib said. Two areas stood out, he noted. One was calculating a medication dose, because it required mathematical reasoning.
“In the textual explanation, it explains its working out like a child doing a maths problem, but when it does the actual calculation it does it wrong, which is interesting,” he said. “So it has this explanation and veneer of respectability and correctness, but if you don’t look closely, it’s actually got the maths wrong. And it took us ten minutes to work out why it was wrong, so it made extra work for us to double check its answer.”
Beyond the surprise that a computer should struggle with arithmetic, Shuaib said, the example demonstrated that AI “makes the expert user slower, and it makes the non-expert user over-confident, so it’s a double-edged sword.”
A more complicated scenario trialled by the panel involved a patient experiencing domestic violence. The patient’s sister reported that the patient had been turning up with bruises, which the patient blamed on clumsiness, and had been suffering from depression.
“We had to really manufacture that story, that narrative, because when we tried to be more subtle [the AI model] just wouldn’t get it,” Shuaib said. “It would suggest things like vitamin deficiencies and actual clumsiness, but it wouldn’t ask about any family issues or anything like that until we made it so bleeding obvious.”
Once the model worked out what the issue was, he said, it gave a “delicate” response, advising the clinician to keep an eye on things, to have a non-judgemental conversation and to recognise that the patient might take time to open up.
1 Comment
This was the best session at REWIRED for me, and surfaced all sorts of issues that had been at the edge of my concerns/excitement.
I didn’t want to interrupt the flow on the day, but I wanted to bring in something we skirted around: the medicolegal consequences of these decisions. I had the privilege of having the Vice-Chancellor of the MDU as one of my GP trainers, and he shared some insights that have stuck with me.
Cauda equina syndrome (CES): unless times have changed, one third of our medical indemnity premiums go solely towards covering the multimillion-pound payouts made if we miss a case during our careers. One third! The issue is that young people get slipped discs and end up wheelchair-bound, with no sexual function and double incontinence, yet with no other organ damage, which means decades of lost quality of life and social function. The sums work out such that if we scanned 1,000 patients with acute back pain and picked up one case of CES that we would otherwise have missed, the cost to a Trust would roughly equal its compensation bill for that missed case. The MDU’s observation was that in the majority of cases, the problem (and hence the culpability) lay in clinicians’ misplaced belief that their own clinical judgement was a reliable basis for deciding whom to scan and whom not to.
So any AI model that is allowed to weigh these decisions should read the case studies from the indemnity organisations, court judgements and the like to inform the ‘most likely’ correct answer!