There’s been a lot of speculation and panic around what the AI tool ChatGPT can and can’t do and whether it’s going to replace/destroy us all. But it looks like it’s not going to be replacing doctors any time soon, even though it may be a semi-reliable source for those studying for the United States Medical Licensing Exam (USMLE).
Did ChatGPT pass a medical licensing exam?
In a word, no. ChatGPT did not truly “pass” a medical licensing exam (it never sat for the real thing), though there will probably be some sensational headlines to the contrary.
A study published February 9, 2023 in the open-access journal PLOS Digital Health by Tiffany Kung, Victor Tseng, and others at AnsibleHealth found that:
“ChatGPT can score at or around the approximately 60 percent passing threshold for the United States Medical Licensing Exam (USMLE), with responses that make coherent, internal sense and contain frequent insights…”
In other words, ChatGPT can generate human-like responses to certain types of questions on the exam.
According to a press release about the study:
“Kung and colleagues tested ChatGPT’s performance on the USMLE, a highly standardized and regulated series of three exams (Steps 1, 2CK, and 3) required for medical licensure in the United States. Taken by medical students and physicians-in-training, the USMLE assesses knowledge spanning most medical disciplines, ranging from biochemistry, to diagnostic reasoning, to bioethics.”
However, the researchers had to remove all image-based questions, so the AI could attempt only 350 of the 376 publicly available questions from the June 2022 exam release.
How well did AI do on the medical exam?
There’s a joke people tell that goes a little something like this:
What do you call the person who graduates first in their medical class? Doctor.
What do you call the person who graduates last in their medical class? Also doctor.
In other words, you’re never going to know if your doctor aced their exam or barely passed.
But if ChatGPT were your doctor, you would know it’s incapable of doing really well on the exam (though in some cases, it did pass). After the researchers removed all “indeterminate responses,” “ChatGPT scored between 52.4% and 75.0% across the three USMLE exams. The passing threshold each year is approximately 60%.”
It did, however, produce novel and clinically significant insights the vast majority of the time. But if it can’t reliably pass, that doesn’t mean much.
What’s also interesting is that “ChatGPT exceeded the performance of PubMedGPT, a counterpart model trained exclusively on biomedical domain literature, which scored 50.8% on an older dataset of USMLE-style questions.”
But all that means is that ChatGPT is a really good AI language model. It doesn’t mean it’ll be replacing your doctor any time soon. And since it’s not always good at producing correct information, you may not want to use it to get medical advice.
What ChatGPT can be used for is helping to distill jargon-heavy medical writing into language that non-experts can understand. So if there’s a scientific study you’d like to know more about, you can plug that into ChatGPT and get a fairly reliable summary. — WTF fun facts