Clinical AI tool scores highest ever on the United States medical licensing exam

A powerful clinical artificial intelligence tool developed by University at Buffalo biomedical informatics researchers is described in a new study published in JAMA Network Open.

Achieving higher scores on the USMLE than most physicians and all other AI tools so far, Semantic Clinical Artificial Intelligence (SCAI, pronounced “Sky”) has the potential to become a critical partner for physicians, says lead author Peter L. Elkin, MD, chair of the Department of Biomedical Informatics in the Jacobs School of Medicine and Biomedical Sciences at UB and a physician with UBMD Internal Medicine.

According to Elkin, SCAI is the most accurate clinical AI tool to date: its most advanced version earned the highest score yet reported on Step 3 of the USMLE, while a GPT-4 Omni tool scored 90.5% on the same test.

“As doctors, we are used to using computers as tools, but SCAI is different. It can augment your decision-making and thinking based on its own reasoning.”

Peter L. Elkin, MD, chair of the Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences at UB

The tool can answer medical questions asked by clinicians or the public

Researchers tested the model against the USMLE, which is required for physician licensure nationwide and assesses a physician’s ability to apply knowledge, concepts, and principles and to demonstrate basic patient-centered skills. Questions with a visual component were excluded.

Elkin explains that most AI tools work with statistics to find associations in online data that they can use to answer a question. “We call these tools generative artificial intelligence,” he says. “Some have postulated that they are just plagiarizing what is on the Internet because the answers they give you are what others have written.” However, these AI models are now becoming partners in care rather than simple tools that clinicians can use in their practice, he says.

“But SCAI answers more complex questions and performs more complex semantic reasoning,” he says. “We have created sources of knowledge that are more like the way people learn during their training in medical school.”

The team started with previously developed natural language processing software. They added large amounts of authoritative clinical information drawn from widely disparate sources, ranging from recent medical literature and clinical guidelines to genomic data, drug information, discharge recommendations, patient safety data, and more. Data that could be biased, such as clinical notes, were not included.

13 million medical facts

SCAI contains 13 million medical facts as well as all possible interactions between those facts. The team represented basic clinical facts as semantic triples (subject-relation-object statements such as “penicillin treats pneumococcal pneumonia”) and linked them into semantic networks. The tool can then reason over these networks to draw logical conclusions from them.

“We taught large-language models how to use semantic reasoning,” says Elkin.
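The published article does not include SCAI’s implementation, but the idea of triples linked into a network that supports inference can be sketched in a few lines of Python. In the hypothetical example below, the facts, relation names, and inference rule are illustrative only: the code chains a “treats” fact through “is_a” links to reach a conclusion that is never stated as a single fact.

```python
# Hypothetical illustration of semantic triples and simple reasoning over them.
# The facts and relation names are examples only, not SCAI's actual knowledge base.
from collections import defaultdict

# Each fact is a (subject, relation, object) triple,
# e.g. "penicillin treats pneumococcal pneumonia".
triples = [
    ("penicillin", "treats", "pneumococcal pneumonia"),
    ("pneumococcal pneumonia", "is_a", "bacterial pneumonia"),
    ("bacterial pneumonia", "is_a", "pneumonia"),
]

# Index the triples into a small semantic network keyed by (subject, relation).
network = defaultdict(set)
for subject, relation, obj in triples:
    network[(subject, relation)].add(obj)

def broader_concepts(concept):
    """Follow 'is_a' links transitively to collect all broader concepts."""
    seen, stack = set(), [concept]
    while stack:
        node = stack.pop()
        for parent in network[(node, "is_a")]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def treats(drug, condition):
    """A drug treats a condition if it treats the condition itself or a subtype of it."""
    for target in network[(drug, "treats")]:
        if target == condition or condition in broader_concepts(target):
            return True
    return False

# A conclusion inferred from the network rather than stated as a single fact:
print(treats("penicillin", "pneumonia"))  # True
```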

Other techniques that contributed to SCAI include knowledge graphs, which map links in medical data and can surface previously “hidden” patterns, and retrieval-augmented generation, which lets the large language model access and incorporate information from external knowledge bases in response to a prompt. This reduces “confabulation,” the tendency of AI tools to answer a prompt even when they do not have enough information to do so.
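SCAI’s actual retrieval pipeline is not described in detail in the article; the sketch below only illustrates the general retrieval-augmented generation pattern. The toy knowledge base, the naive word-overlap retriever, and the unimplemented ask_llm placeholder are all assumptions for illustration: relevant facts are retrieved first and prepended to the prompt, so the model answers from supplied knowledge rather than from memory alone.

```python
# Generic retrieval-augmented generation pattern (illustrative only; not SCAI's pipeline).

knowledge_base = [
    "Penicillin treats pneumococcal pneumonia.",
    "Azithromycin is an alternative for patients with a penicillin allergy.",
    "Pneumococcal vaccination reduces the risk of invasive disease.",
]

def retrieve(question, k=2):
    """Rank knowledge-base entries by naive word overlap with the question."""
    question_words = set(question.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda fact: len(question_words & set(fact.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def ask_llm(prompt):
    """Placeholder for a call to a large language model."""
    raise NotImplementedError("Connect this to an actual model.")

def answer(question):
    # Ground the prompt in retrieved facts so the model can decline to answer
    # when the external knowledge is insufficient, reducing confabulation.
    facts = "\n".join(retrieve(question))
    prompt = (
        "Answer using only the facts below. "
        "If they are insufficient, say so.\n"
        f"Facts:\n{facts}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)

if __name__ == "__main__":
    # The retrieval step alone can be demonstrated without a model attached.
    print(retrieve("What treats pneumococcal pneumonia?"))
```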

Elkin adds that using formal semantics to inform the large language model provides important context necessary for SCAI to more accurately understand and respond to a specific question.

“It can have a conversation with you.”

“SCAI is different from other large language models because it can converse with you and, as a human-computer partnership, can augment your decision-making and thinking based on its own reasoning,” says Elkin.

He concludes: “By adding semantics to large language models, we provide them with the ability to reason in a manner similar to practicing evidence-based medicine.”

Because it can access such massive amounts of data, SCAI can also improve patient safety, improve access to care, and “democratize specialty care,” Elkin says, making specialty and subspecialty medical information accessible to primary care providers and even to patients.

While SCAI’s power is impressive, Elkin emphasizes that its mission is to augment, not replace, physicians.

“Artificial intelligence will not replace doctors,” he says, “but a doctor who uses AI can replace a doctor who doesn’t.”

In addition to Elkin, UB co-authors from the Department of Biomedical Informatics include Guresh Mehta; Frank Lehouillier; Melissa Resnick, PhD; Crystal Tomlin, PhD; Skyler Resendez, PhD; and Jiaxing Liu.

Sarah Mullin, PhD, of Roswell Park Comprehensive Cancer Center, and Jonathan R. Nebeker, MD, and Steven H. Brown, MD, both of the Department of Veterans Affairs, are also co-authors.

The work was funded by grants from the National Institutes of Health and the Department of Veterans Affairs.


Journal reference:

Elkin, P. L., et al. (2025). Semantic Clinical Artificial Intelligence vs Native Large Language Model Performance on the USMLE. JAMA Network Open. doi.org/10.1001/jamanetworkopen.2025.6359.