Confidence-linked and uncertainty-based staged framework for phenotype validation using large language models

  • Sumin Lee
  • Hyeok Hee Lee
  • Hokyou Lee
  • Kyu Sun Yum
  • Jang Hyun Baek
  • Jaewon Khil
  • Jaeyong Lee
  • Sojung Shin
  • Minsung Cho
  • Na Yeon Ahn
  • Seng Chan You
  • Hyeon Chang Kim

Research output: Contribution to journal › Article › peer-review

Abstract

Objectives: This study develops and validates the confidence-linked and uncertainty-based staged (CLUES) framework, which integrates large language models (LLMs) with uncertainty quantification to assist manual chart review while ensuring reliability through selective human review.

Materials and Methods: The CLUES framework assesses stroke-related hospitalizations using imaging reports for 1739 patients across 24 Korean hospitals (2011–2022). Uncertainty was quantified as the entropy of LLM-derived confidence values. The framework operated in 3 stages: (1) zero-shot prompting with ensemble averaging, in which high-uncertainty cases advanced to stage 2; (2) few-shot prompting using retrieved low-uncertainty cases, in which the remaining high-uncertainty cases proceeded to stage 3; and (3) manual chart review of the final uncertain cases. Performance was evaluated against physician-labeled data using F1-score and Cohen’s kappa.

Results: Among 1072 test cases, stage 1 classified 507 cases as low uncertainty and 565 as high uncertainty. Stage 2 reclassified 280 of these 565 cases as low uncertainty, leaving 285 for manual review. Predictions for low-uncertainty cases consistently outperformed those for high-uncertainty cases in both stages (weighted F1-scores: 0.94 vs 0.57 in stage 1 and 0.82 vs 0.58 in stage 2). Overall framework F1-scores improved progressively from 0.840 (stage 1) to 0.878 (stage 2) to 0.955 (stage 3).

Discussion: The CLUES framework reduced the manual review burden by 75% while maintaining high accuracy. By integrating uncertainty quantification with selective human oversight, it provides an efficient and reliable approach to phenotype validation.

Conclusion: This framework demonstrates the effective integration of LLMs into clinical workflows while ensuring human oversight, enhancing both accuracy and efficiency.
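
The staged routing summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each LLM call returns a per-label confidence dictionary, that "ensemble averaging" means the arithmetic mean of those values over repeated runs, that uncertainty is the Shannon entropy (in bits) of the renormalized averages, and that one fixed entropy threshold separates low from high uncertainty in both stages. The names `run_staged`, `zero_shot`, `few_shot`, and `ensemble_average`, the 0.5-bit threshold, reusing ensemble averaging in stage 2, and leaving example retrieval to the `few_shot` callable are all hypothetical choices.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed format: label -> confidence value parsed from one LLM response.
Confidences = Dict[str, float]


def ensemble_average(runs: Sequence[Confidences]) -> Confidences:
    """Average per-label confidences over repeated LLM runs, then renormalize to sum to 1."""
    if not runs:
        raise ValueError("need at least one LLM run")
    labels = sorted({label for run in runs for label in run})
    avg = {lab: sum(run.get(lab, 0.0) for run in runs) / len(runs) for lab in labels}
    total = sum(avg.values())
    if total <= 0:
        raise ValueError("confidence values must sum to a positive number")
    return {lab: value / total for lab, value in avg.items()}


def entropy(probs: Confidences) -> float:
    """Shannon entropy in bits; 0.0 means all mass sits on a single label."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def run_staged(
    cases: Dict[str, str],  # case id -> imaging report text
    zero_shot: Callable[[str], List[Confidences]],
    few_shot: Callable[[str, List[Tuple[str, str]]], List[Confidences]],
    threshold: float,  # hypothetical entropy cut-off in bits, not the study's value
) -> Tuple[Dict[str, str], List[str]]:
    """Return (automatic labels, case ids left for manual chart review)."""
    labels: Dict[str, str] = {}
    uncertain: List[str] = []

    # Stage 1: zero-shot prompting with ensemble averaging.
    for cid, text in cases.items():
        probs = ensemble_average(zero_shot(text))
        if entropy(probs) <= threshold:
            labels[cid] = max(probs, key=probs.get)
        else:
            uncertain.append(cid)

    # Low-uncertainty stage-1 cases form the pool of few-shot examples;
    # how examples are retrieved from this pool is left to `few_shot` (assumption).
    pool = [(cases[cid], label) for cid, label in labels.items()]

    # Stage 2: few-shot prompting for the remaining high-uncertainty cases.
    manual: List[str] = []
    for cid in uncertain:
        probs = ensemble_average(few_shot(cases[cid], pool))
        if entropy(probs) <= threshold:
            labels[cid] = max(probs, key=probs.get)
        else:
            manual.append(cid)

    # Stage 3: anything still uncertain is routed to a human reviewer.
    return labels, manual


# Toy stubs standing in for real LLM calls (values are illustrative only).
zero_shot_out = {
    "report A": [{"stroke": 0.9, "no_stroke": 0.1}, {"stroke": 1.0, "no_stroke": 0.0}],
    "report B": [{"stroke": 0.8, "no_stroke": 0.2}, {"stroke": 0.2, "no_stroke": 0.8}],
    "report C": [{"stroke": 0.6, "no_stroke": 0.4}],
}
few_shot_out = {
    "report B": [{"stroke": 0.55, "no_stroke": 0.45}],
    "report C": [{"stroke": 0.9, "no_stroke": 0.1}],
}
cases = {"r1": "report A", "r2": "report B", "r3": "report C"}
auto, manual = run_staged(
    cases,
    zero_shot=lambda text: zero_shot_out[text],
    few_shot=lambda text, pool: few_shot_out[text],
    threshold=0.5,
)
```

With these toy values, r1 (entropy about 0.29 bits) is labeled in stage 1, r3 is labeled in stage 2, and r2 stays above the threshold and is left for manual review. Keeping the model calls behind callables lets the routing logic be exercised without an LLM.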

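Performance is reported against physician labels as F1-score (weighted F1 for the uncertainty strata) and Cohen’s kappa. A minimal from-scratch sketch of both metrics, using their standard definitions, is shown below; the function names and toy labels are illustrative and not taken from the study. scikit-learn's `f1_score(..., average="weighted")` and `cohen_kappa_score` compute the same quantities. Applying `weighted_f1` separately to low- and high-uncertainty subsets mirrors the stratified comparison in the abstract.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-label F1, averaged with weights equal to each label's support in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    support = Counter(y_true)
    score = 0.0
    for label in set(y_true) | set(y_pred):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += support[label] / len(y_true) * f1
    return score


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Agreement between two labelers, corrected for agreement expected by chance."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Toy example: physician labels vs model labels for four reports.
physician = ["stroke", "stroke", "no_stroke", "no_stroke"]
model = ["stroke", "no_stroke", "no_stroke", "no_stroke"]
f1 = weighted_f1(physician, model)      # (2/3 + 4/5) / 2 = 11/15
kappa = cohen_kappa(physician, model)   # (0.75 - 0.5) / (1 - 0.5) = 0.5
```
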
Original language: English
Pages (from-to): 1320-1327
Number of pages: 8
Journal: Journal of the American Medical Informatics Association
Volume: 32
Issue number: 8
State: Published, 1 Aug 2025
Externally published: Yes

Keywords

  • entropy
  • large language models
  • phenotype
  • review
  • uncertainty
