Abstract
Medical AI pipelines face integrity risks from label flipping: mislabeled examples that distort decision thresholds, calibration, and subgroup parity. Because anomalies are rare, evolving, and often mislabeled, a purely supervised detector tends to miss new problems and flood reviewers with false alarms. A triage loop (rank strong model-vs-label disagreements, review a small top slice, fix, retrain) keeps effort low and results trustworthy. We present a lightweight procedure: basic plausibility and duplicate checks; leakage-safe K-fold cross-fitting; calibration; and Confident Learning to derive per-example flip scores (and the confident joint). High-scoring cases receive budgeted chart review; we then selectively relabel or reweight, retrain, and recalibrate. We evaluate flip ranking (PR-AUC, precision@k, TPR at low FPR) and downstream AUROC/PR-AUC, ECE/Brier, and parity deltas. A case study on the HiRID ICU dataset demonstrates integrity and calibration gains with limited review effort.
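The ranking-and-review step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes out-of-fold predicted probabilities from leakage-safe cross-fitting are already available, and it uses the simple self-confidence flip score (one minus the model's probability for the observed label); the function names and toy data are illustrative only.

```python
def flip_scores(pred_probs, labels):
    # Self-confidence flip score: 1 - model probability of the observed label.
    # A high score signals strong model-vs-label disagreement (a flip candidate).
    return [1.0 - p[y] for p, y in zip(pred_probs, labels)]

def triage(pred_probs, labels, budget):
    # Rank examples by flip score (descending) and return the top-`budget`
    # indices, i.e. the small slice sent for chart review.
    scores = flip_scores(pred_probs, labels)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:budget]

# Toy example: out-of-fold probabilities for a binary task.
probs = [[0.9, 0.1], [0.2, 0.8], [0.95, 0.05], [0.4, 0.6]]
labels = [0, 0, 1, 1]  # examples 1 and 2 disagree strongly with the model
print(triage(probs, labels, budget=2))  # → [2, 1]
```

In practice the paper's procedure derives these scores via Confident Learning (which also estimates the confident joint over observed and latent labels) rather than raw self-confidence, but the triage logic (score, rank, review a budgeted top slice) is the same.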

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2025 Daniel Schönle, Christoph Reich
