Beyond Single Models: Unsupervised Ensemble Selection for Small Language Models in Medical QA

Keywords

Artificial Intelligence
Large Language Models
Small Language Models
Medical Question Answering
Clinical NLP

Abstract

Small Language Models (SLMs) offer efficient alternatives to large models for clinical open-ended question answering (QA) but often show variable performance. We propose two unsupervised answer selection strategies for SLM ensembles: a confidence-based method using length-normalized perplexity and a consensus-based medoid method capturing semantic similarity among model outputs. Evaluations on three clinical QA benchmarks show that both strategies outperform single-model and random-selection baselines, indicating that unsupervised confidence and consensus mechanisms can enhance SLM ensembles for medical QA without additional training or increased model size.
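The two selection strategies described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and a bag-of-words cosine stands in for whatever semantic-similarity measure the authors use for the medoid computation.

```python
import math
from collections import Counter

def confidence_select(candidates):
    """Confidence-based selection: pick the answer whose
    length-normalized perplexity is lowest.

    candidates: list of (answer_text, per_token_logprobs)."""
    def norm_ppl(logprobs):
        # Perplexity normalized by answer length in tokens.
        return math.exp(-sum(logprobs) / len(logprobs))
    return min(candidates, key=lambda c: norm_ppl(c[1]))[0]

def _cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def medoid_select(answers):
    """Consensus-based selection: pick the medoid, i.e. the answer
    most similar on average to all other ensemble outputs.

    Bag-of-words cosine is a stand-in for a proper sentence-embedding
    similarity."""
    vecs = [Counter(a.lower().split()) for a in answers]
    scores = [
        sum(_cosine(v, w) for j, w in enumerate(vecs) if j != i)
        for i, v in enumerate(vecs)
    ]
    return answers[scores.index(max(scores))]
```

In this sketch, `confidence_select` needs per-token log-probabilities from each model, while `medoid_select` operates on the answer texts alone, which is why the consensus route requires no access to model internals.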

https://doi.org/10.60643/urai.v2025p10

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2025 Nicolas Ventulett, Fabian Nicklas, Eric Gaida, Dieter Wallach, Jan Conrad