VOLTA: The Surprising Ineffectiveness of Auxiliary Losses for Calibrated Deep Learning
View PDF HTML (experimental) Abstract:Uncertainty quantification (UQ) is essential for deploying deep learning models in safety critical applications, yet no consensus exists on which UQ method performs best across different data modalities and distribution shifts. This paper presents a comprehensive benchmark of ten widely used UQ baselines including MC Dropout, SWAG, ensemble methods, temperature scaling, energy based OOD, Mahalanobis, hyperbolic classifiers, ENN, Taylor Sensus, and split conformal prediction against a simplified yet highly effective variant of VOLTA that retains only a deep encoder, learnable prototypes, cross entropy loss, and post hoc temperature scaling. We evaluate all methods on CIFAR 10 (in distribution), CIFAR 100, SVHN, uniform noise (out of distribution), CIFAR 10 C (corruptions), and Tiny ImageNet features (tabular). VOLTA achieves competitive or superior accuracy (up to 0.864 on CIFAR 10), significantly lower expected calibration error (0.010 vs. 0.044 to 0.102 for baselines), and strong OOD detection (AUROC 0.802). Statistical testing over three random seeds shows that VOLTA matches or outperforms most baselines, with ablation studies confirming the importance of adaptive temperature and deep encoders. Our results establish VOLTA as a lightweight, deterministic, and well calibrated alternative to more complex UQ approaches. Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV) Cite as: arXiv:2604.08639 [cs.LG] (or arXiv:2604.08639v1 [cs.LG] for this version) https://doi.org/10.48550/arXiv.2604.08639 arXiv-issued DOI via DataCite Submission history From: Rahul D Ray Ray [view email] [v1] Thu, 9 Apr 2026 17:22:36 UTC (1,288 KB)
No replies yet. Be first.