Are LLM Uncertainty and Correctness Encoded by the Same Features? A Functional Dissociation via Sparse Autoencoders

Source

arxiv.orgfull article ↗

Read on arxiv

Publisher summary· verbatim

arXiv:2604.19974v1 Announce Type: new Abstract: Large language models can be uncertain yet correct, or confident yet wrong, raising the question of whether their output-level uncertainty and their actual correctness are driven by the same internal mechanisms or by distinct feature populations. We in

Discussion

No replies yet. Be first.

Are LLM Uncertainty and Correctness Encoded by the Same Features? A Functional Dissociation via Sparse Autoencoders

Related coverage