Mechanisms of Introspective Awareness

Source: arxiv.org


arXiv:2603.21396v5

Abstract: Recent work has shown that LLMs can sometimes detect when steering vectors are injected into their residual stream and identify the injected concept -- a phenomenon termed "introspective awareness." We investigate the mechanisms underlying this cap
