Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs

Source

arxiv.org

Publisher summary (verbatim)

arXiv:2510.11288v4

Abstract: Recent work has shown that narrow finetuning can produce broadly misaligned LLMs, a phenomenon termed emergent misalignment (EM). While concerning, these findings were limited to finetuning and activation steering, leaving out in-context learning (
