arxiv
Published April 24, 2026 at 4:00 AM
Why Do Language Model Agents Whistleblow?
Publisher summary (verbatim)
arXiv:2511.17085v3 Announce Type: replace-cross Abstract: The deployment of Large Language Models (LLMs) as tool-using agents causes their alignment training to manifest in new ways. Recent work finds that language models can use tools in ways that contradict the interests or explicit instructions o
Originally published on arxiv ↗