arxiv
Published April 24, 2026 at 4:00 AM
SafeRedirect: Defeating Internal Safety Collapse via Task-Completion Redirection in Frontier LLMs
Publisher summary · verbatim
arXiv:2604.20930v1 Announce Type: cross Abstract: Internal Safety Collapse (ISC) is a failure mode in which frontier LLMs, when executing legitimate professional tasks whose correct completion structurally requires harmful content, spontaneously generate that content with safety failure rates exceed
Originally published on arxiv ↗