arxiv
Published May 29, 2026 at 4:00 AM
Gram: Assessing sabotage propensities via automated alignment auditing
Publisher summary (verbatim)
arXiv:2605.30322v1 (announce type: cross)
Abstract: We introduce Gram, an automated alignment auditing framework to assess the propensity of AI agents to engage in sabotage. We evaluate Gemini models across 17 simulated agentic deployment scenarios that incentivize sabotage. We find Gemini models misb
Originally published on arXiv