arXiv
Published April 27, 2026 at 4:00 AM
Logic Jailbreak: Efficiently Unlocking LLM Safety Restrictions Through Formal Logical Expression
Publisher summary · verbatim
arXiv:2505.13527v4 Announce Type: replace-cross Abstract: Despite substantial advancements in aligning large language models (LLMs) with human values, current safety mechanisms remain susceptible to jailbreak attacks. We hypothesize that this vulnerability stems from distributional discrepancies between …