Tag

#adversarial-attacks

4 articles tagged #adversarial-attacks

arxivMay 1

Let's Measure Information Step-by-Step: AI-Based Evaluation Beyond Vibes

arXiv:2508.05469v3 Announce Type: replace Abstract: We evaluate artificial intelligence (AI) systems without ground truth by exploiting a link between strategic gaming and information loss. Building on established information theory, we analyze which mechanisms resist adversarial manipulation. This

#information-theory #adversarial-attacks #machine-learning Read on arxiv →

arxivApr 16

ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack

arXiv:2509.25843v2 Announce Type: replace Abstract: Large language models (LLMs), despite being safety-aligned, exhibit brittle refusal behaviors that can be circumvented by simple linguistic changes. As tense jailbreaking demonstrates that models refusing harmful requests often comply when rephrase

#safety #alignment #jailbreaking Read on arxiv →

arxivApr 10

Corruption-robust Offline Multi-agent Reinforcement Learning From Human Feedback

arXiv:2603.28281v2 Announce Type: replace Abstract: We consider robustness against data corruption in offline multi-agent reinforcement learning from human feedback (MARLHF) under a strong-contamination model: given a dataset $D$ of trajectory-preference tuples (each preference being an $n$-dimensio

#machine-learning #reinforcement-learning #robustness Read on arxiv →

arxivApr 10bearish

CAAP: Capture-Aware Adversarial Patch Attacks on Palmprint Recognition Models

arXiv:2604.06987v1 Announce Type: cross Abstract: Palmprint recognition is deployed in security-critical applications, including access control and palm-based payment, due to its contactless acquisition and highly discriminative ridge-and-crease textures. However, the robustness of deep palmprint re

#security #adversarial-attacks #computer-vision Read on arxiv →

Tag

#adversarial-attacks

4 articles tagged #adversarial-attacks

arxivMay 1