Tag

#vulnerability

9 articles tagged #vulnerability

arxiv Jun 12

PI-Hunter: Automated Red-Teaming for Exposing and Localizing Prompt Injections

arXiv:2606.12737v1 Announce Type: cross Abstract: Large Language Models (LLMs) are rapidly evolving into agentic systems that interact with external tools and environments, introducing new security risks such as indirect prompt injection attacks through untrusted external sources. Existing defenses

#security #vulnerability #ai-safety Read on arxiv →

arxiv Jun 12 bearish

Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

arXiv:2606.13385v1 Announce Type: cross Abstract: Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments, where they operate over untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-injection attacks

#security #benchmark #vulnerability Read on arxiv →

arxiv Jun 12 bearish

It Takes One to Bias Them All: Breaking Bad with One-Shot GRPO

arXiv:2606.10931v2 Announce Type: replace Abstract: Warning: This paper contains several toxic and offensive statements. Modern large language models (LLMs) are typically aligned through large-scale post-training to ensure fair and reliable behavior. In this work, we investigate how easily such guar

#bias #safety #language-models Read on arxiv →

arxiv Jun 5

Inference-Time Vulnerability Beyond Shallow Safety: Alignment Along Generation Trajectories

arXiv:2606.04778v1 Announce Type: new Abstract: Safety-aligned Large Language Models (LLMs) remain vulnerable to interventions during inference that redirect generation toward harmful outputs. Recent work attributes this to shallow safety, where alignment concentrates in the first few output tokens.

#safety #large-language-models #vulnerability Read on arxiv →

arxiv May 16 bearish

Capacitive Touchscreens at Risk: Recovering Handwritten Trajectory on Smartphone via Electromagnetic Emanations

arXiv:2512.11484v1 Announce Type: cross Abstract: This paper reveals and exploits a critical security vulnerability: the electromagnetic (EM) side channel of capacitive touchscreens leaks sufficient information to recover fine-grained, continuous handwriting trajectories. We present Touchscreen Elec

#security #vulnerability #attack Read on arxiv →

arxiv Apr 23 bullish

White-Basilisk: A Hybrid Model for Code Vulnerability Detection

arXiv:2507.08540v5 Announce Type: replace-cross Abstract: The proliferation of software vulnerabilities presents a significant challenge to cybersecurity, necessitating more effective detection methodologies. We introduce White-Basilisk, a novel approach to vulnerability detection that demonstrates

WHLA2 models #cybersecurity #vulnerability #ai Read on arxiv →

arxiv Apr 21 bearish

The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

arXiv:2604.10577v2 Announce Type: replace-cross Abstract: Computer-use agents (CUAs) can now autonomously complete complex tasks in real digital environments, but when misled, they can also be used to automate harmful actions programmatically. Existing safety evaluations largely target explicit thre

CL1 model #safety #security #benchmark Read on arxiv →

arxiv Apr 13

GRM: Utility-Aware Jailbreak Attacks on Audio LLMs via Gradient-Ratio Masking

arXiv:2604.09222v1 Announce Type: cross Abstract: Audio large language models (ALLMs) enable rich speech-text interaction, but they also introduce jailbreak vulnerabilities in the audio modality. Existing audio jailbreak methods mainly optimize jailbreak success while overlooking utility preservatio

#audio #jailbreak #vulnerability Read on arxiv →

theverge Apr 7 bullish

A new Anthropic model found security problems ‘in every major operating system and web browser’

Anthropic is debuting a new AI model as part of a cybersecurity partnership with Nvidia, Google, Amazon Web Services, Apple, Microsoft, and other companies. Project Glasswing, as it's called, is billed as a way for large companies, and potentially even the government, to flag vulnerabilities in thei

CL1 model #cybersecurity #partnership #vulnerability Read on theverge →
