Tag

#security

41 articles tagged #security

arxiv · 5d ago · bullish

OpenEvoShield: Dual Non-Stationary Continual Defense for Open-World Multi-Agent System Attacks

arXiv:2607.19351v1 Announce Type: new Abstract: LLM-based multi-agent systems (LLM-MAS) are increasingly deployed in safety-critical applications, where adversaries inject malicious instructions through inter-agent communication to propagate harmful behaviors. Unlike static threats, these attacks ar

M1 M2 M3 · 4 models · +1 #safety #security #continual-learning Read on arxiv →

arxiv · 5d ago

ChainWatch: A Kill Chain-Aligned Sequential Detection Framework for Multi-Step Attacks in MCP-Based AI Agent Systems

arXiv:2607.19432v1 Announce Type: cross Abstract: The Model Context Protocol (MCP) is an open-source standard that allows AI agents to connect to external tools, databases, and services. While this connectivity enables powerful agent capabilities, it also introduces multi-step attacks that existing

HI · 1 model #security #ai #detection Read on arxiv →

techcrunch · 5d ago

Arcee, a US open source AI lab, says Chinese models are not inherently dangerous

As Chinese AI models grow in capability and popularity among U.S. companies, the arguing over what should be done about them has reached a fever pitch.

KI QW · 2 models #open-source #security #regulation Read on techcrunch →

arxiv · Jul 10

Idiobionics: The Unification of Privacy and Intelligent Robotic Prostheses

arXiv:2607.07775v1 Announce Type: new Abstract: The human body is at the center of a growing family of technologies designed to tightly and persistently couple biological and digital systems. Robotic prostheses are a representative example of this tight coupling. Also referred to as bionic limbs, ro

#robotics #prosthetics #security Read on arxiv →

arxiv · Jul 2

NeuroFilter: Activation-Based Guardrails for Privacy-Conscious LLM Agents

arXiv:2601.14660v2 Announce Type: replace-cross Abstract: Agentic Large Language Models (LLMs) are models able to reason, plan, and execute tools over unstructured data. These abilities are enabling transformative applications in domains spanning from personal assistant, financial, and legal domains

AG · 1 model #privacy #security #language-models Read on arxiv →

arxiv · Jun 27

MIRROR: Novelty-Constrained Memory-Guided MCTS Red-Teaming for Agentic RAG

arXiv:2606.26793v1 Announce Type: cross Abstract: Multimodal agentic retrieval-augmented generation (RAG) systems expand the attack surface beyond prompt injection to include text poisoning, image injection, direct-query attacks, and orchestrator-level tool manipulation. Existing red-teaming approac

#security #adversarial-attacks #multimodal Read on arxiv →

arxiv · Jun 26

DroidBreaker: Practical and Functional Problem-Space Attacks on Machine-Learning Android Malware Detectors

arXiv:2606.26707v1 Announce Type: cross Abstract: Adversarial APKs are Android applications modified in the problem space to evade machine-learning malware detectors. In this work, we first show that, despite claims, existing problem-space attacks remain largely impractical. Most techniques leverage

#security #malware #adversarial-attacks Read on arxiv →

techcrunch · Jun 19

The US banned Anthropic’s Fable 5 release, but the numbers don’t seem to care

Just as last week was ending, the US government forced Anthropic to pull its two newest models, Fable 5 and Mythos 5, citing national security concerns after Amazon researchers allegedly found a way to bypass Fable 5’s guardrails. Cybersecurity researchers have since signed an open letter calling th

FA MY · 2 models #regulation #security #acquisition Read on techcrunch →

arxiv · Jun 18

TRAP: Benchmark for Task-completion and Resistance to Active Privacy-extraction

arXiv:2606.18996v1 Announce Type: cross Abstract: Agents are increasingly deployed in document-intensive workflows where sensitive private information is not an edge case but a routine input, e.g., an agent booking a flight needs passport numbers. In such settings, the agent must use private informa

#privacy #security #cryptography Read on arxiv →

arxiv · Jun 18

AI Sandboxes: A Threat Model, Taxonomy, and Measurement Framework

arXiv:2606.18532v1 Announce Type: cross Abstract: AI systems are increasingly evaluated in bounded environments that combine isolation, simulation, instrumentation, supervision, and evidence capture. For physical AI, AIoT, and cyber-physical systems, this shift is not a matter of terminology: the sy

#safety #security #evaluation Read on arxiv →

arxiv · Jun 17

Combating Data Laundering in LLM Training

arXiv:2604.01904v3 Announce Type: replace-cross Abstract: Post-hoc unauthorized-training data detection for large language models (LLMs) typically assumes a query-with-originals regime: rights holders query a target LLM with raw proprietary data and assess whether the model assigns them stronger mem

ME PY FA · 3 models #data-laundering #detection #security Read on arxiv →

arxiv · Jun 15 · bearish

COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers

arXiv:2512.02318v4 Announce Type: replace-cross Abstract: This paper studies how multimodal large language models (MLLMs) undermine the security guarantees of visual CAPTCHA. We identify the attack surface where an adversary can cheaply automate CAPTCHA solving using off-the-shelf models. We evaluat

MU · 1 model #security #captcha #adversarial-attacks Read on arxiv →

arxiv · Jun 12

PI-Hunter: Automated Red-Teaming for Exposing and Localizing Prompt Injections

arXiv:2606.12737v1 Announce Type: cross Abstract: Large Language Models (LLMs) are rapidly evolving into agentic systems that interact with external tools and environments, introducing new security risks such as indirect prompt injection attacks through untrusted external sources. Existing defenses

#security #vulnerability #ai-safety Read on arxiv →

arxiv · Jun 12

Categorical Robustness Assessment for Machine Learning based Network Intrusion Detection Systems

arXiv:2606.12075v1 Announce Type: cross Abstract: Network Intrusion Detection Systems (NIDS) heavily utilize Machine Learning (ML), but ML models can be manipulated via adversarial attacks. These attacks add carefully crafted perturbations to network traffic data that lead to misclassifications. Whil

1D LO RA · 3 models #adversarial-attacks #network-intrusion-detection #machine-learning Read on arxiv →

arxiv · Jun 12 · bearish

Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

arXiv:2606.13385v1 Announce Type: cross Abstract: Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments, where they operate over untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-injection attacks

#security #benchmark #vulnerability Read on arxiv →

arxiv · Jun 11

On the Study of Biometric Spoofing Detection using Deep Learning

arXiv:2606.11505v1 Announce Type: cross Abstract: Biometric systems are increasingly deployed in security applications; however, they remain vulnerable to spoofing attacks, in which attackers exploit counterfeit biometric data to gain unauthorized access. This research evaluates the effectiveness of

MO DE IN · 4 models · +1 #security #facial-recognition #machine-learning Read on arxiv →

arxiv · Jun 10

Advancing the State-of-the-Art in Empirical Privacy Auditing

arXiv:2606.10481v1 Announce Type: cross Abstract: Parameter-efficient fine-tuning of large language models (LLMs) can exhibit problematic memorization of individual training examples. Empirical privacy auditing (EPA) quantifies this risk by measuring realistic data leakage on membership inference (M

#privacy #language-models #auditing Read on arxiv →

arxiv · Jun 1

LLM Anonymization Against Agentic Re-Identification

arXiv:2605.30848v1 Announce Type: cross Abstract: Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become cross-referenceable evidence for re-identification, yet those same details also carry downstream analytic value of the text. Existing defense

AU · 1 model #anonymization #privacy #security Read on arxiv →

arxiv · May 29 · bullish

WaterSearch: A Quality-Aware Search-based Watermarking Framework for Large Language Models

arXiv:2512.00837v2 Announce Type: replace Abstract: Watermarking acts as a critical safeguard in text generated by Large Language Models (LLMs). By embedding identifiable signals into model outputs, watermarking enables reliable attribution and enhances the security of machine-generated content. Exi

LA · 1 model #watermarking #language-models #security Read on arxiv →

arxiv · May 29 · bullish

AliMark: Enhancing Robustness of Sentence-Level Watermarking Against Text Paraphrasing

arXiv:2605.29434v1 Announce Type: cross Abstract: Existing sentence-level watermarking methods enhance robustness to paraphrasing by anchoring watermarks in sentence semantics. However, their prefix-based designs remain vulnerable to structural perturbations, such as sentence splitting and merging,

DI OP · 2 models #watermarking #paraphrasing #robustness Read on arxiv →

arxiv · May 22 · bullish

Heartbeat-Bound Hierarchical Credentials: Cryptographic Revocation for AI Agent Swarms

arXiv:2605.20704v1 Announce Type: cross Abstract: Autonomous AI agents that spawn sub-agent swarms create a safety gap: existing credential revocation mechanisms, OAuth 2.0 introspection, OCSP, and W3C Status Lists, require network connectivity to a central authority, leaving "zombie agents" execu

GP · 1 model #cryptography #security #multiagent Read on arxiv →

openai · May 18 · bullish

OpenAI and Dell partner to bring Codex to hybrid and on-premise enterprise environments

OpenAI and Dell partner to bring Codex to hybrid and on-premise environments, helping enterprises deploy AI coding agents securely across data and workflows.

CO · 1 model #partnership #enterprise #security Read on openai →

arxiv · May 16

Trapping Attacker in Dilemma: Examining Internal Correlations and External Influences of Trigger for Defending GNN Backdoors

arXiv:2605.08278v2 Announce Type: replace-cross Abstract: GNNs have become a standard tool for learning on relational data, yet they remain highly vulnerable to backdoor attacks. Prior defenses often depend on inspecting specific subgraph patterns or node features, and thus can be circumvented by ad

#graph-neural-networks #backdoor-attacks #security Read on arxiv →

arxiv · May 16 · bearish

Capacitive Touchscreens at Risk: Recovering Handwritten Trajectory on Smartphone via Electromagnetic Emanations

arXiv:2512.11484v1 Announce Type: cross Abstract: This paper reveals and exploits a critical security vulnerability: the electromagnetic (EM) side channel of capacitive touchscreens leaks sufficient information to recover fine-grained, continuous handwriting trajectories. We present Touchscreen Elec

#security #vulnerability #attack Read on arxiv →

arxiv · May 11

Searching for Privacy Risks in LLM Agents via Simulation

arXiv:2508.10880v3 Announce Type: replace-cross Abstract: The widespread deployment of LLM-based agents is likely to introduce a critical privacy threat: malicious agents that proactively engage others in multi-turn interactions to extract sensitive information. However, the evolving nature of such

LL · 1 model #privacy #security #language-models Read on arxiv →

arxiv · May 8 · bullish

Addressing Labelled Data Scarcity: Taxonomy-Agnostic Annotation of PII Values in HTTP Traffic using LLMs

arXiv:2605.06305v1 Announce Type: new Abstract: Automated privacy audits of web and mobile applications often analyse outbound HTTP traffic to detect Personally Identifiable Information (PII) leakage. However, existing learning-based detectors typically depend on scarce, manually labelled traffic an

LA · 1 model #privacy #security #annotation Read on arxiv →

arxiv · May 8

SMI: Statistical Membership Inference for Reliable Unlearned Model Auditing

arXiv:2602.01150v2 Announce Type: replace-cross Abstract: Machine unlearning (MU) is essential for enforcing the right to be forgotten in machine learning systems. A key challenge of MU is how to reliably audit whether a model has truly forgotten specified training data. Membership Inference Attacks

#machine-learning #unlearning #auditing Read on arxiv →

arxiv · May 6

E-MIA: Exam-Style Black-Box Membership Inference Attacks against RAG Systems

arXiv:2605.00955v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) equips large language models (LLMs) with external evidence by retrieving documents at inference time, but it also turns the retrieval corpus into a sensitive asset. Under a black-box setting, an adversary given a c

RE · 1 model #security #language-models #inference Read on arxiv →

arxiv · May 5

Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis

arXiv:2605.00314v1 Announce Type: cross Abstract: An agent skill is a configuration package that equips an LLM-driven agent with a concrete capability, such as reading email, executing shell commands, or signing blockchain transactions. Each skill is a hybrid artifact: a structured half declares exec

LL · 1 model #security #audit #llm Read on arxiv →

arxiv · May 1 · bullish

From surveillance to signalling: escalation channels as environmental controls for agentic AI

arXiv:2510.05192v2 Announce Type: replace-cross Abstract: When AI agents operating with access to sensitive information encounter a conflict between completing an assigned task and following rules or ethical constraints, they can resort to unsanctioned behaviour. Existing inference time safety work

LL · 1 model #safety #security #ai-ethics Read on arxiv →

arxiv · Apr 24

Unveiling Unicode's Unseen Underpinnings in Undermining Authorship Attribution

arXiv:2508.15840v5 Announce Type: replace-cross Abstract: When using a public communication channel (whether formal or informal, such as commenting or posting on social media), end users have no expectation of privacy: they compose a message and broadcast it for the world to see. Even if an end user

#security #steganography #stylometry Read on arxiv →

arxiv · Apr 23

Atomic Decision Boundaries: A Structural Requirement for Guaranteeing Execution-Time Admissibility in Autonomous Systems

arXiv:2604.17511v2 Announce Type: replace-cross Abstract: Autonomous systems increasingly execute actions that directly modify shared state, creating an urgent need for precise control over which transitions are permitted to occur. Existing governance mechanisms evaluate policies prior to execution

#governance #autonomous-systems #security Read on arxiv →

arxiv · Apr 22

Owner-Harm: A Missing Threat Model for AI Agent Safety

arXiv:2604.18658v1 Announce Type: cross Abstract: Existing AI agent safety benchmarks focus on generic criminal harm (cybercrime, harassment, weapon synthesis), leaving a systematic blind spot for a distinct and commercially consequential threat category: agents harming their own deployers. Real-wor

AG AG LL · 3 models #safety #security #benchmark Read on arxiv →

arxiv · Apr 22

Evaluating LLM-Generated Obfuscated XSS Payloads for Machine Learning-Based Detection

arXiv:2604.19526v1 Announce Type: cross Abstract: Cross-site scripting (XSS) remains a persistent web security vulnerability, especially because obfuscation can change the surface form of a malicious payload while preserving its behavior. These transformations make it difficult for traditional and m

LA · 1 model #security #obfuscation #machine-learning Read on arxiv →

arxiv · Apr 21 · bearish

The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

arXiv:2604.10577v2 Announce Type: replace-cross Abstract: Computer-use agents (CUAs) can now autonomously complete complex tasks in real digital environments, but when misled, they can also be used to automate harmful actions programmatically. Existing safety evaluations largely target explicit thre

CL · 1 model #safety #security #benchmark Read on arxiv →

arxiv · Apr 17

Between a Rock and a Hard Place: The Tension Between Ethical Reasoning and Safety Alignment in LLMs

arXiv:2509.05367v4 Announce Type: replace-cross Abstract: Large Language Model safety alignment predominantly operates on a binary assumption that requests are either safe or unsafe. This classification proves insufficient when models encounter ethical dilemmas, where the capacity to reason through

#safety #security #cryptography Read on arxiv →

arxiv · Apr 16

Fully Homomorphic Encryption on Llama 3 model for privacy preserving LLM inference

arXiv:2604.12168v1 Announce Type: cross Abstract: The applications of Generative Artificial Intelligence (GenAI) and their intersections with data-driven fields, such as healthcare, finance, transportation, and information security, have led to significant improvements in service efficiency and low

DE · 1 model #security #cryptography #homomorphic-encryption Read on arxiv →

arxiv · Apr 11

Are GUI Agents Focused Enough? Automated Distraction via Semantic-level UI Element Injection

arXiv:2604.07831v1 Announce Type: cross Abstract: Existing red-teaming studies on GUI agents have important limitations. Adversarial perturbations typically require white-box access, which is unavailable for commercial systems, while prompt injection is increasingly mitigated by stronger safety alig

#security #adversarial #computer-vision Read on arxiv →

arxiv · Apr 10 · bearish

CAAP: Capture-Aware Adversarial Patch Attacks on Palmprint Recognition Models

arXiv:2604.06987v1 Announce Type: cross Abstract: Palmprint recognition is deployed in security-critical applications, including access control and palm-based payment, due to its contactless acquisition and highly discriminative ridge-and-crease textures. However, the robustness of deep palmprint re

#security #adversarial-attacks #computer-vision Read on arxiv →

arxiv · Apr 6

Learning the Signature of Memorization in Autoregressive Language Models

arXiv:2604.03199v1 Announce Type: cross Abstract: All prior membership inference attacks for fine-tuned language models use hand-crafted heuristics (e.g., loss thresholding, Min-K%, reference calibration), each bounded by the designer's intuition. We introduce the first transferable learned attack,

MA RW RE · 3 models #membership-inference #language-models #transfer-learning Read on arxiv →

theverge · Apr 2 · bearish

PSA: Anyone with a link can view your Granola notes by default

If you use the AI-powered note-taking app Granola, you might want to double-check your privacy settings. Though Granola says your notes are "private by default," it makes them viewable to anyone with a link, and also uses them for internal AI training unless you opt out. Granola describes itself as

GR · 1 model #privacy #security #ai-training Read on theverge →
