arxiv
Published May 27, 2026 at 4:00 AM

Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications

Source: arxiv.org
Publisher summary (verbatim)

arXiv:2605.26133v1 (announce type: cross)

Abstract: Large Language Models (LLMs) have become the predominant paradigm in NLP, advancing both research and industry. As model sizes and pretraining data grow, concerns about Pretraining Data Exposure (PDE) increase due to the scale and opacity of training


Originally published on arXiv.