

arxiv

Published May 11, 2026 at 4:00 AM


When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory

Source: arxiv.org

Publisher summary (verbatim)

arXiv:2605.07313v1 Announce Type: new Abstract: Memory-agent evaluations report fixed-snapshot accuracy or retrieval quality, but these scores do not show whether evidence remains usable as irrelevant sessions (sessions not annotated as task-relevant evidence for the query) accumulate. We present a



Mentioned models

01 HippoRAG
02 LiCoMemory
03 Qwen3-8B
04 Qwen3-32B
05 Qwen3-235B


Tags

#evaluation #memory #agents #scalability

