arxiv · Jul 15 · bullish

Cost-Governed RAG: Unified Per-Tenant Cost Attribution Across Retrieval and Generation in Multi-Tenant LLM Systems

arXiv:2607.12188v1 · Announce Type: new
Abstract: Enterprise Retrieval-Augmented Generation (RAG) deployments face a critical governance gap: while LLM generation cost is metered per token, the retrieval layer (vector memory, similarity compute, and embedding API calls) remains an unattributed share…

1 model · #governance #retrieval #cost-attribution
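The abstract is cut off before it describes the paper's attribution model, so what follows is only a minimal sketch of the general idea and not the authors' method. It keeps a per-tenant ledger that meters the retrieval-side components the abstract names (embedding API calls, similarity compute, vector memory) next to per-token generation cost. All unit prices, component names, and the class `TenantCostLedger` are hypothetical. Billing storage as tenant-owned GB-hours is also an assumption; a shared index would need its own apportioning rule.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical unit prices in USD; real rates depend on provider and deployment.
@dataclass(frozen=True)
class UnitPrices:
    gen_input_per_1k_tokens: float = 0.0005
    gen_output_per_1k_tokens: float = 0.0015
    embed_per_1k_tokens: float = 0.0001
    search_per_1k_vectors_scanned: float = 0.00002
    storage_per_gb_hour: float = 0.0003


class TenantCostLedger:
    """Accumulates generation and retrieval costs per tenant (illustrative sketch)."""

    def __init__(self, prices: UnitPrices = UnitPrices()):
        self.prices = prices
        # tenant_id -> component name -> accumulated USD
        self._costs = defaultdict(lambda: defaultdict(float))

    def _add(self, tenant: str, component: str, usd: float) -> None:
        self._costs[tenant][component] += usd

    def record_generation(self, tenant: str, input_tokens: int, output_tokens: int) -> None:
        p = self.prices
        usd = (input_tokens / 1000) * p.gen_input_per_1k_tokens \
            + (output_tokens / 1000) * p.gen_output_per_1k_tokens
        self._add(tenant, "generation", usd)

    def record_embedding(self, tenant: str, tokens: int) -> None:
        # Embedding API calls, e.g. for query or document vectors.
        self._add(tenant, "embedding", tokens / 1000 * self.prices.embed_per_1k_tokens)

    def record_search(self, tenant: str, vectors_scanned: int) -> None:
        # Similarity compute, approximated here by the number of vectors compared.
        self._add(tenant, "similarity",
                  vectors_scanned / 1000 * self.prices.search_per_1k_vectors_scanned)

    def record_storage(self, tenant: str, gb: float, hours: float) -> None:
        # Vector memory held on the tenant's behalf (assumes a per-tenant footprint).
        self._add(tenant, "vector_memory", gb * hours * self.prices.storage_per_gb_hour)

    def breakdown(self, tenant: str) -> dict:
        return dict(self._costs[tenant])

    def total(self, tenant: str) -> float:
        return sum(self._costs[tenant].values())
```

Under these made-up prices, a tenant that embeds 2,000 tokens, scans 50,000 vectors, sends 1,000 input and 500 output tokens to the generator, and keeps 2 GB of vectors for 24 hours totals about $0.01685, of which generation is only $0.00125. That retrieval-side share is exactly what per-token metering alone would leave unattributed.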

arxiv · Jun 26 · bullish

EvoEmbedding: Evolvable Representations for Long-Context Retrieval and Agentic Memory

arXiv:2606.21649v2 · Announce Type: replace
Abstract: Existing embedding models are inherently static: they encode text segments in isolation, ignoring their surrounding context and temporal order. This paper introduces EvoEmbedding, a novel embedding model that generates evolvable representations for…

4 models · +1 · #retrieval #embedding #long-context

arxiv · May 22

GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval

arXiv:2605.20815v1 · Announce Type: cross
Abstract: Graph-based Retrieval Augmented Generation (GraphRAG) extends retrieval-augmented generation to support structured reasoning over complex corpora, but its reliability under resource-constrained, privacy-sensitive deployments remains unclear. In healt…

4 models · +1 · #healthcare #retrieval #generation

arxiv · May 11 · bullish

When Does Embedding Magnitude Matter? A Cross-Task Functional-Symmetry Framework

arXiv:2602.09229v3 · Announce Type: replace
Abstract: Cosine similarity normalizes both sides; dot product normalizes neither. We propose a 2x2 framework that independently controls query-side and document-side normalization, exposing two intermediate variants (QNorm, DNorm) that have not been previou…

1 model · #retrieval #normalization #machine-learning
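The 2x2 grid in this abstract is easy to make concrete. The sketch below scores a query against documents under each cell: normalize neither side (dot product), the query only (QNorm), the document only (DNorm), or both (cosine). The names `score`, `rank`, and `VARIANTS` are hypothetical, and vectors are assumed to be nonzero.

```python
import math


def _norm(v):
    # Euclidean (L2) norm; assumes v is not the zero vector.
    return math.sqrt(sum(x * x for x in v))


def score(q, d, normalize_query: bool, normalize_doc: bool) -> float:
    """Similarity under one cell of the 2x2 normalization grid."""
    s = sum(a * b for a, b in zip(q, d))
    if normalize_query:
        s /= _norm(q)
    if normalize_doc:
        s /= _norm(d)
    return s


VARIANTS = {
    "dot":    (False, False),  # normalizes neither side
    "QNorm":  (True,  False),  # query side only
    "DNorm":  (False, True),   # document side only
    "cosine": (True,  True),   # both sides
}


def rank(q, docs, variant):
    """Return document indices sorted by descending score under one variant."""
    nq, nd = VARIANTS[variant]
    return sorted(range(len(docs)),
                  key=lambda i: score(q, docs[i], nq, nd), reverse=True)


q = [2.0, 0.0]
docs = [[3.0, 4.0], [0.6, 0.0], [2.0, 2.0]]
for name in VARIANTS:
    print(name, rank(q, docs, name))
```

One property follows directly from the definitions. For a single fixed query, dividing by the query norm scales every document's score by the same positive constant, so QNorm ranks documents exactly as dot product does and DNorm exactly as cosine does (here `[0, 2, 1]` versus `[1, 2, 0]`). The variants still differ in score magnitude, which matters for cross-query thresholds, score fusion, and training losses. Which of these settings the paper examines is not visible in the truncated abstract.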

arxiv · May 8

SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents

arXiv:2605.05726v1 · Announce Type: new
Abstract: As LLM agents are increasingly deployed with large libraries of reusable skills, selecting the right skill for a user request has become a critical systems challenge. In small libraries, users may invoke skills explicitly by name, but this assumption b…

#benchmark #llm #retrieval

arxiv · Apr 22 · bullish

CounterRefine: Answer-Conditioned Counterevidence Retrieval for Inference-Time Knowledge Repair in Factual Question Answering

arXiv:2603.16091v2 · Announce Type: replace-cross
Abstract: In factual question answering, many errors are not failures of access but failures of commitment: the system retrieves relevant evidence, yet still settles on the wrong answer. We present CounterRefine, a lightweight inference-time repair lay…

3 models · #question answering #retrieval #inference

arxiv · Apr 20 · bullish

CHOP: Chunkwise Context-Preserving Framework for RAG on Multi Documents

arXiv:2604.15802v1 · Announce Type: new
Abstract: Retrieval-Augmented Generation (RAG) systems lose retrieval accuracy when similar documents coexist in the vector database, causing unnecessary information, hallucinations, and factual errors. To alleviate this issue, we propose CHOP, a framework that…

2 models · #retrieval #language-models #information-retrieval
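The abstract stops before describing CHOP's mechanism, so the following is not CHOP. It is a minimal sketch of one generic way to preserve document context at the chunk level, given as background for the problem the abstract names. If near-duplicate passages from different documents (say, two versions of the same policy) are embedded as bare text, they become indistinguishable in the vector database. Prepending a document-level header to the text that gets embedded keeps them apart. The `Chunk` type, `chunk_with_context`, the `[doc: …]` header format, and the window sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    index: int
    text: str        # raw chunk text, as passed to the LLM
    embed_text: str  # text actually embedded, with document context prepended


def chunk_with_context(doc_id: str, title: str, body: str,
                       max_words: int = 200, overlap: int = 20) -> list:
    """Split body into overlapping word windows and tag each with its document."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = body.split()
    chunks = []
    start = 0
    while start < len(words):
        window = " ".join(words[start:start + max_words])
        header = f"[doc: {title}]"
        chunks.append(Chunk(doc_id, len(chunks), window, f"{header} {window}"))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the document
        start += max_words - overlap
    return chunks
```

With the header in place, the same sentence taken from "Refund policy 2023" and "Refund policy 2024" yields identical `text` but different `embed_text`, so a retriever can separate the two documents. Whether CHOP uses headers, summaries, or something else entirely is not stated in the visible abstract.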