DataBubble
arxiv
Published June 12, 2026 at 4:00 AM

How Fine-Grained Should a RAG Benchmark Be? A Hierarchical Framework for Synthetic Question Generation

Publisher summary (verbatim)

arXiv:2606.12789v1
Announce Type: new
Abstract: Evaluating retrieval-augmented generation (RAG) systems requires benchmarks that capture diverse question characteristics, yet practitioners lack empirical guidance on which dimensions to vary and at what granularity. We present HieraRAG, a hierarchica


Originally published on arxiv