DataBubble
arxiv
Published May 28, 2026 at 4:00 AM
▲bullish

A Benchmark Construction and Evaluation Framework for Specialist Domains: Case Study on Defense-related Documents

Source: arxiv.org
Publisher summary (verbatim)

arXiv:2604.17943v2 Announce Type: replace Abstract: RAG-based question-answering (QA) in specialist domains faces a cold-start problem: lack of evaluative benchmarks and absence of labeled data for post-training. We present DoRA (Domain-oriented RAG Assessment), a novel benchmark construction and ev

Models mentioned
  • Llama-3.1-8B
    meta-llama/Llama-3.1-8B
    DL 1.3M (+1.6%) · IN $0.10/Mtok
Related
  • arxiv · 12d
    GraphRAG on Consumer Hardware: Benchmarking Local LLMs for Healthcare EHR Schema Retrieval
  • arxiv · 12d
    DrugRAG: Enhancing Pharmacy LLM Performance Through A Novel Retrieval-Augmented Generation Pipeline
  • arxiv · 26d
    Measuring Evaluation-Context Divergence in Open-Weight LLMs: A Paired-Prompt Protocol with Pilot Evidence of Alignment-Pipeline-Specific Heterogeneity
  • arxiv · Apr 4
    Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging
Tags
#benchmark #evaluation #specialist-domains #question-answering
