DataBubble
arxiv
Published April 29, 2026 at 4:00 AM
▲bullish

STELLAR-E: a Synthetic, Tailored, End-to-end LLM Application Rigorous Evaluator

Source: arxiv.org
Publisher summary (verbatim)

arXiv:2604.24544v1 Announce Type: new Abstract: The increasing reliance on Large Language Models (LLMs) across diverse sectors highlights the need for robust domain-specific and language-specific evaluation datasets; however, the collection of such datasets is challenging due to privacy concerns, re


Mentioned models
  • Large Language Models (LLMs)
  • TGRT Self-Instruct
Source: arxiv
Tags: #benchmark #evaluation #language-models #synthetic-data


Originally published on arxiv ↗