arxiv
Published May 22, 2026 at 4:00 AM

I-SAFE: Wasserstein Coherence Metrics for Structural Auditing of Scientific AI Models

Publisher summary (verbatim)

arXiv:2605.21731v1 (announce type: new)

Abstract: Deep learning models are increasingly used in scientific prediction tasks where strong benchmark performance is often interpreted as evidence of scientifically meaningful behavior. This interpretation is fragile, as models may exploit shortcut features

Originally published on arxiv