DataBubble
arxiv
Published May 19, 2026 at 4:00 AM

SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training

Source: arxiv.org
Publisher summary (verbatim)

arXiv:2605.08738v2

Abstract: Structured pruning and knowledge distillation (KD) are typical techniques for compressing large language models, but it remains unclear how they should be applied at pretraining scale, especially to recent mixture-of-experts (MoE) models. In


